4.0 \(95\%\) confidence interval of the proportion of Americans who work full-time
We can compute a \(95\%\) confidence interval for the true proportion of Americans who work full-time, \(p\), by using the Central Limit Theorem (CLT). The CLT says that the sampling distribution of a statistic, in this case a sample proportion \(\hat{p}\), is approximately normal, with the true population proportion, \(p\), as its mean, and the standard error, \(SE=\sqrt{\frac{p\cdot (1-p)}{n}}\), as its standard deviation, where \(n\) is the size of each sample.
\[
\hat{p}\sim\ N(mean = p, sd=\sqrt{\frac{p\cdot (1-p)}{n}})
\]
If we were able to draw many samples of equal size from the population of Americans, and computed the proportion who work full-time in each sample, the CLT says the distribution of those sample proportions would be approximately normal. Since we typically don’t know the true proportion \(p\), we use the point estimate \(\hat{p}\) as a proxy for the purpose of computing the standard error \(SE\) and the \(95\%\) confidence interval.
In reality, we can only draw one sample from the population, and we don’t know where its sample proportion, \(\hat{p}\), falls in the sampling distribution. From the CLT, however, we do know that the proportions of \(95\%\) of the samples drawn will fall within \(1.96\cdot \sqrt{\frac{p\cdot (1-p)}{n}}=1.96\cdot SE\) of \(p\). Whenever \(\hat{p}\) falls within \(1.96\cdot SE\) of \(p\), \(p\) in turn falls within \(1.96\cdot SE\) of \(\hat{p}\). Since we don’t know \(p\), we approximate \(SE\) by \(SE_{\hat{p}}=\sqrt{\frac{\hat{p}\cdot (1-\hat{p})}{n}}\). So for about \(95\%\) of the samples we could draw, the interval \(\hat{p}\pm 1.96\cdot SE_{\hat{p}}\) will include the true proportion of the population. This is what we mean when we say we are \(95\%\) confident that the interval contains \(p\).
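The coverage claim can be checked with a small simulation. The sketch below is only an illustration: the true proportion, sample size, number of simulated samples and all variable names are assumed values chosen for the demonstration, not quantities from the GSS data.

```r
# Simulate repeated sampling and count how often the 95% interval covers p.
# All values below are assumptions made for illustration only.
set.seed(42)                       # for reproducibility
p_true  <- 0.5                     # hypothetical true proportion
n_sim   <- 100                     # size of each simulated sample
n_draws <- 10000                   # number of simulated samples

# Sample proportion of each simulated sample
p_hats <- rbinom(n_draws, size = n_sim, prob = p_true) / n_sim

# Standard error estimated from each sample, as we would do in practice
se_hats <- sqrt(p_hats * (1 - p_hats) / n_sim)

# Interval bounds for each sample and whether they contain p_true
z_crit <- qnorm(0.975)
sim_lb <- p_hats - z_crit * se_hats
sim_ub <- p_hats + z_crit * se_hats
covered <- sim_lb <= p_true & p_true <= sim_ub

coverage <- mean(covered)
cat("Share of intervals containing p:", coverage, "\n")
```

The share printed should be close to, though not exactly, \(0.95\), because both the normal approximation and the use of \(SE_{\hat{p}}\) in place of \(SE\) are approximations.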
4.1 An example
It is much easier to understand with an actual example and a plot. Suppose we have a population with a true proportion \(p=0.5\), and we draw a sample of size \(n=100\). Per the CLT, the distribution of sample proportions taken from that population is approximately normal: \(\hat{p}\sim\ N(mean = 0.5, sd=\sqrt{\frac{0.5\cdot (1-0.5)}{100}}=0.05)\). Any sample drawn from the population whose estimate \(\hat{p}\) falls within \((0.5-1.96\cdot0.05,\ 0.5+1.96\cdot0.05)=(0.402,\ 0.598)\) will have a \(95\%\) confidence interval that contains the true proportion, \(p=0.5\). If we draw a sample from the population, and the sample proportion is \(\hat{p}=0.58\), the \(95\%\) confidence interval centered around \(\hat{p}=0.58\) will contain the true proportion \(p=0.5\). Since the person taking the sample typically doesn’t know \(p\), she will use her sample’s \(\hat{p}\) to compute \(SE_{\hat{p}}\), for the purposes of computing the \(95\%\) confidence interval. \(SE_{\hat{p}}\) will be: \(SE_{\hat{p}}=\sqrt{\frac{0.58\cdot (1-0.58)}{100}}=0.0494\), and the \(95\%\) confidence interval will be: \((0.58-1.96\cdot0.0494,\ 0.58+1.96\cdot0.0494)=(0.4832,\ 0.6768)\), which contains the true proportion \(p=0.5\).

If we are unlucky and draw a sample whose proportion \(\hat{p}\) falls in the shaded area, which should only happen \(5\%\) of the time, its \(95\%\) confidence interval will not include the true proportion \(p=0.5\).
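The arithmetic of this example can be reproduced in a few lines of R. This is a minimal sketch; the variable names are made up for the illustration, and because it does not round \(SE_{\hat{p}}\) before multiplying, its bounds may differ from the figures above in the fourth decimal place.

```r
# Worked example from this section: p = 0.5, n = 100, observed p_hat = 0.58.
# Variable names are illustrative only.
p_true_ex <- 0.5
n_ex      <- 100
p_hat_ex  <- 0.58

# Sampling distribution of p_hat under the true p
se_true <- sqrt(p_true_ex * (1 - p_true_ex) / n_ex)        # 0.05
central_range <- p_true_ex + c(-1, 1) * 1.96 * se_true     # (0.402, 0.598)

# Interval the analyst would compute, using p_hat in place of p
se_hat_ex <- sqrt(p_hat_ex * (1 - p_hat_ex) / n_ex)        # about 0.0494
ci_ex <- p_hat_ex + c(-1, 1) * 1.96 * se_hat_ex            # about (0.483, 0.677)

cat("Central 95% range for p_hat:", round(central_range, 3), "\n")
cat("95% CI around p_hat = 0.58:", round(ci_ex, 4), "\n")
cat("CI contains p:", ci_ex[1] <= p_true_ex && p_true_ex <= ci_ex[2], "\n")
```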

4.2 Conditions for the confidence interval
The conditions for the validity of the confidence interval are:
Sampled observations must be independent.
We expect at least 10 successes and 10 failures in the sample, i.e., \(n\cdot\hat{p}\geq10\) and \(n\cdot(1-\hat{p})\geq10\).
The first criterion can be verified by checking that the observations come from a simple random sample and represent less than \(10\%\) of the population. The population consists of Americans who work part-time or full-time, and the sample size can be computed by R as
n <- length(gss2016_wrkstat)
cat("Sample size n =", n)
Sample size n = 2867
and it is certainly less than \(10\%\) of the population.
For the second criterion, since we don’t know \(p\), we will use our point estimate \(\hat{p}\), which can be computed using R:
p_hat <- table(gss2016$wrkstat)["working fulltime"] / sum(table(gss2016_wrkstat))
p_hat <- as.numeric(p_hat)
cat("Estimate of proportion working full-time =", p_hat)
Estimate of proportion working full-time = 0.461243
and so the number of successes is
number_of_successes <- floor(n * p_hat)
cat("Number of successes:", number_of_successes)
Number of successes: 1322
and the number of failures is
number_of_failures <- floor(n * (1 - p_hat))
cat("Number of failures:", number_of_failures)
Number of failures: 1544
both of which are much greater than \(10\).
4.3 Critical value \(z^*\)
The \(z^*\) corresponding to a \(95\%\) confidence interval in the standard normal distribution is approximately 1.96. We can compute it more exactly using R:
z_star <- qnorm(p = 0.025, mean = 0, sd = 1, lower.tail = FALSE)
cat("z-value corresponding to 95% confidence interval:", z_star)
z-value corresponding to 95% confidence interval: 1.959964
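The same idea gives the critical value for any confidence level: \(z^*\) is the quantile that leaves half of \(1-\text{confidence level}\) in each tail. A short sketch, with illustrative variable names and confidence levels chosen only as examples:

```r
# Critical values z* for several confidence levels (levels chosen as examples).
conf_levels <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf_levels

# Quantile of the standard normal that leaves alpha / 2 in the upper tail
z_stars <- qnorm(1 - alpha / 2)

print(data.frame(conf_level = conf_levels, z_star = round(z_stars, 4)))
```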
4.4 Standard error of the sample
The standard error of the sample is
se_p_hat <- sqrt(p_hat * (1 - p_hat) / n)
cat("Standard error SE =", se_p_hat)
Standard error SE = 0.009309953
4.5 Confidence interval
Computing the confidence interval bounds
conf_int_lb <- p_hat - z_star * se_p_hat
conf_int_ub <- p_hat + z_star * se_p_hat
cat("Confidence interval lower bound:", conf_int_lb, "\nConfidence interval upper bound:", conf_int_ub)
Confidence interval lower bound: 0.4429958
Confidence interval upper bound: 0.4794902
Hence, our confidence interval is \[
0.4612\pm 1.96\cdot 0.0093=(0.4430, 0.4795)
\]
We are \(95\%\) confident that the true proportion of Americans employed full-time is between \(0.4430\) and \(0.4795\).
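The steps in sections 4.3 to 4.5 can be collected into one function. The helper below, `prop_conf_int()`, is a hypothetical name introduced here for illustration and is not part of the original analysis. It takes the point estimate and sample size reported above as plain numbers, so it does not depend on the GSS data being loaded.

```r
# Hypothetical helper (illustration only): normal-approximation confidence
# interval for a proportion, following the steps in sections 4.3 to 4.5.
prop_conf_int <- function(p_hat, n, conf_level = 0.95) {
  stopifnot(p_hat >= 0, p_hat <= 1, n > 0, conf_level > 0, conf_level < 1)
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # critical value
  se <- sqrt(p_hat * (1 - p_hat) / n)         # standard error from p_hat
  c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

# Using the estimates reported above: p_hat = 0.461243, n = 2867
ci_95 <- prop_conf_int(p_hat = 0.461243, n = 2867)
print(round(ci_95, 4))
```

With these inputs the function should reproduce the bounds computed above, up to rounding of \(\hat{p}\).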
5.0 Hypothesis testing
We can use the CLT and the data collected to construct a hypothesis testing framework. The hypothesis test considers two possible interpretations of our data, a null hypothesis \(H_0\), and an alternative hypothesis \(H_a\). \(H_0\) says that any apparent effect in the sampled data could have arisen simply by chance: there is “nothing going on”. \(H_a\) takes the view that the data collected reveals that “something is going on”. We will either reject the null hypothesis in favor of this alternative, or we will fail to reject it and conclude the sampled data could have been drawn simply by chance. Note that even if we fail to reject \(H_0\), that does not mean we accept it as the ground truth; it only means that the data we have collected does not allow us to discard \(H_0\).
For example, we can try to answer whether the proportion of Americans who work full-time is greater than 0.45. The framework for the hypothesis test would be as follows:
\[
H_{0}:The\ true\ proportion\ of\ Americans\ who\ work\ full-time\ is\ p_0=0.45
\\
H_{a}:The\ true\ proportion\ of\ Americans\ who\ work\ full-time\ is\ greater\ than\ p_0=0.45
\]
To perform the test, we assume that \(H_0\) is true and ask how probable it would then be to observe data as extreme as, or more extreme than, the data we have.
5.1 The null hypothesis proportion \(p_0\)
Since in the hypothesis test we assume that \(H_0\) is the truth and the true proportion of Americans who work full-time is \(p_0 = 0.45\), we will use \(p_0\) to compute \(SE_{p_0}\), the standard error under the null hypothesis.
p_null <- 0.45
se_p_null <- sqrt(p_null * (1 - p_null) / n)
cat("Standard error under the null hypothesis:", se_p_null)
Standard error under the null hypothesis: 0.009291242
5.2 Conditions for hypothesis testing
The conditions to perform the hypothesis test are similar to the ones we checked to compute the confidence interval.
Sampled observations must be independent.
Each sample should have at least 10 successes and 10 failures. We use the null hypothesis proportion \(p_0\) to compute the expected numbers of successes and failures.
\[
n\cdot p_0\geq 10\\ n\cdot (1 - p_0)\geq 10
\]
Verifying the success-failure conditions for hypothesis testing
number_of_successes <- floor(n * p_null)
number_of_failures <- floor(n * (1 - p_null))
cat("Number of successes:", number_of_successes, "\nNumber of failures:", number_of_failures)
Number of successes: 1290
Number of failures: 1576
The success-failure conditions are satisfied.
5.3 The p-value
The p-value quantifies the strength of the evidence against the null hypothesis. We compute it by asking ourselves, given that the null hypothesis \(H_0\) is true, what is the probability of observing data as extreme as, or more extreme than, the data we have.
\[
P(observing\ data\ as\ extreme\ or\ more\ |\ H_{0}\ is\ true)
\]
That probability is the p-value. Typically, we use a \(5\%\) significance level as the threshold to reject the null. If the p-value is less than \(5\%\), we reject the null in favor of the alternative.
For our hypothesis framework, under \(H_0\) and the CLT, \(\hat{p}\) is approximately normally distributed, with mean \(p_0 = 0.45\) and standard deviation \(SE_{p_{0}}=0.0093\). What is the probability of drawing a sample with a proportion \(\hat{p}=0.46124\) or higher, given that the null hypothesis is true?
\[
P(drawing\ a\ sample\ where\ the\ proportion\ of\ Americans\\ employed\ full-time\ is\ 0.4612\ or\ higher\ |\ H_{0}\ is\ true)
\\
P(\hat{p}\ \geq\ 0.46124\ |\ p = 0.45)
\]
We can do it graphically:
#http://www.statmethods.net/advgraphs/probability.html
# x = p_null +/- 4 std_dev's
x <- seq(-4,4,length=1000)*se_p_null + p_null
hx <- dnorm(x, p_null ,se_p_null)
lb <- p_hat; ub <- max(x)
plot(x, hx, type="n", xlab="Proportion of Americans employed full-time", ylab="", main="Sampling distribution under null hypothesis", axes=FALSE)
i <- x >= lb & x <= ub # logical index: TRUE where lb <= x <= ub
lines(x, hx) # plots normal distribution
polygon(c(lb,x[i],ub), c(0,hx[i],0), col="red") # shades area where x >= lb in red
axis(1, at=seq(0.41, 0.49, 0.005), pos=0) # draws axis
abline(v=p_null)
grid()

That probability is the area under the sampling distribution shaded in red in the plot. It can be computed using pnorm().
area <- pnorm(q = p_hat, mean = p_null, sd = se_p_null, lower.tail = FALSE)
cat("Our p-value:", area)
Our p-value: 0.1131268
So our p-value, the probability of drawing a sample with \(\hat{p}=0.461243\) or higher under the null hypothesis, is about \(0.113\). That is well above the \(0.05\) threshold, so at the \(5\%\) significance level we fail to reject the null hypothesis: a sample proportion of \(\hat{p}=0.461243\) or higher could plausibly happen simply by chance if the true proportion is \(0.45\).
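The same p-value can be reached by standardizing \(\hat{p}\) into a z-score, \(z=(\hat{p}-p_0)/SE_{p_0}\), and taking the upper-tail area of the standard normal distribution. The function below is a hypothetical helper written for illustration, not part of the original analysis. It uses the reported values of \(\hat{p}\) and \(n\) as plain numbers and covers only the one-sided alternative \(p > p_0\).

```r
# Hypothetical helper (illustration only): one-sided z-test for a proportion,
# with alternative hypothesis p > p_null.
prop_z_test_upper <- function(p_hat, p_null, n) {
  se_null <- sqrt(p_null * (1 - p_null) / n)   # standard error under H_0
  z <- (p_hat - p_null) / se_null              # standardized distance from p_null
  p_value <- pnorm(z, lower.tail = FALSE)      # upper-tail probability
  list(z = z, p_value = p_value)
}

# Using the values reported above: p_hat = 0.461243, p_null = 0.45, n = 2867
res <- prop_z_test_upper(p_hat = 0.461243, p_null = 0.45, n = 2867)
cat("z =", round(res$z, 3), " p-value =", round(res$p_value, 4), "\n")

# Decision at the 5% significance level
alpha_level <- 0.05
cat("Reject H_0:", res$p_value < alpha_level, "\n")
```

The z-score of about \(1.21\) says the observed proportion sits roughly \(1.2\) standard errors above \(p_0\), which is not unusual under the null hypothesis.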
