Confidence Intervals Involving a Proportion

The proportion of a population with a certain characteristic is denoted by the parameter $p$.

Finding the actual value of any population parameter -- $p$ included -- is often extremely difficult and/or expensive, in that we would need knowledge of every element in that population.

Instead, we frequently settle for a good estimate of the population parameter in question, found through a related sample statistic.

If we conduct a binomial experiment where "success" indicates that the related characteristic is present, and $x$ successes are seen in $n$ trials, then the sample statistic $\widehat{p} = x/n$ turns out to be a good estimator of $p$.

When we say "good estimator", we mean specifically that the estimator is:

When a single value is given as an estimate for a population parameter, we say that value is a point estimate of the parameter in question.

As it turns out, the best point estimate for $p$ is $\widehat{p}$.

There is a very close relationship between the binomial distribution and the distribution of $\widehat{p}$ sample statistics.

Recall that in a binomial distribution, if $n$ is the number of trials, $p$ is the probability of "success" in any one trial, and $x$ is a number of successes seen, then when $np \ge 5$ and $nq \ge 5$, the binomial distribution is approximately normal with $$\mu = np \quad \textrm{ and } \quad \sigma = \sqrt{npq}$$

But then, under the same assumptions, $\widehat{p} = x/n$ must also result in an approximately normal distribution with $$\mu = p \quad \textrm{ and } \quad \sigma = \sqrt{\frac{pq}{n}}$$ (i.e., the result of simply dividing both the previous mean and standard deviation by $n$.)
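The claim about the mean and standard deviation of $\widehat{p}$ can be checked numerically. The following is a minimal sketch, assuming illustrative values $n = 50$ and $p = 0.3$ (these numbers and the function name `phat_mean_and_sd` are not from the text above); it computes the mean and standard deviation of $\widehat{p} = x/n$ directly from the binomial probabilities and compares them with $p$ and $\sqrt{pq/n}$.

```python
from math import comb, sqrt

def phat_mean_and_sd(n, p):
    """Exact mean and standard deviation of p-hat = x/n when x ~ Binomial(n, p)."""
    q = 1 - p
    # Binomial probability of seeing exactly x successes in n trials
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum((x / n) * pr for x, pr in enumerate(pmf))
    var = sum((x / n - mean) ** 2 * pr for x, pr in enumerate(pmf))
    return mean, sqrt(var)

n, p = 50, 0.3  # illustrative values (an assumption); here np = 15 and nq = 35
mean, sd = phat_mean_and_sd(n, p)
print(mean, p)                     # the two values should agree
print(sd, sqrt(p * (1 - p) / n))   # likewise, up to floating-point rounding
```

Note that this only confirms the mean and standard deviation; the approximate normality of the distribution is a separate matter that relies on $np \ge 5$ and $nq \ge 5$.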

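Of course, with a point estimate like $\widehat{p}$, one wonders how close it gets to $p$. To slowly work our way towards answering this question, consider the following:

Recall that by the Empirical Rule, about 95% of the values in an approximately normal distribution are within roughly $2\sigma$ of the mean. For the distribution of sample proportions, this means about 95% of the $\widehat{p}$ values are within roughly $2\sigma$ of $\mu = p$.

Since $\widehat{p}$ being within $2\sigma$ of $p$ is the same as $p$ being within $2\sigma$ of $\widehat{p}$, it follows that for about 95% of samples, the interval $[\widehat{p} - 2\sigma, \widehat{p} + 2\sigma]$ contains $p$.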

This suggests an alternative to a point estimate for proportions -- an interval in which we might expect to find the population parameter, $p$, with a certain probability. For the interval described above, that probability would be 95%.

However, notice that we can't actually compute the interval as described since $\sigma = \sqrt{\frac{pq}{n}}$ depends on knowledge of $p$ -- the very value we are trying to approximate!

All is not lost, however. Recall that we can approximate $\sigma$ by approximating $p$ and $q$ with $\widehat{p}$ and $\widehat{q} = 1 - \widehat{p}$, respectively. Whenever we estimate a standard deviation of a sampling distribution like this, we call the approximation the standard error. Further, realizing that the factor $2$ on $\sigma$ above is just an easy-to-remember approximation to $z = 1.96\ldots$, the $z$-score with $\frac{0.05}{2} = 0.025$ area in the right tail, we slightly modify the interval estimate for $p$ to be instead: $$\left( \widehat{p} - 1.96\sqrt{\frac{\widehat{p}\widehat{q}}{n}}\,,\, \widehat{p} + 1.96\sqrt{\frac{\widehat{p}\widehat{q}}{n}} \right)$$

An interval like the one above is called a confidence interval. The slight changes we had to make in order to keep the interval computable (notably, replacing $p$ and $q$ with $\widehat{p}$ and $\widehat{q}$, respectively) mean that the probability of capturing the true population proportion $p$ with similarly constructed confidence intervals is no longer exactly 95%, although it is usually close to that value.
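To see how close to 95% the capture probability actually is, one can add up the binomial probabilities of every outcome $x$ whose interval contains $p$. The sketch below does this for one illustrative case, $n = 100$ and $p = 0.3$ (both values, and the function name `wald_coverage`, are assumptions made for the example, not taken from the text).

```python
from math import comb, sqrt

def wald_coverage(n, p, z=1.96):
    """Probability that the interval p-hat +/- z*sqrt(p-hat*q-hat/n) contains p."""
    q = 1 - p
    coverage = 0.0
    for x in range(n + 1):
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        # Count this outcome only if its interval captures the true p
        if p_hat - margin <= p <= p_hat + margin:
            coverage += comb(n, x) * p**x * q**(n - x)
    return coverage

coverage = wald_coverage(100, 0.3)  # illustrative n and p
print(coverage)  # near 0.95, but not exactly 0.95
```

Trying other values of $n$ and $p$ shows that the gap from 95% varies, and tends to be larger when $n$ is small or $p$ is near 0 or 1.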

So as to not get "stuck in the weeds" with an overly complicated description of the true probability involved, we gloss over this slight difference by saying that the interval has a confidence level of 95%.

Getting back to the question of how close $\widehat{p}$ is likely to be to $p$ (at a certain confidence level), note that if we expect $p$ to fall somewhere in an interval of the form $[\widehat{p} - E, \widehat{p} + E]$, then $\widehat{p}$ will be no more than $E$ away from $p$. Consequently, we say that $E$ is the margin of error for the confidence interval.

For the 95% confidence interval for a proportion given above, $$E = 1.96\sqrt{\frac{\widehat{p}\widehat{q}}{n}}$$

Remember that all of this is predicated on the binomial distribution in question being approximately normal. This is true when $np \ge 5$ and $nq \ge 5$. Again, however, we don't know the actual value of $p$ necessary for this calculation. Instead, we can check whether $n\widehat{p} \ge 5$ and $n\widehat{q} \ge 5$. This should give us a reasonably good indication as to whether or not the underlying binomial distribution is approximately normal. Thus,

If $n\widehat{p} \ge 5$ and $n\widehat{q} \ge 5$, then the 95% Confidence Interval for $p$ is given by $$\left( \widehat{p} - 1.96\sqrt{\frac{\widehat{p}\widehat{q}}{n}}\,,\, \widehat{p} + 1.96\sqrt{\frac{\widehat{p}\widehat{q}}{n}} \right)$$
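As a worked illustration of the rule above, here is a minimal sketch in Python. The function name `proportion_ci_95` and the sample values ($x = 120$ successes in $n = 400$ trials) are hypothetical, chosen only for the example.

```python
from math import sqrt

def proportion_ci_95(x, n):
    """95% confidence interval for p from x successes in n trials."""
    # n*p_hat equals x and n*q_hat equals n - x, so the normality
    # check can be done on the counts directly.
    if x < 5 or n - x < 5:
        raise ValueError("need n*p_hat >= 5 and n*q_hat >= 5")
    p_hat = x / n
    q_hat = 1 - p_hat
    margin = 1.96 * sqrt(p_hat * q_hat / n)  # the margin of error E
    return p_hat - margin, p_hat + margin

low, high = proportion_ci_95(x=120, n=400)  # hypothetical sample
print(round(low, 4), round(high, 4))        # roughly 0.2551 and 0.3449
```

Here $\widehat{p} = 0.3$ and $E \approx 0.0449$, so with 95% confidence we would estimate that $p$ lies between about 0.255 and 0.345.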

We can build confidence intervals for different confidence levels in a similar way.

Suppose one wishes to build a confidence interval for a proportion with a confidence level of $(1 - \alpha)$.

Letting $z_{\alpha/2}$ denote the $z$-score with area $\frac{\alpha}{2}$ in the right tail, the confidence interval we seek is given by $$\left( \widehat{p} - z_{\alpha/2}\sqrt{\frac{\widehat{p}\widehat{q}}{n}}\,,\, \widehat{p} + z_{\alpha/2}\sqrt{\frac{\widehat{p}\widehat{q}}{n}} \right)$$

As an example, to build a 99% confidence interval, one would use $\alpha = 0.01$ and $z_{\alpha/2} \doteq 2.58$.
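The critical value $z_{\alpha/2}$ need not come from a table. As one possible sketch, Python's standard `statistics.NormalDist` class provides an inverse CDF that can produce it for any confidence level. The helper names `z_critical` and `proportion_ci`, and the sample values used, are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z-score with area alpha/2 in the right tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def proportion_ci(x, n, confidence=0.95):
    """Confidence interval for p from x successes in n trials."""
    if x < 5 or n - x < 5:  # same as n*p_hat >= 5 and n*q_hat >= 5
        raise ValueError("need n*p_hat >= 5 and n*q_hat >= 5")
    p_hat = x / n
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

print(z_critical(0.95))                 # about 1.96
print(z_critical(0.99))                 # about 2.576
print(proportion_ci(120, 400, 0.99))    # hypothetical sample
```

For the same data, the 99% interval is wider than the 95% one: demanding more confidence costs precision.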

How Large a Sample Should We Take?

When designing an experiment, one has to confront the question "How large a sample is needed to obtain a certain confidence level and margin of error?"

Knowing that the margin of error is given by $$E = z_{\alpha/2} \sqrt{\frac{\widehat{p}\widehat{q}}{n}}$$ we can solve for $n$ to find $$n = \widehat{p}\widehat{q}\left( \frac{z_{\alpha/2}}{E} \right)^2$$ Of course, you need to decide how large a sample to use BEFORE the experiment is conducted, which means that you probably don't know what $\widehat{p}$ and $\widehat{q}$ are yet! (An exception would be if some previous study had already established a $\widehat{p}$ and $\widehat{q}$ that we could use.)

Fortunately, $\widehat{p}\widehat{q} = \widehat{p}(1 - \widehat{p})$, and $y = x(1-x)$ is a parabola, opening down, with a maximum value occurring where $x = 1/2$. Thus, $\widehat{p}\widehat{q}$ will never be any larger than $\frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{1}{4}$.

Thus, to ensure a certain confidence level and margin of error when no prior estimate of $p$ is available, one should use a sample size of $$n = \frac{1}{4}\left( \frac{z_{\alpha/2}}{E} \right)^2$$ In both cases (i.e., when $\widehat{p}$ is known or unknown), when one finds $n$ via the aforementioned formulae, one should always be conservative in the calculation and round up, to ensure $n$ is large enough.
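Both sample-size formulas can be wrapped in a single function. The sketch below is one way to do so; the name `sample_size`, its parameters, and the example inputs (a 3% margin of error at 95% confidence, and a prior estimate of $\widehat{p} = 0.3$) are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p_hat=None):
    """Smallest n giving the requested margin of error at the given confidence level.

    If no prior estimate p_hat is supplied, the worst case p_hat*q_hat = 1/4 is used.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(pq * (z / margin) ** 2)  # always round up

print(sample_size(0.03))             # no prior estimate: 1068
print(sample_size(0.03, p_hat=0.3))  # prior estimate of 0.3: 897
```

The worst-case formula asks for a larger sample than the one using a prior estimate, which is the price paid for not knowing $\widehat{p}$ in advance.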