Tech Tips: Two-Sample Proportion Test

To conduct a difference of proportions test,
• R: Check assumptions, then use the function

prop.test(x, n, alternative, conf.level, correct)


To explain the parameters:

• x is a vector of the number of successes seen in the two categories
• n is a vector of the two sample sizes
• alternative is a string of text that specifies the alternative hypothesis (i.e., "two.sided", "less", or "greater", for $p_1 \neq p_2, p_1 \lt p_2, \textrm{ and } p_1 \gt p_2$, respectively).
• conf.level is the confidence level for the confidence interval returned with the test (0.95 by default). It equals one minus the significance level $\alpha$.
• correct is a logical value (i.e., TRUE or FALSE) that indicates whether a "Yates continuity correction" should be used. There is a large body of research that suggests this correction is overly conservative. To perform an uncorrected $z$-test of a proportion (which pools the proportions), specify correct = FALSE to override the default of TRUE.
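
To see how these parameters fit together, here is a minimal sketch using hypothetical counts (45 successes out of 100 and 30 out of 100, invented only for illustration). It runs a one-sided, uncorrected test and then computes the pooled $z$ statistic by hand. Under those assumptions, the square of $z$ should match the X-squared value that prop.test reports, and the upper-tail normal probability should match its $p$-value.

```r
# Hypothetical counts, chosen only for illustration
x <- c(45, 30)    # number of successes in each group
n <- c(100, 100)  # the two sample sizes

# One-sided test of p1 > p2, without the continuity correction
res <- prop.test(x, n, alternative = "greater",
                 conf.level = 0.95, correct = FALSE)

# Hand computation of the pooled z statistic for comparison
p_hat  <- x / n             # the two sample proportions
p_pool <- sum(x) / sum(n)   # pooled proportion under the null hypothesis
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

z^2                           # compare with res$statistic
pnorm(z, lower.tail = FALSE)  # compare with res$p.value
```

Setting correct = TRUE in the same call would give a slightly smaller test statistic and a slightly larger $p$-value, which is the sense in which the correction is conservative.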

As an example of its use, suppose we have two samples of 500 individuals each. Everyone in the first sample has lung cancer, while everyone in the second sample is healthy. There are 490 smokers in the first group, but only 400 in the second.

Perform the test in R with:

results = prop.test(x = c(490, 400), n = c(500,500))
results

which results in:
        2-sample test for equality of proportions with continuity
correction

data:  c(490, 400) out of c(500, 500)
X-squared = 80.909, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
0.1408536 0.2191464
sample estimates:
prop 1 prop 2
0.98   0.80


Note, after running the above, you can access the $p$-value of the test with results$p.value, and the related confidence interval with results$conf.int.
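
As a short sketch of how these components might be used, the code below reruns the test from the example and pulls out individual pieces of the result. The significance level alpha = 0.05 is an assumption made here for illustration; it is not part of the output of prop.test.

```r
results <- prop.test(x = c(490, 400), n = c(500, 500))

results$p.value     # p-value of the test
results$conf.int    # confidence interval for p1 - p2
results$estimate    # the two sample proportions
results$statistic   # the X-squared test statistic

# Use the p-value in a simple decision rule
alpha <- 0.05             # assumed significance level
results$p.value < alpha   # TRUE means the null hypothesis is rejected
```

Typing names(results) lists every component stored in the result.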