
Bessel's Correction

When finding the variance of a population of size $N$, we know to compute

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$ where the summation is taken over all members $x$ of the population, and $\mu$ is the population mean.
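
For concreteness, here is a small sketch of this computation in Python; the eight-value population is made up purely for illustration.

```python
# Population variance computed directly from its definition:
# sigma^2 = sum((x - mu)^2) / N, summed over the whole population.
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical population

N = len(population)
mu = sum(population) / N                                # population mean
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance

print(mu)        # 5.0
print(sigma_sq)  # 4.0
```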

However, if we are attempting to estimate $\sigma^2$ using a sample, it turns out that simply replacing the population size $N$ with the sample size $n$, replacing the population mean $\mu$ with the sample mean $\bar{x}$, and summing over all members $x$ of the sample (instead of the population) yields a biased estimate of $\sigma^2$.

$$s^2_{\text{biased}} = \frac{\sum (x - \bar{x})^2}{n}$$
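
As a sketch of this plug-in estimate, using the same hypothetical population as before and a made-up sample size of $n = 4$:

```python
import random

# The "plug-in" estimate: divide the sum of squared deviations from the
# sample mean by the sample size n. This is the biased estimator discussed above.
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical population

sample = random.sample(population, 4)      # draw a sample of size n = 4
n = len(sample)
x_bar = sum(sample) / n                    # sample mean
s_sq_biased = sum((x - x_bar) ** 2 for x in sample) / n

print(s_sq_biased)   # on average this falls short of sigma^2 = 4.0
```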

To intuitively see this, consider the extreme case where the sample size is $n = 1$, with the lone value in the sample being $x_0$. In this case, $\bar{x} = x_0$, making $s^2_{\text{biased}} = 0$. Unless the population consists of $N$ identical values (which is highly unlikely), estimating the population variance with $0$ is clearly an underestimate.

In the more general case, note that the sample mean is not the same as the population mean. One's sample observations are naturally going to be closer on average to the sample mean than to the population mean, resulting in the average $(x - \bar{x})^2$ value underestimating the average $(x - \mu)^2$ value. Thus, $s^2_{\text{biased}}$ generally underestimates $\sigma^2$, with the difference between the two more pronounced when the sample size is small.
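
One way to see this without any algebra is to simulate it. The sketch below assumes a normal population with $\sigma^2 = 4$ (an arbitrary choice) and averages the biased estimate over many samples for several sample sizes; the average falls short of $4$, and the shortfall shrinks as $n$ grows.

```python
import random

random.seed(0)
mu, sigma = 10.0, 2.0          # assumed population: normal, so sigma^2 = 4
trials = 20_000

for n in (2, 5, 20, 100):
    total = 0.0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        total += sum((x - x_bar) ** 2 for x in sample) / n   # biased estimate
    # Average biased estimate: roughly 2.0, 3.2, 3.8, 3.96 for n = 2, 5, 20, 100
    print(n, total / trials)
```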

The good news is that this bias can be corrected!

However, the argument below that shows this is a bit involved.

Before we begin this argument, let us make a couple of observations:

First, suppose that we randomly draw a sample of the form $\{x_1, x_2, \ldots, x_n\}$ from a population with mean $\mu$. We can quickly show that $E[\bar{x}] = \mu$, using the properties of the expected value, as seen below:
$$\begin{aligned}
E[\bar{x}] &= E\left[\frac{x_1 + x_2 + \cdots + x_n}{n}\right]\\
&= \frac{1}{n}\,E[x_1 + x_2 + \cdots + x_n]\\
&= \frac{1}{n}\left(E[x_1] + E[x_2] + \cdots + E[x_n]\right)\\
&= \frac{1}{n}(\mu + \mu + \cdots + \mu) \quad \text{...where } \mu \text{ appears } n \text{ times}\\
&= \frac{1}{n} \cdot n\mu\\
&= \mu
\end{aligned}$$
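
The following sketch illustrates this numerically, assuming (arbitrarily) a uniform population on $[0, 10]$, so that $\mu = 5$; the average of many sample means lands near $\mu$.

```python
import random

random.seed(1)
n, trials = 10, 50_000    # sample size and number of repeated samples

# Average of many sample means, each from a uniform(0, 10) population.
mean_of_means = sum(
    sum(random.uniform(0, 10) for _ in range(n)) / n
    for _ in range(trials)
) / trials

print(mean_of_means)      # close to mu = 5
```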

Second, under the additional assumption that the population discussed above has variance $\sigma^2$, we can very similarly show that $\text{Var}[\bar{x}] = \sigma^2/n$.

Recall that the variance of a sum of independent random variables is the sum of their variances (and our sample values are drawn independently), so
$$\begin{aligned}
\text{Var}[\bar{x}] &= \text{Var}\left[\frac{x_1 + x_2 + \cdots + x_n}{n}\right]\\
&= \frac{1}{n^2}\,\text{Var}[x_1 + x_2 + \cdots + x_n]\\
&= \frac{1}{n^2}\left(\text{Var}[x_1] + \text{Var}[x_2] + \cdots + \text{Var}[x_n]\right)\\
&= \frac{1}{n^2}(\sigma^2 + \sigma^2 + \cdots + \sigma^2) \quad \text{...where } \sigma^2 \text{ appears } n \text{ times}\\
&= \frac{1}{n^2} \cdot n\sigma^2\\
&= \frac{\sigma^2}{n}
\end{aligned}$$
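
Again purely as an illustration, the sketch below assumes a normal population with $\sigma^2 = 9$ and $n = 25$, and checks that the variance of many sample means sits near $\sigma^2/n = 0.36$.

```python
import random
from statistics import pvariance

random.seed(2)
mu, sigma, n, trials = 0.0, 3.0, 25, 50_000   # assumed population and sample size

# Collect many sample means, then look at how much they vary.
sample_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(trials)
]

print(pvariance(sample_means))   # close to sigma^2 / n = 9 / 25 = 0.36
```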

Having these results under our belt, we can turn our attention to the main argument...

We intend to show that $$E\left[s^2_{\text{biased}}\right] = \left(\frac{n-1}{n}\right)\sigma^2$$

With the right side not being simply $\sigma^2$, we establish the biased nature of $s^2_{\text{biased}}$ while simultaneously determining a factor to correct this bias.

As argument for our claim, consider the following:

$$\begin{aligned}
E\left[s^2_{\text{biased}}\right] &= E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n \big[(x_i - \mu) - (\bar{x} - \mu)\big]^2\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n \big[(x_i - \mu)^2 - 2(\bar{x} - \mu)(x_i - \mu) + (\bar{x} - \mu)^2\big]\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2 - \frac{2(\bar{x} - \mu)}{n}\sum_{i=1}^n (x_i - \mu) + \frac{1}{n}\sum_{i=1}^n (\bar{x} - \mu)^2\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2 - 2(\bar{x} - \mu)^2 + \frac{1}{n}\sum_{i=1}^n (\bar{x} - \mu)^2\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2 - 2(\bar{x} - \mu)^2 + \frac{1}{n}\cdot n\,(\bar{x} - \mu)^2\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2 - 2(\bar{x} - \mu)^2 + (\bar{x} - \mu)^2\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2 - (\bar{x} - \mu)^2\right]\\
&= E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2\right] - E\left[(\bar{x} - \mu)^2\right]\\
&= \frac{1}{n}\sum_{i=1}^n E\left[(x_i - \mu)^2\right] - E\left[(\bar{x} - \mu)^2\right]\\
&= \frac{1}{n}\sum_{i=1}^n \sigma^2 - E\left[(\bar{x} - \mu)^2\right]\\
&= \frac{1}{n}\cdot n\sigma^2 - E\left[(\bar{x} - \mu)^2\right]\\
&= \sigma^2 - E\left[(\bar{x} - \mu)^2\right]\\
&= \sigma^2 - \text{Var}[\bar{x}]\\
&= \sigma^2 - \frac{\sigma^2}{n}\\
&= \left(\frac{n-1}{n}\right)\sigma^2
\end{aligned}$$
Note that going from the fourth line to the fifth uses $\sum_{i=1}^n (x_i - \mu) = n(\bar{x} - \mu)$, and near the end we use both of our earlier observations: since $E[\bar{x}] = \mu$, the quantity $E[(\bar{x} - \mu)^2]$ is exactly $\text{Var}[\bar{x}]$, which we showed equals $\sigma^2/n$.
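
We can sanity-check this result by simulation. The sketch below assumes a normal population with $\sigma^2 = 4$ and a sample size of $n = 8$, in which case the derivation predicts that the biased estimator averages out to $(7/8)\cdot 4 = 3.5$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, trials = 0.0, 2.0, 8, 200_000   # assumed population; sigma^2 = 4

# Draw many samples of size n and compute the biased variance (divide by n).
samples = rng.normal(mu, sigma, size=(trials, n))
s_sq_biased = samples.var(axis=1, ddof=0)

print(s_sq_biased.mean())        # close to (n - 1)/n * sigma^2 = 3.5
print((n - 1) / n * sigma**2)    # 3.5
```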

Again, having established that $$E\left[s^2_{\text{biased}}\right] = \left(\frac{n-1}{n}\right)\sigma^2$$ we can quickly construct an unbiased estimator, $s^2$, for $\sigma^2$ by multiplying $s^2_{\text{biased}}$ by $n/(n-1)$, yielding $$s^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}$$ The unbiased nature of $s^2$ can be quickly confirmed by observing the following:
$$\begin{aligned}
E[s^2] &= E\left[\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}\right]\\
&= \frac{1}{n-1}\,E\left[\sum_{i=1}^n (x_i - \bar{x})^2\right]\\
&= \frac{n}{n-1}\,E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2\right]\\
&= \frac{n}{n-1}\,E\left[s^2_{\text{biased}}\right]\\
&= \left(\frac{n}{n-1}\right)\left(\frac{n-1}{n}\right)\sigma^2\\
&= \sigma^2
\end{aligned}$$
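
As a final sanity check, the sketch below repeats the previous simulation but divides by $n - 1$ instead (NumPy exposes this choice through its ddof argument); the average estimate now lands near $\sigma^2$ itself.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 0.0, 2.0, 8, 200_000   # assumed population; sigma^2 = 4

# Draw many samples of size n and compute the corrected variance (divide by n - 1).
samples = rng.normal(mu, sigma, size=(trials, n))
s_sq = samples.var(axis=1, ddof=1)            # ddof=1 applies Bessel's correction

print(s_sq.mean())    # close to sigma^2 = 4.0
```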