For $N$ samples of a variate having a distribution with known Mean $\mu$, the ``population
variance'' (usually called ``variance'' for short, although the word ``population'' should be added when needed to
distinguish it from the Sample Variance) is defined by

$\sigma^2 \equiv \left\langle (x - \mu)^2 \right\rangle = \left\langle x^2 - 2\mu x + \mu^2 \right\rangle = \langle x^2 \rangle - 2\mu \langle x \rangle + \mu^2,$   (1)

where

$\langle x \rangle \equiv \frac{1}{N} \sum_{i=1}^N x_i.$   (2)
But since $\langle x \rangle$ is an Unbiased Estimator for the Mean,

$\mu = \langle x \rangle,$   (3)

it follows that the variance

$\sigma^2 = \langle x^2 \rangle - \mu^2.$   (4)
The population Standard Deviation is then defined as

$\sigma \equiv \sqrt{\langle x^2 \rangle - \mu^2}.$   (5)
A useful identity involving the variance is

$\mathrm{var}(f(x)) = \left\langle [f(x)]^2 \right\rangle - \left\langle f(x) \right\rangle^2.$   (6)

Therefore,

$\mathrm{var}(cx) = c^2\, \mathrm{var}(x)$   (7)

$\mathrm{var}(x + c) = \mathrm{var}(x).$   (8)
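As a quick numerical illustration (not part of the original entry), the following Python sketch checks (4) against the defining form (1) on an arbitrary data set, and spot-checks the consequences (7) and (8); the variable names are invented for the example.

```python
# Illustration of eqs. (4), (7), (8): population variance identities.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(xs)

mu = sum(xs) / N                                # population mean, eq. (2)
var_def = sum((x - mu) ** 2 for x in xs) / N    # <(x - mu)^2>, eq. (1)
var_alt = sum(x * x for x in xs) / N - mu ** 2  # <x^2> - mu^2, eq. (4)
assert abs(var_def - var_alt) < 1e-12

def pop_var(data):
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

c = 3.0
assert abs(pop_var([c * x for x in xs]) - c ** 2 * var_def) < 1e-9  # eq. (7)
assert abs(pop_var([x + c for x in xs]) - var_def) < 1e-9           # eq. (8)
print(var_def)  # 4.0 for this data set
```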
If the population Mean is not known, using the sample mean $\bar{x}$ instead of the population mean $\mu$ to
compute

$s^2 \equiv \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2$   (9)
gives a Biased Estimator of the population variance.  In such cases, it is appropriate to use a 
Student's t-Distribution instead of a Gaussian Distribution.  However, it turns out (as discussed
below) that an Unbiased Estimator for the population variance is given by

$\frac{N}{N-1}\, s^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2.$   (10)
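The bias of (9) can be seen directly by simulation.  A minimal sketch, assuming standard normal samples (so the true variance is 1); the sample size and trial count are arbitrary choices:

```python
import random

# Monte Carlo check that eq. (9) is biased while eq. (10) is not.
random.seed(1)
N, trials = 5, 200_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    xbar = sum(xs) / N
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / N          # eq. (9)
    unbiased_sum += ss / (N - 1)  # eq. (10)

print(biased_sum / trials)    # ~ (N-1)/N = 0.8
print(unbiased_sum / trials)  # ~ 1.0
```

For $N = 5$ the $1/N$ estimator averages near $(N-1)/N = 0.8$, while the $1/(N-1)$ estimator averages near the true value 1.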
The Mean and Variance of the sample standard deviation $s$, for samples drawn from a Gaussian distribution with population mean $\mu$ and
Variance $\sigma^2$, are

$\mu_s = \sqrt{\frac{2}{N}}\, \frac{\Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right)}\, \sigma$   (11)

$\sigma_s^2 = \frac{N-1}{N}\, \sigma^2 - \mu_s^2.$   (12)
The quantity $N s^2/\sigma^2$ then has a Chi-Squared Distribution with $N-1$ Degrees of Freedom.
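A Monte Carlo spot check of this statement (sketch only; parameters arbitrary): for Gaussian samples, $N s^2/\sigma^2$ should have mean $N-1$ and variance $2(N-1)$, the first two Moments of a chi-squared variate with $N-1$ Degrees of Freedom.

```python
import random

# Check that N*s^2/sigma^2 has mean N-1 and variance 2(N-1).
random.seed(2)
N, trials = 8, 100_000
vals = []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]  # sigma^2 = 1
    xbar = sum(xs) / N
    vals.append(sum((x - xbar) ** 2 for x in xs))    # = N*s^2/sigma^2
m = sum(vals) / trials
v = sum((q - m) ** 2 for q in vals) / trials
print(m, v)  # ~ 7 and ~ 14 for N = 8
```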
For multiple variables, the variance is given using the definition of Covariance,

$\mathrm{var}\left(\sum_{i=1}^n x_i\right) = \sum_{i=1}^n \mathrm{var}(x_i) + 2 \sum_{i=1}^n \sum_{j > i} \mathrm{cov}(x_i, x_j).$   (13)
A linear sum has a similar form:

$\mathrm{var}\left(\sum_{i=1}^n a_i x_i\right) = \sum_{i=1}^n a_i^2\, \mathrm{var}(x_i) + 2 \sum_{i=1}^n \sum_{j > i} a_i a_j\, \mathrm{cov}(x_i, x_j).$   (14)
These equations can be expressed using the Covariance Matrix.
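The following sketch spot-checks (14) (and, with $a = b = 1$, the two-variable case of (13)) for a pair of correlated Gaussian variates; the mixing coefficients are arbitrary choices for the example.

```python
import random

# Spot check of eqs. (13) and (14) for two correlated variates.
random.seed(3)
n = 200_000
xs, ys = [], []
for _ in range(n):
    u, w = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(u)
    ys.append(0.6 * u + 0.8 * w)  # y is correlated with x

def mean(d): return sum(d) / len(d)
def var(d):
    m = mean(d)
    return sum((t - m) ** 2 for t in d) / len(d)
def cov(d, e):
    md, me = mean(d), mean(e)
    return sum((p - md) * (q - me) for p, q in zip(d, e)) / len(d)

a, b = 2.0, -1.5
lhs = var([a * x + b * y for x, y in zip(xs, ys)])
rhs = a * a * var(xs) + b * b * var(ys) + 2 * a * b * cov(xs, ys)
print(lhs, rhs)  # agree to within sampling error, eq. (14)
```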
To estimate the population Variance from a sample of $N$ elements with a priori unknown Mean (i.e.,
the Mean is estimated from the sample itself), we need an Unbiased Estimator for 
$\sigma^2$.  This is 
given by the k-Statistic $k_2$, where

$k_2 = \frac{N}{N-1}\, s^2$   (15)
and $s^2$ is the Sample Variance

$s^2 \equiv \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2.$   (16)
Note that some authors prefer the definition

$s'^2 \equiv \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2,$   (17)

since this makes the sample variance an Unbiased Estimator for the population variance.
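On a concrete sample, (15)-(17) can be verified directly; this small sketch uses invented data:

```python
# Eqs. (15)-(17) on a concrete sample: k_2 = N/(N-1) * s^2 equals
# the 1/(N-1) "unbiased" definition of the sample variance.
xs = [1.0, 2.0, 2.0, 3.0, 7.0]
N = len(xs)
xbar = sum(xs) / N
ss = sum((x - xbar) ** 2 for x in xs)

s2 = ss / N              # sample variance, eq. (16)
k2 = N / (N - 1) * s2    # k-statistic, eq. (15)
s2_prime = ss / (N - 1)  # alternative definition, eq. (17)
assert abs(k2 - s2_prime) < 1e-12
print(s2, k2)  # 4.4, 5.5
```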
When computing numerically, the Mean must be computed before $s^2$ can be determined. This requires storing the set of
sample values.  It is possible to calculate $s^2$ using a recursion relationship involving only the last sample as
follows.  Here, use $\mu_n$ to denote $\mu$ calculated from the first $n$ samples (not the 
$n$th Moment)

$\mu_n \equiv \frac{1}{n} \sum_{i=1}^n x_i,$   (18)
and $s_n^2$ denotes the value for the sample variance $s^2$ calculated from the first $n$ samples.  The first few
values calculated for the Mean are

$\mu_1 = x_1$   (19)

$\mu_2 = \frac{\mu_1 + x_2}{2}$   (20)

$\mu_3 = \frac{2 \mu_2 + x_3}{3}.$   (21)

Therefore, for $n = 2$, 3 it is true that

$\mu_n = \frac{(n-1)\, \mu_{n-1} + x_n}{n}.$   (22)
Therefore, by induction,

$\mu_{n+1} = \frac{[(n+1)-1]\, \mu_{(n+1)-1} + x_{n+1}}{n+1} = \frac{n\, \mu_n + x_{n+1}}{n+1}$   (23)

$(n+1)\, \mu_{n+1} = n\, \mu_n + x_{n+1}$   (24)

$\mu_{n+1} = \mu_n + \frac{x_{n+1} - \mu_n}{n+1},$   (25)

and

$s_{n+1}^2 = \frac{1}{n+1} \sum_{i=1}^{n+1} (x_i - \mu_{n+1})^2$   (26)

for $n \geq 1$, so

$(n+1)\, s_{n+1}^2 = \sum_{i=1}^{n+1} (x_i - \mu_n)^2 - 2 (\mu_{n+1} - \mu_n) \sum_{i=1}^{n+1} (x_i - \mu_n) + (n+1)(\mu_{n+1} - \mu_n)^2.$   (27)
Working on the first term,

$\sum_{i=1}^{n+1} (x_i - \mu_n)^2 = \sum_{i=1}^{n} (x_i - \mu_n)^2 + (x_{n+1} - \mu_n)^2 = n\, s_n^2 + (x_{n+1} - \mu_n)^2.$   (28)

Use (24) to write

$x_{n+1} - \mu_n = (n+1)\, \mu_{n+1} - n\, \mu_n - \mu_n = (n+1)(\mu_{n+1} - \mu_n),$   (29)

so

$\sum_{i=1}^{n+1} (x_i - \mu_n)^2 = n\, s_n^2 + (n+1)^2 (\mu_{n+1} - \mu_n)^2.$   (30)
Now work on the second term in (27),

$-2 (\mu_{n+1} - \mu_n) \sum_{i=1}^{n+1} (x_i - \mu_n) = -2 (\mu_{n+1} - \mu_n)(x_{n+1} - \mu_n) = -2 (n+1)(\mu_{n+1} - \mu_n)^2,$   (31)

since $\sum_{i=1}^{n} (x_i - \mu_n) = n \mu_n - n \mu_n = 0$ and (29) applies to the remaining term.
Considering the third term in (27),

$(n+1)(\mu_{n+1} - \mu_n)^2.$   (32)

But 

$\mu_{n+1} - \mu_n = \frac{x_{n+1} - \mu_n}{n+1}$   (33)

so

$(n+1)(\mu_{n+1} - \mu_n)^2 = \frac{(x_{n+1} - \mu_n)^2}{n+1}.$   (34)

Plugging (30), (31), and (34) into (27),

$(n+1)\, s_{n+1}^2 = n\, s_n^2 + \left[(n+1)^2 - 2(n+1) + (n+1)\right] (\mu_{n+1} - \mu_n)^2 = n\, s_n^2 + n(n+1)(\mu_{n+1} - \mu_n)^2$   (35)
so
$s_{n+1}^2 = \frac{n}{n+1}\, s_n^2 + n\, (\mu_{n+1} - \mu_n)^2 = \frac{n}{n+1} \left[ s_n^2 + \frac{(x_{n+1} - \mu_n)^2}{n+1} \right].$   (36)
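The recursion (25)/(36) translates directly into a streaming computation that never stores the full sample set.  The following Python sketch (the function name is mine, not from the entry) updates $\mu_n$ and $s_n^2$ one sample at a time and checks the result against the two-pass definition (16):

```python
def running_mean_var(samples):
    """Streaming mean and sample variance via eqs. (25) and (36)."""
    mu, s2, n = 0.0, 0.0, 0
    for x in samples:
        n += 1
        s2 = (n - 1) / n * (s2 + (x - mu) ** 2 / n)  # eq. (36)
        mu += (x - mu) / n                           # eq. (25)
    return mu, s2

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, s2 = running_mean_var(xs)

# Direct two-pass check of eq. (16).
m = sum(xs) / len(xs)
s2_direct = sum((x - m) ** 2 for x in xs) / len(xs)
assert abs(mu - m) < 1e-12 and abs(s2 - s2_direct) < 1e-12
print(mu, s2)  # 5.0, 4.0
```

Note that $s_n^2$ must be updated before $\mu_n$, since (36) uses the previous mean.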
To find the variance of $s^2$ itself, remember that

$\mathrm{var}(s^2) = \langle s^4 \rangle - \langle s^2 \rangle^2$   (37)

and

$\langle s^2 \rangle = \frac{N-1}{N}\, \mu_2,$   (38)

where $\mu_2 = \sigma^2$ is the second central Moment of the population.
Now find $\langle s^4 \rangle$.  Writing $x_i' \equiv x_i - \mu$ and $\bar{x}' \equiv \bar{x} - \mu$ for deviations from the population mean, and $\mu_4$ for the fourth central Moment,

$\langle s^4 \rangle = \left\langle \left[ \frac{1}{N} \sum_{i=1}^N x_i'^2 - \bar{x}'^2 \right]^2 \right\rangle = \frac{1}{N^2} \left\langle \left( \sum_{i=1}^N x_i'^2 \right)^2 \right\rangle - \frac{2}{N} \left\langle \bar{x}'^2 \sum_{i=1}^N x_i'^2 \right\rangle + \left\langle \bar{x}'^4 \right\rangle.$   (39)
Working on the first term of (39),

$\frac{1}{N^2} \left\langle \left( \sum_{i=1}^N x_i'^2 \right)^2 \right\rangle = \frac{1}{N^2} \left[ N \mu_4 + N(N-1)\, \mu_2^2 \right] = \frac{\mu_4 + (N-1)\, \mu_2^2}{N}.$   (40)
The second term of (39) is known from k-Statistic,

$\left\langle \bar{x}'^2 \sum_{i=1}^N x_i'^2 \right\rangle = \frac{\mu_4 + (N-1)\, \mu_2^2}{N},$   (41)
as is the third term,

$\left\langle \bar{x}'^4 \right\rangle = \frac{\mu_4 + 3(N-1)\, \mu_2^2}{N^3}.$   (42)
Combining (39)-(42) gives

$\langle s^4 \rangle = \frac{(N-1)^2}{N^3}\, \mu_4 + \frac{(N-1)(N^2 - 2N + 3)}{N^3}\, \mu_2^2,$   (43)
so plugging in (38) and (43) gives

$\mathrm{var}(s^2) = \frac{(N-1)^2}{N^3}\, \mu_4 - \frac{(N-1)(N-3)}{N^3}\, \mu_2^2.$   (44)
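As a worked specialization (standard, though the step is not spelled out here): for a Gaussian Distribution the fourth central Moment is $\mu_4 = 3\mu_2^2$, and (44) then reduces as follows.

```latex
% Specializing eq. (44) to a Gaussian population, where mu_4 = 3*mu_2^2:
\begin{align*}
\operatorname{var}(s^2)
  &= \frac{(N-1)^2}{N^3}\,(3\mu_2^2) - \frac{(N-1)(N-3)}{N^3}\,\mu_2^2 \\
  &= \frac{(N-1)\,\mu_2^2}{N^3}\,\bigl[3(N-1) - (N-3)\bigr]
   = \frac{2(N-1)}{N^2}\,\sigma^4 .
\end{align*}
```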
Student calculated the Skewness and Kurtosis of the distribution of $s^2$ as

$\gamma_1 = \sqrt{\frac{8}{N-1}}$   (45)

$\gamma_2 = \frac{12}{N-1},$   (46)

and conjectured that the true distribution is the Pearson Type III Distribution

$f(s^2) = \frac{\gamma^{p+1}}{\Gamma(p+1)}\, (s^2)^p\, e^{-\gamma s^2},$   (47)

where

$p = \frac{N-3}{2}$   (48)

$\gamma = \frac{N}{2\sigma^2}.$   (49)
This was proven by R. A. Fisher.  
The distribution of $s$ itself is given by

$f(s) = \frac{2 \left( \frac{N}{2\sigma^2} \right)^{(N-1)/2}}{\Gamma\left(\frac{N-1}{2}\right)}\, s^{N-2}\, e^{-N s^2/(2\sigma^2)}$   (50)

$= \frac{N^{(N-1)/2}}{2^{(N-3)/2}\, \Gamma\left(\frac{N-1}{2}\right)}\, \frac{s^{N-2}}{\sigma^{N-1}}\, e^{-N s^2/(2\sigma^2)},$   (51)

where it is convenient to define the constant

$b(N) \equiv \sqrt{\frac{2}{N}}\, \frac{\Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right)}.$   (52)
The Moments are given by

$\langle s^r \rangle = \left( \frac{2\sigma^2}{N} \right)^{r/2} \frac{\Gamma\left(\frac{N-1+r}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right)},$   (53)

so that in particular $\langle s \rangle = b(N)\, \sigma$,
and the variance is

$\mathrm{var}(s) = \langle s^2 \rangle - \langle s \rangle^2 = \left[ \frac{N-1}{N} - b^2(N) \right] \sigma^2.$   (54)
An Unbiased Estimator of $\sigma$ is therefore $s/b(N)$.  Romanovsky showed that

$b(N) = 1 - \frac{3}{4N} - \frac{7}{32 N^2} - \ldots$   (55)
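A small numerical check of (52) against the truncated series (55) (a sketch; the log of the Gamma Function is used to avoid overflow in the ratio):

```python
import math

# Compare the exact b(N) of eq. (52) with the truncated
# Romanovsky series of eq. (55).
def b_exact(N):
    return math.sqrt(2.0 / N) * math.exp(
        math.lgamma(N / 2.0) - math.lgamma((N - 1) / 2.0))

def b_series(N):
    return 1.0 - 3.0 / (4.0 * N) - 7.0 / (32.0 * N * N)

for N in (5, 10, 50, 100):
    print(N, b_exact(N), b_series(N))
# The two columns agree to O(1/N^3) as N grows.
```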
See also Correlation (Statistical), Covariance, Covariance Matrix, k-Statistic, Mean, 
Sample Variance