
Kien Duong

May 29, 2025

Variance

1. What is Variance?

Variance is a statistical measure of how much the values in a dataset differ from their mean (average). It is the average of the squared distances from the mean, so it shows how spread out the numbers are.

  • If the variance is low, the numbers are close to the mean.
  • If the variance is high, the numbers are spread out.

\[ \text{Variance} = \sigma^2 = \frac{1}{N} \sum_{i=1}^{N}(x_i - \mu)^2 \]

\[ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i \]

  • \(x_i\): each value in the dataset

  • \(\mu\): population mean

  • \(N\): total number of values in the population
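Here is a minimal sketch of the population formula above. The Python function divides the sum of squared deviations by \(N\). `population_variance` is a hypothetical helper written for this example and does not come from any library. The standard library's `statistics.pvariance` computes the same quantity, and the sketch uses it only as a cross-check. The data values are made up for illustration.

```python
from statistics import pvariance


def population_variance(values):
    """Population variance: the mean of squared deviations from the mean (divide by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("population_variance needs at least one value")
    mu = sum(values) / n  # population mean
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean = 5
print(population_variance(data))  # 4.0
print(pvariance(data))  # same value from the standard library
```

In this example the squared deviations from the mean 5 are 9, 1, 1, 1, 0, 0, 4, 16. They sum to 32, and 32 / 8 = 4.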

 

2. What is Sample Variance?

Sample Variance measures how spread out the values are in a sample (a subset of the population). It estimates the true variance of the entire population using just the sample data.

\[ \text{Sample Variance} = s^2 = \frac{1}{n-1} \sum_{i=1}^{n}(x_i - \bar{x})^2 \]

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

  • \(x_i\): each value in the sample

  • \(\bar{x}\): sample mean

  • \(n\): total number of values in the sample

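We use \(n - 1\) instead of \(n\) (Bessel’s correction) to remove the bias that appears when the population variance is estimated from a sample. The sample mean \(\bar{x}\) is computed from the same data, so the values sit slightly closer to \(\bar{x}\) than to the true mean \(\mu\). Dividing by \(n\) would therefore underestimate the population variance on average.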
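The sketch below follows the sample-variance formula with the \(n - 1\) divisor. `sample_variance` is a hypothetical helper written for this example. Python's standard-library `statistics.variance` also divides by \(n - 1\), so it serves as a cross-check. For the illustrative values [2, 4, 4, 4, 5, 5, 7, 9], the sum of squared deviations is 32. Dividing by \(n - 1 = 7\) gives about 4.571, whereas dividing by \(n = 8\) would give 4.

```python
from statistics import variance


def sample_variance(values):
    """Sample variance s^2 with Bessel's correction (divide by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance needs at least two values")
    x_bar = sum(values) / n  # sample mean
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample
print(sample_variance(sample))  # 32 / 7, about 4.5714
print(variance(sample))  # standard-library value, also divides by n - 1
```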

 

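3. Bessel’s correction

Assume \(x_1, \dots, x_n\) are drawn independently from a population with mean \(\mu\) and variance \(\sigma^2\). The goal is to show that dividing by \(n - 1\) makes the sample variance correct on average, that is, \(\mathbb{E}[s^2] = \sigma^2\). Start by writing each deviation from the sample mean in terms of the population mean:

\[ x_i - \bar{x} = (x_i - \mu) - (\bar{x} - \mu) \]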

\[ \Rightarrow (x_i - \bar{x})^2 = \left( (x_i - \mu) - (\bar{x} - \mu) \right)^2 \]

\[ \Rightarrow (x_i - \bar{x})^2 = (x_i - \mu)^2 - 2(x_i - \mu)(\bar{x} - \mu) + (\bar{x} - \mu)^2 \]

Summing over \(i\). The factor \(\bar{x} - \mu\) does not depend on \(i\), so it moves outside the sum:

\[ \Rightarrow \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i - \mu)^2 - 2(\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \mu) + n(\bar{x} - \mu)^2 ~~~~~ (3.1) \]

Using the definition of the sample mean, we have:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n}x_i \]

\[ \Rightarrow \sum_{i=1}^{n}x_i = n\bar{x} \]

\[ \Rightarrow \sum_{i=1}^{n}(x_i - \mu) = \left(\sum_{i=1}^{n}x_i\right) - n\mu \]

\[ \Rightarrow \sum_{i=1}^{n}(x_i - \mu) = n\bar{x} - n\mu \]

\[ \Rightarrow \sum_{i=1}^{n}(x_i - \mu) = n(\bar{x} - \mu) \]

Substituting this into (3.1):

\[ \Rightarrow \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i - \mu)^2 - 2n(\bar{x} - \mu)^2 + n(\bar{x} - \mu)^2 \]

\[ \Rightarrow \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i - \mu)^2 - n(\bar{x} - \mu)^2 \]
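The identity just derived is purely algebraic, so it holds for any value of \(\mu\), not only for the true population mean. The Python sketch below checks it exactly with `fractions.Fraction` for an arbitrary sample and an arbitrary \(\mu\). The numbers and the helper name `identity_sides` are illustrative only.

```python
from fractions import Fraction


def identity_sides(xs, mu):
    """Return (LHS, RHS) of sum (x_i - x_bar)^2 = sum (x_i - mu)^2 - n (x_bar - mu)^2."""
    xs = [Fraction(x) for x in xs]  # exact arithmetic, no rounding error
    mu = Fraction(mu)
    n = len(xs)
    x_bar = sum(xs) / n  # sample mean
    lhs = sum((x - x_bar) ** 2 for x in xs)
    rhs = sum((x - mu) ** 2 for x in xs) - n * (x_bar - mu) ** 2
    return lhs, rhs


lhs, rhs = identity_sides([2, 4, 4, 4, 5, 5, 7, 9], mu=3)
print(lhs, rhs)  # 32 32
```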

Take the expectation of both sides and use the linearity of expectation:

\[ \Rightarrow \mathbb{E}\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = \mathbb{E}\left[\sum_{i=1}^{n}(x_i - \mu)^2\right] - \mathbb{E}\left[n(\bar{x} - \mu)^2\right] ~~~~~ (3.2) \]

From this post, we have the definitions of the expected value and the variance of a discrete random variable \(X\):

\[ \mathbb{E}[X] = \sum_{i=1}^{n} x_i \cdot P(X = x_i) ~~~~~ (3.3) \]

\[ \text{Var}(X) = \sum_{i=1}^{n} (x_i - \mathbb{E}[X])^2 \cdot P(X = x_i) ~~~~~ (3.4) \]

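More generally, the expected value of a function \(g(X)\) is computed in the same way as (3.3). This is the law of the unconscious statistician:

\[ \Rightarrow \mathbb{E}[g(X)] = \sum_{i=1}^n g(x_i) \cdot P(X = x_i) \]

If we let \(g(X) = (X - \mu)^2\), where \(\mu = \mathbb{E}[X]\), then:

\[ \Rightarrow \mathbb{E}[(X - \mu)^2] = \sum_{i=1}^n (x_i - \mu)^2 \cdot P(X = x_i) \]

The right-hand side is exactly (3.4), so:

\[ \Rightarrow \text{Var}(X) = \mathbb{E}[(X - \mu)^2] \]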
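Here is a small worked example of (3.3), (3.4) and \(\text{Var}(X) = \mathbb{E}[(X - \mu)^2]\). The Python sketch below uses a fair six-sided die, a hypothetical example distribution where each face has probability 1/6. It computes \(\mathbb{E}[X]\) and \(\text{Var}(X)\) exactly with fractions. The helper `expect` is introduced only for this illustration.

```python
from fractions import Fraction

faces = range(1, 7)  # a fair six-sided die
p = Fraction(1, 6)  # P(X = x) for every face


def expect(g):
    """E[g(X)] = sum of g(x) * P(X = x) over all faces of the die."""
    return sum(g(x) * p for x in faces)


mu = expect(lambda x: x)  # (3.3): E[X]
var = expect(lambda x: (x - mu) ** 2)  # Var(X) = E[(X - mu)^2]
print(mu, var)  # 7/2 35/12
```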

Now evaluate the two terms on the right-hand side of (3.2). Each \(x_i\) is drawn from the same population as \(X\), so \(\mathbb{E}[x_i] = \mu\) and \(\text{Var}(x_i) = \sigma^2\).

  • First term:

\[ \mathbb{E}\left[\sum_{i=1}^{n}(x_i - \mu)^2\right] = \sum_{i=1}^{n} \mathbb{E}[(x_i - \mu)^2] \]

\[ \Rightarrow \mathbb{E}\left[\sum_{i=1}^{n}(x_i - \mu)^2\right] = \sum_{i=1}^{n} \text{Var}(X) \]

\[ \Rightarrow \mathbb{E}\left[\sum_{i=1}^{n}(x_i - \mu)^2\right] = \sum_{i=1}^{n} \sigma^2 = n \cdot \sigma^2 \]

  • Second term:

\[ \mathbb{E}\left[n(\bar{x} - \mu)^2\right] = n \cdot \mathbb{E}\left[(\bar{x} - \mu)^2\right] \]

Since \(\mathbb{E}[\bar{x}] = \mu\), the expectation \(\mathbb{E}\left[(\bar{x} - \mu)^2\right]\) is the variance of \(\bar{x}\):

\[ \Rightarrow \mathbb{E}\left[n(\bar{x} - \mu)^2\right] = n \cdot \text{Var}(\bar{x}) \]

Substituting the definition of the sample mean from section 2:

\[ \Rightarrow \mathbb{E}\left[n(\bar{x} - \mu)^2\right] = n \cdot \text{Var}\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right) \]

Using the property \( \text{Var}(aX) = a^2\text{Var}(X) \) for a constant \(a\):

\[ \Rightarrow \mathbb{E}\left[n(\bar{x} - \mu)^2\right] = n \cdot \frac{1}{n^2} \text{Var}\left(\sum_{i=1}^{n} x_i\right) \]

The \(x_i\) are independent, so the variance of their sum is the sum of their variances:

\[ \Rightarrow \mathbb{E}\left[n(\bar{x} - \mu)^2\right] = n \cdot \frac{1}{n^2} \sum_{i=1}^{n} \text{Var}(x_i) \]

\[ \Rightarrow \mathbb{E}\left[n(\bar{x} - \mu)^2\right] = n \cdot \frac{1}{n^2} \cdot n \cdot \sigma^2 = \sigma^2 \]
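You can check both expectations in (3.2) exactly on a small discrete population. The Python sketch below uses a hypothetical population of three equally likely values \(\{1, 2, 6\}\), which gives \(\mu = 3\) and \(\sigma^2 = 14/3\). It enumerates every sample of size \(n\) drawn with replacement, so the draws are independent and every sample is equally likely. The results should match \(n\sigma^2\) for the first term and \(\sigma^2\) for the second. The population and the helper names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]  # hypothetical population, each value equally likely
N = len(population)
mu = Fraction(sum(population), N)  # population mean = 3
sigma2 = sum((v - mu) ** 2 for v in population) / N  # population variance = 14/3


def expect_over_samples(n, stat):
    """Exact E[stat(sample)] over all N**n equally likely samples drawn with replacement."""
    samples = list(product(population, repeat=n))
    return sum(stat(s) for s in samples) / len(samples)


def first_term(s):
    """sum_i (x_i - mu)^2 for one sample."""
    return sum((x - mu) ** 2 for x in s)


def second_term(s):
    """n * (x_bar - mu)^2 for one sample."""
    n = len(s)
    x_bar = Fraction(sum(s), n)
    return n * (x_bar - mu) ** 2


n = 3
print(expect_over_samples(n, first_term), n * sigma2)  # 14 14
print(expect_over_samples(n, second_term), sigma2)  # 14/3 14/3
```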

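Substituting both terms back into (3.2):

\[ \Rightarrow \mathbb{E}\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = n \cdot \sigma^2 - \sigma^2 \]

\[ \Rightarrow \mathbb{E}\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = (n - 1) \cdot \sigma^2 \]

Dividing both sides by \(n - 1\):

\[ \Rightarrow \mathbb{E}\left[\frac{1}{n - 1} \sum_{i=1}^{n}(x_i - \bar{x})^2\right] = \sigma^2 \]

So \(\mathbb{E}[s^2] = \sigma^2\), which means the sample variance with Bessel’s correction is an unbiased estimator of the population variance. Dividing by \(n\) instead would give an expected value of \(\frac{n - 1}{n}\sigma^2\), which systematically underestimates \(\sigma^2\).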
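A quick simulation illustrates this result. The Python sketch below assumes a normal population with \(\mu = 0\) and \(\sigma = 2\), so \(\sigma^2 = 4\). These parameters, the sample size \(n = 5\), the seed, and the function name `simulate` are arbitrary choices for illustration. Averaged over many samples, the divide-by-\((n - 1)\) estimator should come out close to 4. The divide-by-\(n\) estimator should come out close to \(\frac{n - 1}{n}\sigma^2 = 3.2\).

```python
import random
from statistics import fmean


def simulate(n, trials, sigma=2.0, seed=0):
    """Average the divide-by-n and divide-by-(n - 1) variance estimators over many samples."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    biased, unbiased = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]  # one sample of size n
        x_bar = fmean(xs)  # sample mean
        ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
        biased.append(ss / n)
        unbiased.append(ss / (n - 1))
    return fmean(biased), fmean(unbiased)


biased_mean, unbiased_mean = simulate(n=5, trials=100_000)
print(f"divide by n:     {biased_mean:.3f}")  # close to 3.2
print(f"divide by n - 1: {unbiased_mean:.3f}")  # close to 4.0
```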
