ARTICLE

standard deviation

Standard Deviation Standard deviation is one of the most fundamental measures of dispersion in probability theory and statistics. Formally, it is the square root of the variance an

浏览 0 更新 2026-06-27

Standard Deviation

Standard deviation is one of the most fundamental measures of dispersion in probability theory and statistics. Formally, it is the square root of the variance and quantifies the average distance of data points from the arithmetic mean (or expected value). A low standard deviation indicates that data points cluster tightly around the mean; a high standard deviation signals wide dispersion. The population standard deviation is denoted by $\sigma$ , while the sample counterpart is denoted by $s$ or $SD$ .

Unlike variance, which is expressed in squared units (e.g., dollars-squared), standard deviation restores the original units (e.g., dollars), making it far more interpretable for reporting and decision-making. This property alone explains why standard deviation, rather than variance, is the default measure of variability in virtually every applied discipline.

Computational Formulas

Standard deviation is defined as the non-negative square root of the variance. Two versions exist depending on whether the entire population or only a sample is available.

Population Standard Deviation

\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}

Here $\mu$ is the population mean, $N$ is the population size, and the numerator $\sum (x_i - \mu)^2$ is the sum of squared deviations. Dividing by $N$ yields the population variance $\sigma^2$ ; taking the square root gives $\sigma$ .

Sample Standard Deviation

s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}

where $\bar{x}$ is the sample mean and $n-1$ is the degrees of freedom. Dividing by $n-1$ rather than $n$ implements Bessel's correction. This adjustment is necessary because the sample mean $\bar{x}$ is, by construction, the value that minimizes the sum of squared deviations for the sample—making the deviations systematically smaller than they would be around the unknown population mean $\mu$ . Dividing by $n-1$ compensates for this downward bias, ensuring that the sample variance $s^2$ is an unbiased estimator of $\sigma^2$ : $\mathbb{E}[s^2] = \sigma^2$ . Taking the square root of the unbiased variance estimator yields the (slightly biased but consistent) sample standard deviation.

The Random Variable Definition

For a random variable $X$ with mean $\mu = \mathbb{E}[X]$ , the standard deviation is:

\sigma = \sqrt{\mathbb{E}[(X - \mu)^2]}

Using the computational shortcut $\mathbb{E}[(X - \mu)^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ :

\sigma = \sqrt{\mathbb{E}[X^2] - (\mathbb{E}[X])^2}

For a discrete random variable with probability mass function $p(x)$ :

\sigma = \sqrt{ \sum_i (x_i - \mu)^2 \, p(x_i) }

For a continuous random variable with probability density function $f(x)$ :

\sigma = \sqrt{ \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx }

Properties

Let $X$ and $Y$ be random variables and $a, b$ constants.

Non-negativity: $\sigma \ge 0$ . Equality holds if and only if $X$ is almost surely constant.
Translation invariance: $SD(X + a) = SD(X)$ . Shifting all values leaves dispersion unchanged.
Scaling: $SD(aX) = |a| \, SD(X)$ . Multiplying by a constant scales the standard deviation by the absolute value of that constant.
Additivity under independence: If $X$ and $Y$ are independent random variables, then $SD(X + Y) = \sqrt{SD(X)^2 + SD(Y)^2}$ . In general, $SD(X + Y) = \sqrt{SD(X)^2 + SD(Y)^2 + 2 \, \wiki{Cov}(X, Y)}$ .
Relation to the mean: No universal relationship binds the mean and standard deviation; for a given mean, any non-negative standard deviation is possible depending on the distribution. However, for certain families the two are functionally linked: in a Poisson distribution, the mean and variance are equal ( $\mu = \sigma^2$ ), while in a binomial distribution, $\sigma^2 = np(1-p)$ is fully determined by $n$ and $p$ .
Inequality: $SD(X) \le \sqrt{\mathbb{E}[X^2]}$ , with equality when $\mu = 0$ . This follows directly from $\sigma^2 = \mathbb{E}[X^2] - \mu^2 \le \mathbb{E}[X^2]$ .
Pooled standard deviation: When combining two independent samples of sizes $n_1, n_2$ with respective standard deviations $s_1, s_2$ , the pooled standard deviation (assuming equal population variances) is: \[ s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}} \] This is a weighted average of the two variances, used extensively in the two-sample t-test and ANOVA.

Interpretation: The Empirical Rule and Chebyshev's Inequality

The 68-95-99.7 Rule (Normal Distribution)

When data follow a normal distribution, the Empirical Rule provides precise coverage probabilities:

Approximately 68\% of observations fall within $\mu \pm \sigma$ ;
Approximately 95\% fall within $\mu \pm 2\sigma$ ;
Approximately 99.7\% fall within $\mu \pm 3\sigma$ .

Observations beyond $\mu \pm 3\sigma$ are typically flagged as potential outliers. For example, if exam scores are normally distributed with mean 75 and standard deviation 8, roughly 95\% of students score between 59 and 91.

Chebyshev's Inequality

For any distribution (not necessarily normal), Chebyshev's inequality guarantees that for $k > 0$ :

P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}

Equivalently, at least $1 - 1/k^2$ of the data lies within $k$ standard deviations of the mean. For $k = 2$ , at least 75\% is covered; for $k = 3$ , at least 88.9\%. This bound is distribution-free but far looser than the Empirical Rule; its virtue is absolute generality.

Worked Example

Consider an investment portfolio with annual returns over five years: $-5\%$ , $15\%$ , $25\%$ , $10\%$ , $5\%$ . The sample mean is:

\bar{x} = \frac{-5 + 15 + 25 + 10 + 5}{5} = \frac{50}{5} = 10\%

Squared deviations from the mean: $(-5-10)^2 = 225$ , $(15-10)^2 = 25$ , $(25-10)^2 = 225$ , $(10-10)^2 = 0$ , $(5-10)^2 = 25$ . Their sum is $225 + 25 + 225 + 0 + 25 = 500$ .

s = \sqrt{\frac{500}{5-1}} = \sqrt{125} \approx 11.18\%

The portfolio's average annual return is 10\%, and returns typically deviate from this mean by about 11.18 percentage points. If returns are approximately normal, roughly two-thirds of annual returns should fall between $-1.18\%$ and $21.18\%$ . The relatively high standard deviation relative to the mean signals substantial year-to-year variability, which a risk-averse investor would weigh carefully.

Interpretation via Standardized Scores

The z-score (or standard score) expresses any observation in standard-deviation units:

z = \frac{x - \mu}{\sigma}

A $z$ -score of $+2$ means the observation is two standard deviations above the mean; a $z$ -score of $-1.5$ means it is one and a half below. This transformation is foundational: it allows comparison across variables measured on entirely different scales—for instance, comparing a student's performance in mathematics and verbal reasoning when the two tests have different means and spreads. It is also the first step in constructing $z$ -tests and $t$ -tests in hypothesis testing. In finance, a $z$ -score of $-3$ for a daily return signals an event so extreme that, under normality, it would be expected roughly once every 1,350 trading days; such tail events challenge the normality assumption itself and motivate the use of heavy-tailed distributions in risk management.

Key Applications

Finance: Asset return standard deviation is the canonical measure of volatility and risk. It is a core input to Modern Portfolio Theory, the Capital Asset Pricing Model (CAPM), and the Sharpe ratio $(R_p - R_f)/\sigma_p$ , which measures risk-adjusted return. Option pricing models like Black-Scholes-Merton also depend critically on volatility estimates.
Economics: Standard deviations of GDP growth, inflation, and unemployment rates gauge macroeconomic stability. A country with low GDP-growth standard deviation is said to exhibit smoother business cycles; policymakers often target low output-gap variability.
Quality Control: In Six Sigma methodology, the standard deviation of a manufacturing process determines defect rates. A process operating at Six Sigma quality produces only 3.4 defects per million opportunities, meaning the specification limits are six standard deviations from the process mean.
Social Sciences: Survey results routinely report means accompanied by standard deviations to convey the spread of responses. In psychometrics, test-score standard deviations allow comparison across populations via z-scores: $z = (x - \mu)/\sigma$ .
Machine Learning: Standard deviation underpins feature scaling (standardization), where each feature is transformed to have mean 0 and standard deviation 1 before model training. This prevents features with larger scales from dominating algorithms sensitive to magnitude.

Comparison with Other Dispersion Measures

Variance ( $\sigma^2$ ): Mathematically cleaner for theoretical work (additive under independence), but its squared units hinder interpretation. Standard deviation is almost always preferred for reporting.
Range (max $-$ min): Simple to compute but uses only two data points and is catastrophically sensitive to outliers. A single extreme value can inflate the range regardless of the remaining data.
Interquartile Range (IQR): The difference between the 75th and 25th percentiles. Robust to outliers and preferred when distributions are heavily skewed. However, IQR discards information from the tails beyond the quartiles.
Mean Absolute Deviation (MAD): The average of absolute (not squared) deviations. Less sensitive to outliers than standard deviation and has the same units as the data. Historically, Fisher argued for the standard deviation's greater algebraic tractability and efficiency under normality—reasons it became the dominant measure.
Coefficient of Variation ( $CV = \sigma / \mu$ ): A dimensionless ratio useful for comparing dispersion across datasets with different units or vastly different means.

Limitations

Standard deviation has two principal limitations. First, by squaring deviations it amplifies the influence of outliers; a single extreme observation can dramatically inflate the estimate. In such cases, robust alternatives like IQR or MAD should be considered. Second, standard deviation is most informative when the underlying distribution is roughly symmetric and unimodal. For multi-modal or highly skewed data, a single $\sigma$ may mislead—reporting the full distribution or using quantile-based measures is often more appropriate. Despite these caveats, standard deviation remains the indispensable workhorse of dispersion measurement, combining mathematical tractability with practical interpretability.

关于知经 KNOWECON

知经 KNOWECON 是深圳市卢可教育科技有限公司旗下的教育科技品牌，长期面向北京大学、清华大学、中国人民大学等顶尖院校，提供经济学、金融学、统计学、管理学等相关科目的专业课考研辅导与复试辅导。每年都有数十名同学在我们的帮助下完成系统备考，并成功进入理想院校。

知经主讲人喵喵学长毕业于北京大学汇丰商学院经济学专业和新加坡国立大学金融工程专业，获经济学硕士与金融工程硕士学位。他同时也是软件工程师和教育科技创业者，长期探索用讲义、题库、记忆系统、智能答疑与学习数据工具改善专业课学习体验。

我们相信，好的考研辅导不只是押题和陪跑，更是把复杂知识讲清楚、把复习路径设计清楚，并用技术让学习过程更可追踪、更可反馈、更可坚持。