Heteroscedasticity in Regression Analysis

Heteroscedasticity refers to a violation of one of the key assumptions of linear regression: constant variance of the error term. In an ideal regression model, residuals are randomly scattered with equal spread (homoscedasticity). When the variance of the residuals instead increases or decreases with the fitted values or predictor variables, the errors are heteroscedastic. This undermines the reliability of statistical inference: coefficient estimates become inefficient and standard hypothesis tests become invalid.

Key Assumptions of Linear Regression

Errors have zero mean.
Errors have constant variance (homoscedasticity).
Errors are independent (no autocorrelation).
Errors follow a normal distribution.

Homoscedasticity vs Heteroscedasticity

The table below compares homoscedasticity and heteroscedasticity:

Feature	Homoscedasticity	Heteroscedasticity
Residual Variance	Constant across observations	Varies with predictors or fitted values
Regression Assumptions	Satisfied	Violated
OLS Efficiency	Efficient (BLUE)	Inefficient
Hypothesis Tests	Valid t-tests and F-tests	Unreliable, because standard errors are biased
Residual Plot Pattern	Random scatter	Funnel-shaped or patterned
Need for Correction	No	Yes – may require WLS, transformation or robust errors

Reasons for Heteroscedasticity

A wide range between the smallest and largest values, often caused by outliers.
Incorrect model specification, such as missing variables or wrong functional form.
Mixing observations from different measurement scales.
Using incorrect transformations while preprocessing.
Skewed distribution of one or more independent variables.
Natural growth processes (e.g., income vs expenditure).

Effects

OLS estimators remain unbiased, but they are no longer efficient (not minimum variance).
As a result, OLS loses the BLUE (Best Linear Unbiased Estimator) property.
The usual standard error estimates become biased, so t-tests and F-tests become unreliable.
Confidence intervals may become too wide or too narrow.
Model interpretations become misleading.
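
The first and third effects can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the true line y = 2 + 3x, the error standard deviation 0.1 * x^2, the sample size and the number of repetitions are all hypothetical choices, and the helper name slope_and_classical_se is made up for this example. It refits OLS on many simulated samples and compares the average slope with the true slope, and the observed spread of the slopes with the average classical standard error.

```python
import math
import random

def slope_and_classical_se(x, y):
    """OLS slope for y = a + b*x and its classical (constant-variance) standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(sse / (n - 2) / sxx)

random.seed(42)
TRUE_SLOPE = 3.0  # assumed value, for illustration only
slopes, classical_ses = [], []
for _ in range(300):
    x = [random.uniform(1, 10) for _ in range(100)]
    # Heteroscedastic errors: the standard deviation grows with x squared (assumption).
    y = [2.0 + TRUE_SLOPE * xi + random.gauss(0, 0.1 * xi * xi) for xi in x]
    b, se = slope_and_classical_se(x, y)
    slopes.append(b)
    classical_ses.append(se)

mean_slope = sum(slopes) / len(slopes)
empirical_sd = math.sqrt(
    sum((b - mean_slope) ** 2 for b in slopes) / (len(slopes) - 1)
)
mean_classical_se = sum(classical_ses) / len(classical_ses)

print("average slope estimate:", round(mean_slope, 3))
print("observed spread of slope estimates:", round(empirical_sd, 3))
print("average classical standard error:", round(mean_classical_se, 3))
```

If the average slope stays close to 3 while the observed spread differs noticeably from the average classical standard error, that matches the points above: the estimator is still unbiased, but the usual standard error formula misstates its uncertainty.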

Identifying Heteroscedasticity

1. Graphical Method (Residual Plots)

Residual diagnostics are often the quickest way to spot heteroscedasticity. The most common plot is Residuals vs Fitted Values, in which the residuals of a well-behaved model appear randomly scattered around zero. Heteroscedasticity is likely when the plot shows:

Funnel or cone shapes: the spread of residuals increases or decreases as fitted values grow.
Systematic patterns: curved bands, clusters or waves instead of uniform scattering.
Non-constant scatter across ranges of the predictor variables.
Residual variation linked to specific groups in the data, indicating different variance levels across segments.

These visual cues indicate that the error variance is not constant and violates regression assumptions.
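
A numeric counterpart to the residuals-vs-fitted plot can be sketched as follows. This is a minimal example on simulated data, not a replacement for the plot: the data-generating process (error standard deviation 0.5 * x) and the helper names fit_line and residual_spread_by_bin are assumptions made for illustration. It sorts observations by fitted value, splits them into equal-sized bins, and reports the residual standard deviation in each bin. A steady rise or fall across bins corresponds to the funnel shape described above.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def residual_spread_by_bin(fitted, resid, bins=4):
    """Standard deviation of residuals within equal-count bins of fitted values."""
    pairs = sorted(zip(fitted, resid))
    size = len(pairs) // bins
    spreads = []
    for i in range(bins):
        chunk = [r for _, r in pairs[i * size:(i + 1) * size]]
        spreads.append(statistics.pstdev(chunk))
    return spreads

random.seed(0)
n = 400
x = [random.uniform(1, 10) for _ in range(n)]
# Simulated heteroscedastic data: the noise level is proportional to x (assumption).
y = [2.0 + 3.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

spreads = residual_spread_by_bin(fitted, resid)
print("residual spread from lowest to highest fitted values:",
      [round(s, 2) for s in spreads])
```

Plotting resid against fitted with any plotting library gives the usual graphical version of the same check.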

2. Statistical Tests for Heteroscedasticity

Graphical checks are intuitive, but statistical tests provide formal evidence.

1. Breusch–Pagan (BP) Test

Evaluates whether residual variance is related to the predictors.
The squared residuals are regressed on the independent variables.
A significant test statistic indicates that error variance changes systematically with predictors.
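
The steps above can be sketched for a single predictor using only the standard library. This example assumes the studentized (Koenker) form of the test, in which the statistic is n times the R-squared of the auxiliary regression and is compared with a chi-square distribution with one degree of freedom (one predictor). The simulated data and the helper names ols_simple and breusch_pagan_1d are hypothetical. For real work, a library routine such as het_breuschpagan in statsmodels.stats.diagnostic is the more practical choice.

```python
import math
import random

def ols_simple(x, y):
    """Fit y = a + b*x by OLS; return (a, b, residuals, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in resid) / sst
    return a, b, resid, r2

def breusch_pagan_1d(x, y):
    """Studentized Breusch-Pagan test for one predictor; returns (LM, p_value)."""
    _, _, resid, _ = ols_simple(x, y)
    squared = [e * e for e in resid]
    # Auxiliary regression: squared residuals on the predictor.
    _, _, _, r2_aux = ols_simple(x, squared)
    lm = max(len(x) * r2_aux, 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

random.seed(1)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
# Two simulated responses (assumptions): noise grows with x vs. constant noise.
y_het = [2.0 + 3.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]
y_hom = [2.0 + 3.0 * xi + random.gauss(0, 2.0) for xi in x]

lm_het, p_het = breusch_pagan_1d(x, y_het)
lm_hom, p_hom = breusch_pagan_1d(x, y_hom)
print("heteroscedastic data: LM = %.2f, p = %.4g" % (lm_het, p_het))
print("homoscedastic data:   LM = %.2f, p = %.4g" % (lm_hom, p_hom))
```

A small p-value is evidence against constant variance. With several predictors the auxiliary regression includes all of them and the degrees of freedom equal the number of predictors.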

2. White Test

The White test is a more general test that does not assume any specific pattern of heteroscedasticity. It uses an auxiliary regression in which the squared residuals are regressed on:

original predictors,
their squares,
and their cross-products.

The White test detects both linear and nonlinear forms of heteroscedasticity. Together with the Breusch–Pagan test, it helps confirm heteroscedasticity even when visual patterns are subtle or ambiguous.
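
The auxiliary regression can be sketched for a single predictor, where the regressors reduce to x and x squared (there are no cross-products with only one variable). The example below is a rough illustration under assumptions: the simulated data have a U-shaped variance pattern chosen so that a purely linear auxiliary regression would have little to detect, and the helper names solve, ols and white_test_1d are hypothetical. The statistic n times R-squared is compared with a chi-square distribution with two degrees of freedom. In practice, het_white in statsmodels.stats.diagnostic handles the general multi-predictor case.

```python
import math
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def ols(columns, y):
    """OLS with intercept on the given regressor columns; returns (residuals, r_squared)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    Xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    my = sum(y) / n
    r2 = 1.0 - sum(e * e for e in resid) / sum((yi - my) ** 2 for yi in y)
    return resid, r2

def white_test_1d(x, y):
    """White test for one predictor; returns (LM, p_value, LM of linear-only auxiliary fit)."""
    resid, _ = ols([x], y)
    squared = [e * e for e in resid]
    _, r2_linear = ols([x], squared)                      # predictor only
    _, r2_white = ols([x, [xi * xi for xi in x]], squared)  # predictor and its square
    lm = max(len(x) * r2_white, 0.0)
    p_value = math.exp(-lm / 2.0)  # chi-square survival function, 2 degrees of freedom
    return lm, p_value, len(x) * r2_linear

random.seed(3)
n = 500
x = [random.uniform(-3, 3) for _ in range(n)]
# U-shaped variance (assumption): noise is smallest near x = 0 and grows in both directions.
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 + 0.5 * xi * xi) for xi in x]

lm_white, p_white, lm_linear = white_test_1d(x, y)
print("White LM = %.2f, p = %.4g" % (lm_white, p_white))
print("LM from the linear-only auxiliary regression = %.2f" % lm_linear)
```

Adding the squared term is what lets the auxiliary regression pick up variance that depends on x in a non-monotonic way.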

Corrections

Respecify the model (add missing variables, remove unnecessary ones).
Apply suitable transformations (log, square-root, Box–Cox).
Use Weighted Least Squares (WLS) where weights compensate for variance differences.
Use Robust Standard Errors (e.g., White’s correction) to fix inference problems without changing coefficients.
Apply MINQUE (Minimum Norm Quadratic Unbiased Estimation) to obtain unbiased estimates of the error variances using quadratic estimators.
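
Two of these corrections, WLS and robust standard errors, can be sketched for a single predictor. The example is a minimal illustration under assumptions: the data are simulated with an error standard deviation proportional to x, so the weights 1 / x^2 are the inverse of the assumed error variance (in real data the variance structure is unknown and has to be modelled or estimated). The robust standard error uses the basic White (HC0) formula for the slope, and the helper names wls_line and slope_standard_errors are hypothetical.

```python
import math
import random

def wls_line(x, y, w):
    """Weighted least squares for y = a + b*x with weights w; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def slope_standard_errors(x, y):
    """Classical and White (HC0) standard errors of the OLS slope."""
    n = len(x)
    a, b = wls_line(x, y, [1.0] * n)  # unit weights reduce WLS to OLS
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Classical formula: assumes one common error variance.
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 formula: each squared residual stands in for its own error variance.
    robust = math.sqrt(sum((xi - mx) ** 2 * e * e for xi, e in zip(x, resid))) / sxx
    return classical, robust

random.seed(7)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
# Simulated data (assumption): error standard deviation proportional to x.
y = [2.0 + 3.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

a_ols, b_ols = wls_line(x, y, [1.0] * n)
a_wls, b_wls = wls_line(x, y, [1.0 / (xi * xi) for xi in x])
classical_se, robust_se = slope_standard_errors(x, y)

print("OLS slope: %.3f, WLS slope: %.3f" % (b_ols, b_wls))
print("classical SE: %.4f, robust (HC0) SE: %.4f" % (classical_se, robust_se))
```

WLS changes the coefficient estimates by down-weighting the noisier observations, whereas the robust standard error leaves the OLS coefficients untouched and only replaces the variance formula used for inference.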