Propensity Score Matching

Propensity Score Matching is a statistical technique used to reduce selection bias in observational studies by matching treated and untreated individuals who have similar observed characteristics. In this way, it attempts to approximate the conditions of a randomized experiment.

Propensity Score

A propensity score is the probability that an individual receives the treatment given a set of observed covariates.

Mathematically, the propensity score is defined as:

e(X) = P(T = 1 \mid X)

Where:

T is the treatment indicator: T = 1 if the individual is in the treatment group and T = 0 otherwise.
X is the vector of observed covariates.
e(X) is the propensity score.
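When there is only a single discrete covariate, e(X) can be estimated directly as the share of treated individuals at each value of X. The sketch below illustrates this on a small made-up dataset of (age group, T) pairs; the function name `empirical_propensity` and the data are hypothetical.

```python
from collections import defaultdict

def empirical_propensity(records):
    """Estimate e(x) = P(T = 1 | X = x) as the fraction of treated
    individuals among those with covariate value x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, total]
    for x, t in records:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}

# Hypothetical toy data: (age group, treatment indicator T)
data = [("young", 1), ("young", 0), ("young", 0), ("young", 0),
        ("old", 1), ("old", 1), ("old", 1), ("old", 0)]

scores = empirical_propensity(data)
print(scores)  # {'young': 0.25, 'old': 0.75}
```

With many or continuous covariates, most covariate combinations would contain few or no individuals, which is why a model such as logistic regression is used to estimate the score instead.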

Why Propensity Score Matching Is Needed

Problem with Observational Data

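In randomized experiments, treatment is assigned randomly, so the treated and untreated groups are comparable on average.
In observational data, treatment is not random: people may self-select based on age, income, education, and so on.

When those characteristics also affect the outcome, they act as confounders, and a simple comparison of treated and untreated outcomes gives a biased estimate of the treatment effect.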
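A small simulation makes this bias concrete. In the sketch below, all numbers are made-up assumptions: older individuals are both more likely to be treated and have higher outcomes, while the true treatment effect is fixed at 2.0. The naive difference in mean outcomes then mixes the age effect into the treatment effect.

```python
import random

random.seed(0)
TRUE_EFFECT = 2.0  # assumed true effect of treatment on the outcome

treated_y, control_y = [], []
for _ in range(20_000):
    old = random.random() < 0.5                        # confounder: age group
    treated = random.random() < (0.8 if old else 0.2)  # older people self-select into treatment
    # Outcome rises with age (+5 if old) and with treatment (+TRUE_EFFECT), plus noise
    y = 10.0 + 5.0 * old + TRUE_EFFECT * treated + random.gauss(0, 1)
    (treated_y if treated else control_y).append(y)

naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(f"true effect: {TRUE_EFFECT}, naive difference in means: {naive:.2f}")
```

In expectation the naive estimate here is about 5.0, not 2.0: 80% of treated individuals are old versus 20% of untreated ones, so the 5-point age effect adds 5 × 0.6 = 3 to the true effect.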

Solution: Match Based on Propensity Scores

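Rather than matching individuals exactly on every covariate (which quickly becomes impossible as the number of covariates grows, because few individuals share identical values on all of them), match them on the propensity score, a single number summarizing the covariate information. Treated and untreated individuals with the same propensity score have, on average, the same distribution of observed covariates, so matching on the score balances those covariates.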

Steps in Propensity Score Matching

1. Model the Propensity Score

Use logistic regression (or another classification model) to estimate each individual’s probability of receiving the treatment.

2. Match Individuals

Match treated and untreated individuals based on similar propensity scores. Several matching methods are used (explained below).

3. Check Balance

Evaluate whether the matched groups have similar covariate distributions.
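One common balance diagnostic is the standardized mean difference (SMD) of each covariate: the difference in group means divided by a pooled standard deviation, where an absolute SMD below about 0.1 is a frequently used rule of thumb for adequate balance. The sketch below assumes covariate values arrive as plain Python lists; the function name and the ages are hypothetical.

```python
import statistics

def standardized_mean_difference(treated, control):
    """SMD = (mean_treated - mean_control) / sqrt((var_treated + var_control) / 2)."""
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return mean_diff / pooled_sd

# Hypothetical ages: all untreated individuals vs. the subset kept after matching
ages_treated = [60, 62, 58, 65, 61]
ages_control_all = [30, 35, 59, 40, 63, 28, 60, 45, 64, 61]
ages_control_matched = [59, 63, 60, 64, 61]

smd_before = standardized_mean_difference(ages_treated, ages_control_all)
smd_after = standardized_mean_difference(ages_treated, ages_control_matched)
print(round(smd_before, 2), round(smd_after, 2))  # 1.22 -0.09
```

Before matching, the age SMD is far above 0.1; after matching, its absolute value falls below that threshold. In practice the same check is repeated for every covariate.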

4. Estimate Treatment Effect

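After matching, estimate the treatment effect by comparing outcomes between the matched groups. Because each treated individual is compared with similar untreated individuals, this typically estimates the average treatment effect on the treated (ATT) rather than the effect for the whole population.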
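For 1:1 matching, the comparison reduces to averaging the within-pair outcome differences. The sketch below assumes matching has already produced a list of (treated outcome, matched untreated outcome) pairs; the function name and the numbers are hypothetical.

```python
def att_from_pairs(pairs):
    """ATT from 1:1 matched pairs: mean of (treated outcome - matched untreated outcome)."""
    diffs = [y_t - y_c for y_t, y_c in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical (treated outcome, matched untreated outcome) pairs
matched_pairs = [(12.0, 10.5), (15.0, 13.0), (9.5, 8.0), (11.0, 9.0)]
print(att_from_pairs(matched_pairs))  # 1.75
```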

Propensity Score Estimation

The most common method for estimating propensity scores is logistic regression.

\log \left( \frac{e(X)}{1 - e(X)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k

Where:

X_1, ..., X_k are the covariates (e.g., age, income).
β_0, β_1, ..., β_k are coefficients estimated from the data by maximum likelihood.
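To make the estimation step concrete, here is a minimal, dependency-free sketch that fits the logistic model above by plain gradient ascent on the average log-likelihood and then computes each individual's propensity score. The learning rate, iteration count, helper names, and toy data are illustrative assumptions, not a recommended implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(beta, row):
    # beta[0] is the intercept; beta[1:] pairs with the covariates in row
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_logistic(X, t, lr=0.1, n_iter=5000):
    """Fit log(e/(1 - e)) = b0 + b1*x1 + ... + bk*xk by gradient ascent on the
    average log-likelihood. X is a list of covariate rows, t a list of 0/1 labels."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            resid = ti - sigmoid(linear_predictor(beta, row))  # T - e(X)
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: one standardized covariate (e.g., age) and treatment T
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [-0.2], [0.3]]
t = [0, 0, 1, 0, 1, 1, 1, 0, 0]

beta = fit_logistic(X, t)
scores = [sigmoid(linear_predictor(beta, row)) for row in X]
print([round(s, 2) for s in scores])
```

Because the model includes an intercept, the fitted scores at the maximum-likelihood solution average to the observed share of treated individuals (4 of 9 here). Gradient ascent is used only to keep the sketch self-contained; statistical packages typically use Newton-type methods such as iteratively reweighted least squares.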

Types of Matching Methods

Nearest Neighbor Matching: Match each treated individual with the untreated individual whose propensity score is closest.
Caliper Matching: Accept a match only if the propensity score difference is within a chosen threshold, the caliper (e.g., 0.01).
Radius Matching: Similar to caliper matching, but use all untreated individuals whose scores fall within the radius, rather than only the closest one.
Kernel Matching: Compare each treated individual with a weighted average of the outcomes of untreated individuals, with weights that decrease as the propensity score distance grows.
Stratification Matching: Divide the range of propensity scores into intervals (strata) and compare the outcomes of treated and untreated individuals within each stratum.
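Nearest neighbor and caliper matching can be sketched together: each treated individual is paired with the untreated individual whose propensity score is closest, and, if a caliper is given, the pair is kept only when the score difference is within it. This sketch uses greedy matching without replacement, processing treated individuals in the order listed; that is one choice among several (matching with replacement and optimal matching are alternatives). The function name and scores are hypothetical.

```python
def nearest_neighbor_match(treated, control, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on propensity scores, without replacement.

    treated, control: dicts mapping individual id -> propensity score.
    caliper: maximum allowed |score difference|; None means no caliper.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # untreated individuals not yet used
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining untreated individual
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is None or abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

treated = {"t1": 0.62, "t2": 0.33, "t3": 0.90}
control = {"c1": 0.60, "c2": 0.30, "c3": 0.40, "c4": 0.10}

all_pairs = nearest_neighbor_match(treated, control)
caliper_pairs = nearest_neighbor_match(treated, control, caliper=0.05)
print(all_pairs)      # [('t1', 'c1'), ('t2', 'c2'), ('t3', 'c3')]
print(caliper_pairs)  # [('t1', 'c1'), ('t2', 'c2')]
```

With the caliper, t3 has no untreated individual within 0.05 and is left unmatched. This is the trade-off a caliper makes: poor matches are removed at the cost of discarding some treated individuals.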
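Kernel matching and stratification can both be written as estimates of the average treatment effect on the treated (ATT) computed directly from (propensity score, outcome) data. In this sketch, the Gaussian kernel, the bandwidth of 0.05, and the use of five equal-width strata are illustrative assumptions; strata are often formed from quantiles (e.g., quintiles) of the estimated scores instead. Function names and data are hypothetical.

```python
import math

def kernel_att(treated, control, bandwidth=0.05):
    """Kernel matching estimate of the ATT with a Gaussian kernel.

    treated, control: lists of (propensity score, outcome) tuples.
    Each treated outcome is compared with a weighted average of all untreated
    outcomes; weights shrink as the propensity-score distance grows.
    """
    effects = []
    for p_t, y_t in treated:
        weights = [math.exp(-0.5 * ((p_c - p_t) / bandwidth) ** 2) for p_c, _ in control]
        total = sum(weights)
        if total == 0:  # no untreated individual carries any weight; skip this unit
            continue
        counterfactual = sum(w * y_c for w, (_, y_c) in zip(weights, control)) / total
        effects.append(y_t - counterfactual)
    return sum(effects) / len(effects)

def stratified_att(treated, control, n_strata=5):
    """Stratification estimate of the ATT using equal-width propensity-score strata.

    Within each stratum, take the difference in mean outcomes between treated and
    untreated individuals, then average these differences weighted by the number
    of treated individuals. Strata lacking either group are skipped.
    """
    def stratum(p):
        return min(int(p * n_strata), n_strata - 1)  # a score of 1.0 goes in the top stratum

    weighted_sum, n_treated = 0.0, 0
    for s in range(n_strata):
        ys_t = [y for p, y in treated if stratum(p) == s]
        ys_c = [y for p, y in control if stratum(p) == s]
        if ys_t and ys_c:
            diff = sum(ys_t) / len(ys_t) - sum(ys_c) / len(ys_c)
            weighted_sum += diff * len(ys_t)
            n_treated += len(ys_t)
    return weighted_sum / n_treated

# Hypothetical (propensity score, outcome) data
treated = [(0.62, 12.0), (0.31, 9.0)]
control = [(0.65, 10.0), (0.33, 7.0), (0.05, 4.0)]

print(round(kernel_att(treated, control), 4))  # 2.0
print(stratified_att(treated, control))        # 2.0
```

Both estimates are about 2.0 here because each treated individual has an untreated individual with a nearby score and an outcome 2 units lower, while the distant untreated individual (score 0.05) receives negligible kernel weight and falls in a stratum with no treated individuals.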