Causal Inference in Marketing: Moving Beyond Correlation with Propensity Scores and Instrumental Variables

Most marketing analysis relies on correlation, not causation. A campaign runs, sales increase, and we assume the campaign worked. But often, that is not true. The timing overlaps, but the cause may be something else entirely. When A/B testing is not possible, I use causal inference techniques like propensity score matching and instrumental variables to estimate what would have happened if we had done something differently. In this article, I explain how to separate signal from noise and measure true effect, not just association.

Introduction: Why Correlation Is Not Enough

One of the most common mistakes I see in marketing analytics is the confusion of correlation with causation. A campaign runs. Sales increase. The natural assumption is that the campaign caused the growth. But often, that is not true. The timing overlaps, but the cause may be something else entirely. And if you optimise based on false assumptions, you risk wasting budget, misallocating credit, and building strategy on sand.

When A/B testing is possible, this problem becomes manageable. But many business questions cannot be tested experimentally, for legal, operational or ethical reasons. This is where causal inference comes in. Causal inference gives us mathematical and statistical tools to estimate what would have happened if we had done something differently. It allows us to separate signal from noise and measure true effect, not just association.

This article walks through how I use techniques like propensity score matching, instrumental variables, and graphical models (DAGs) to estimate causal impact in real marketing work.

The Causal Problem: What We Want to Know

Suppose I run a loyalty email campaign to a group of existing customers. At the end of the week, this group spends more than the group that did not receive the email. But was it the email that made the difference?

Maybe the more active customers were more likely to receive the email in the first place. Maybe they were already planning to buy. Maybe the email nudged them, or maybe it did nothing. The difference in spend could be due to underlying differences in customer type.

This is the fundamental question of potential outcomes. What we want to estimate is:

\text{ATE} = E[Y(1)] - E[Y(0)]

Where:

  • Y(1) is the outcome if treated (e.g. received email)
  • Y(0) is the outcome if not treated
  • E[·] denotes expected value

We can observe one of these for each individual, but never both. That is the fundamental problem of causal inference. We must estimate the counterfactual.

Rubin Causal Model: The Formal Framework

The Rubin Causal Model provides a formal way to think about this. Each unit (e.g. customer) has two potential outcomes: one under treatment, one under control. The treatment effect is the difference, but we only see one outcome per unit.

If treatment is random, the average treatment effect (ATE) is easy to estimate. But in marketing, treatment is rarely random. Customers self-select, or marketers target based on behaviour. So we must account for this non-random assignment.
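To see why this matters, here is a small simulation (all numbers invented for illustration). A hidden engagement variable drives both who gets treated and how much they spend, so the naive treated-versus-untreated gap lands well above the true effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

engagement = rng.normal(size=n)                 # hidden confounder
y0 = 50 + 10 * engagement + rng.normal(size=n)  # spend if NOT emailed
y1 = y0 + 5                                     # spend if emailed: true ATE = 5

# Non-random assignment: engaged customers are more likely to get the email
p_treat = 1 / (1 + np.exp(-2 * engagement))
t = rng.binomial(1, p_treat)

y = np.where(t == 1, y1, y0)  # we only ever observe ONE potential outcome

print("True ATE:        ", (y1 - y0).mean())                     # ~5.0
print("Naive difference:", y[t == 1].mean() - y[t == 0].mean())  # far above 5
```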

Propensity Score Matching: Balancing the Covariates

One solution is to model the probability of receiving treatment using observable characteristics. This probability is known as the propensity score.

We define:

e(x) = P(T = 1 \mid X = x)

Where:

  • T is the treatment assignment (1 if treated, 0 if control)
  • X is the vector of observed covariates (e.g. frequency, spend, geography)
  • e(x) is the estimated probability of being treated

I estimate this using logistic regression or machine learning models. Then I match treated and untreated units with similar propensity scores. This balances the distribution of covariates across groups.

I usually check the balance before and after matching by looking at standardised mean differences. A well-matched sample should resemble a randomised trial.
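As a rough sketch of how this looks in code, assume a pandas DataFrame df with a binary treated column and a few behavioural covariates (all column names here are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

covariates = ["frequency", "past_spend", "tenure_months"]  # hypothetical columns
X = df[covariates].to_numpy()
t = df["treated"].to_numpy()

# 1. Estimate the propensity score e(x) = P(T = 1 | X = x)
ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

# 2. Match each treated unit to the nearest control unit on the score
treated_idx = np.where(t == 1)[0]
control_idx = np.where(t == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
_, matches = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
matched_controls = control_idx[matches.ravel()]

# 3. Balance check: standardised mean differences should shrink after matching
def smd(a, b):
    return (a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)

for j, name in enumerate(covariates):
    print(f"{name}: SMD before={smd(X[t == 1, j], X[t == 0, j]):.2f}, "
          f"after={smd(X[treated_idx, j], X[matched_controls, j]):.2f}")
```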

The treatment effect is then estimated as:

\hat{\text{ATE}} = \frac{1}{n} \sum_{i=1}^{n} (Y_i^T - Y_i^C)

Where Y_i^T and Y_i^C are the outcomes of the i-th matched pair.
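Continuing the sketch above, this is one line on the matched arrays. Strictly speaking, matching controls to treated units estimates the effect on the treated (ATT) rather than the ATE, which is worth keeping in mind when reporting:

```python
# Mean outcome difference across matched pairs, assuming a hypothetical
# `spend` outcome column and the indices from the matching sketch above.
y = df["spend"].to_numpy()
att = (y[treated_idx] - y[matched_controls]).mean()
print(f"Estimated effect on the treated: {att:.2f}")
```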

Practical Example: eCommerce Email Campaign

For a DTC fashion brand, I needed to measure the true effect of a personalisation email campaign. The raw comparison showed a 61% lift in conversion for email recipients. But recipients were pre-selected based on past engagement.

Group            Raw Conversion    After PSM Matching
Received Email   8.4%              6.1%
No Email         5.2%              5.3%
Apparent Lift    +61%              +15%

After propensity score matching, the true lift was closer to 15%. Still significant, but the raw comparison massively overstated the effect.

Instrumental Variables: When Selection Bias Is Hidden

Sometimes, observable covariates are not enough. What if the reason for being treated is based on something unobservable (like motivation or intent)? This breaks the matching approach. In that case, I look for an instrumental variable (IV).

An IV is a variable Z that affects treatment assignment T but has no direct effect on the outcome Y, except through T.

For example, suppose some customers get a promotion email only because their location has a different email sending time. Location becomes the instrument: it predicts treatment but should not affect purchase directly.

I use two-stage least squares (2SLS):

  1. Regress T on Z: estimate the predicted treatment \hat{T}
  2. Regress Y on \hat{T}: estimate the effect of treatment using only the variation caused by Z

This isolates the exogenous part of treatment, the part not driven by selection bias.
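As a sketch, the two stages can be run by hand with statsmodels, assuming arrays Z, T and Y for the instrument, treatment and outcome. In practice I would use a dedicated IV routine (for example IV2SLS from the linearmodels package), not least because the naive second-stage standard errors below are not valid:

```python
import statsmodels.api as sm

# Stage 1: regress T on Z and keep the fitted values
stage1 = sm.OLS(T, sm.add_constant(Z)).fit()
print("First-stage F-statistic (instrument strength):", stage1.fvalue)
T_hat = stage1.fittedvalues

# Stage 2: regress Y on the predicted treatment; the slope is the IV estimate
stage2 = sm.OLS(Y, sm.add_constant(T_hat)).fit()
print("IV estimate of the treatment effect:", stage2.params[1])
```

The first-stage F-statistic is worth checking before trusting the result: weak instruments make 2SLS estimates unstable.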

Practical Example: SaaS Onboarding

For a SaaS client, users who completed onboarding had higher retention. But was it because onboarding worked, or because motivated users completed it?

I used server queue timing as an instrument. Some users received onboarding prompts earlier due to infrastructure timing, unrelated to their intent. This revealed that the true causal lift from onboarding was about half what the raw comparison suggested.

DAGs and Graphical Thinking

To make sense of these relationships, I use Directed Acyclic Graphs (DAGs). These are visual maps of how variables relate.

  • Arrows represent causal relationships
  • Nodes represent variables
  • No cycles allowed

I use DAGs to decide what to control for, what to instrument, and where bias may creep in. I use software like dagitty.net to explore conditional independencies.

The Three Key Questions DAGs Answer

Question                          What It Reveals
What should I control for?        Confounders that affect both treatment and outcome
What should I NOT control for?    Mediators or colliders that introduce bias
What could be an instrument?      Variables affecting treatment but not outcome directly
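To make those three questions concrete, here is a minimal networkx sketch of the email-campaign DAG from earlier. The node names are illustrative, and the checks are deliberately simplified versions of the graphical criteria that a tool like dagitty.net applies properly:

```python
import networkx as nx

dag = nx.DiGraph([
    ("engagement", "email"),     # engaged customers are more likely to be emailed
    ("engagement", "purchase"),  # ...and more likely to buy anyway: a confounder
    ("location", "email"),       # affects treatment only: an instrument candidate
    ("email", "purchase"),       # the causal effect we want to estimate
])
assert nx.is_directed_acyclic_graph(dag)  # "no cycles allowed"

# What should I control for? Common direct causes of treatment and outcome
# (a simplification of the full backdoor criterion).
confounders = set(dag.predecessors("email")) & set(dag.predecessors("purchase"))
print("Control for:", confounders)  # {'engagement'}

# What could be an instrument? Parents of treatment with no path to the
# outcome once the treatment node is removed.
dag_minus_t = dag.copy()
dag_minus_t.remove_node("email")
instruments = [z for z in dag.predecessors("email")
               if not nx.has_path(dag_minus_t, z, "purchase")]
print("Instrument candidates:", instruments)  # ['location']
```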

Why This Matters for Growth Strategy

If you attribute success to the wrong tactic, you will double down on the wrong thing. If you under-measure a channel because it overlaps with other activity, you might cut your best-performing asset.

Causal inference helps you:

  • Justify budget and resourcing
  • Design smarter interventions
  • Forecast more accurately
  • Avoid false positives

It is not easy. But it is worth it. I use these techniques not to impress analysts, but to help businesses make better bets.

Final Thought: Move Beyond Correlation

If your analytics are showing big numbers, but you are not sure why they are moving, it may be time to move beyond correlation.

I can help you do that, one counterfactual at a time.

Not sure if your marketing is actually working? I can help you move beyond correlation and measure true causal effect using propensity scores, instrumental variables and proper experimental design. Let's find out what is really driving your results.