Causal Inference in Marketing: Moving Beyond Correlation with Propensity Scores and Instrumental Variables
Most marketing analysis relies on correlation, not causation. A campaign runs, sales increase, and we assume the campaign worked. But often, that is not true. The timing overlaps, but the cause may be something else entirely. When A/B testing is not possible, I use causal inference techniques like propensity score matching and instrumental variables to estimate what would have happened if we had done something differently. In this article, I explain how to separate signal from noise and measure true effect, not just association.
Introduction: Why Correlation Is Not Enough
One of the most common mistakes I see in marketing analytics is the confusion of correlation with causation. A campaign runs. Sales increase. The natural assumption is that the campaign caused the growth. But often, that is not true. The timing overlaps, but the cause may be something else entirely. And if you optimise based on false assumptions, you risk wasting budget, misallocating credit, and building strategy on sand.
When A/B testing is possible, this problem becomes manageable. But many business questions cannot be tested experimentally, for legal, operational or ethical reasons. This is where causal inference comes in. Causal inference gives us mathematical and statistical tools to estimate what would have happened if we had done something differently. It allows us to separate signal from noise and measure true effect, not just association.
This article walks through how I use techniques like propensity score matching, instrumental variables, and graphical models (DAGs) to estimate causal impact in real marketing work.
The Causal Problem: What We Want to Know
Suppose I run a loyalty email campaign to a group of existing customers. At the end of the week, this group spends more than the group that did not receive the email. But was it the email that made the difference?
Maybe the more active customers were more likely to receive the email in the first place. Maybe they were already planning to buy. Maybe the email nudged them, or maybe it did nothing. The difference in spend could be due to underlying differences in customer type.
This is the fundamental question of potential outcomes. What we want to estimate is:
$$\text{ATE} = E[Y(1)] - E[Y(0)]$$

Where:
- $Y(1)$ is the outcome if treated (e.g. received email)
- $Y(0)$ is the outcome if not treated
- $E[\cdot]$ denotes the expected value
We can observe one of these for each individual, but never both. That is the fundamental problem of causal inference. We must estimate the counterfactual.
Rubin Causal Model: The Formal Framework
The Rubin Causal Model provides a formal way to think about this. Each unit (e.g. customer) has two potential outcomes: one under treatment, one under control. The treatment effect is the difference, but we only see one outcome per unit.
If treatment is random, the average treatment effect (ATE) is easy to estimate. But in marketing, treatment is rarely random. Customers self-select, or marketers target based on behaviour. So we must account for this non-random assignment.
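To make this concrete, here is a tiny synthetic simulation in Python (numbers and variable names are purely illustrative, not client data). Customer activity drives both who receives the email and how much they spend, so the naive treated-vs-control comparison lands far above the true effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Confounder: how active a customer already is.
activity = rng.normal(size=n)

# Active customers are more likely to be sent the email...
email = rng.binomial(1, 1 / (1 + np.exp(-activity)))

# ...and spend more regardless of it. The true causal effect is 5.
spend = 20 + 10 * activity + 5 * email + rng.normal(scale=5, size=n)

naive = spend[email == 1].mean() - spend[email == 0].mean()
print(f"Naive difference in means: {naive:.2f}")  # far above the true 5
```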
Propensity Score Matching: Balancing the Covariates
One solution is to model the probability of receiving treatment using observable characteristics. This is known as the propensity score.
We define:

$$e(X) = P(T = 1 \mid X)$$

Where:
- $T$ is the treatment assignment (1 if treated, 0 if control)
- $X$ is the vector of observed covariates (e.g. frequency, spend, geography)
- $e(X)$ is the estimated probability of being treated
I estimate this using logistic regression or machine learning models. Then I match treated and untreated units with similar propensity scores. This balances the distribution of covariates across groups.
I usually check the balance before and after matching by looking at standardised mean differences. A well-matched sample should resemble a randomised trial.
The treatment effect is then estimated as:

$$\hat{\tau}_{\text{ATT}} = \frac{1}{N_T} \sum_{i \in \text{treated}} \left( Y_i - Y_{m(i)} \right)$$

Where $Y_i$ and $Y_{m(i)}$ are the outcomes of each treated unit and its matched control.
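Here is a minimal sketch of this workflow in Python with scikit-learn. The function names are my own, and one-to-one nearest-neighbour matching with replacement is just one of several matching schemes; treat it as a sketch of the idea, not a production implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def smd(x_treated, x_control):
    """Standardised mean difference for one covariate."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

def psm_att(X, treated, y):
    """One-to-one nearest-neighbour matching on the propensity score.

    X: (n, k) covariate matrix; treated: (n,) 0/1 flags; y: (n,) outcomes.
    Returns the estimated ATT and the matched control indices.
    """
    # Step 1: estimate the propensity score e(X) = P(T=1 | X).
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

    # Step 2: match each treated unit to the control unit with the
    # closest propensity score (with replacement, for simplicity).
    treat_idx = np.where(treated == 1)[0]
    ctrl_idx = np.where(treated == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[ctrl_idx].reshape(-1, 1))
    _, match = nn.kneighbors(ps[treat_idx].reshape(-1, 1))
    matched_ctrl = ctrl_idx[match.ravel()]

    # Balance check: SMDs should shrink towards zero after matching
    # (a common rule of thumb is |SMD| < 0.1).
    for j in range(X.shape[1]):
        print(f"covariate {j}: SMD = {smd(X[treat_idx, j], X[matched_ctrl, j]):.3f}")

    # Step 3: the ATT is the mean outcome difference across matched
    # pairs, exactly the formula above.
    return (y[treat_idx] - y[matched_ctrl]).mean(), matched_ctrl
```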
Practical Example: eCommerce Email Campaign
For a DTC fashion brand, I needed to measure the true effect of a personalisation email campaign. The raw comparison showed email recipients converting at 8.4% against 5.2% for non-recipients, an apparent lift of over 60%. But recipients were pre-selected based on past engagement.
| Group | Raw Conversion | After PSM Matching |
|---|---|---|
| Received Email | 8.4% | 6.1% |
| No Email | 5.2% | 5.3% |
| Apparent Lift | +61% | +15% |
After propensity score matching, the true lift was closer to 15%. Still significant, but the raw comparison massively overstated the effect.
Instrumental Variables: When Selection Bias Is Hidden
Sometimes, observable covariates are not enough. What if the reason for being treated is based on something unobservable (like motivation or intent)? This breaks the matching approach. In that case, I look for an instrumental variable (IV).
An IV is a variable $Z$ that affects treatment assignment $T$ but has no direct effect on the outcome $Y$, except through $T$.
For example, suppose some customers get a promotion email only because their location has a different email sending time. Location becomes the instrument: it predicts treatment but should not affect purchase directly.
I use two-stage least squares (2SLS):
- Regress $T$ on $Z$: estimate the predicted treatment $\hat{T}$
- Regress $Y$ on $\hat{T}$: estimate the effect of treatment using only the variation caused by $Z$
This isolates the exogenous part of treatment, the part not driven by selection bias.
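A hand-rolled version of the two stages in Python with statsmodels, for intuition only (variable names are illustrative, and the second-stage standard errors are not valid because the predicted treatment is a generated regressor; a dedicated IV estimator such as the one in the linearmodels package handles inference properly):

```python
import numpy as np
import statsmodels.api as sm

def two_stage_least_squares(y, T, Z):
    """2SLS with one treatment T and one instrument Z (arrays of shape (n,))."""
    # Stage 1: regress treatment on the instrument; keep fitted values.
    T_hat = sm.OLS(T, sm.add_constant(Z)).fit().fittedvalues

    # Stage 2: regress the outcome on predicted treatment. The slope
    # uses only the variation in T that the instrument induced.
    stage2 = sm.OLS(y, sm.add_constant(T_hat)).fit()
    return stage2.params[1]  # estimated causal effect of T on Y
```

In practice I also check that the first stage is strong: a weak instrument, one that barely predicts treatment, makes the estimate very noisy.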
Practical Example: SaaS Onboarding
For a SaaS client, users who completed onboarding had higher retention. But was it because onboarding worked, or because motivated users completed it?
I used server queue timing as an instrument. Some users received onboarding prompts earlier due to infrastructure timing, unrelated to their intent. This revealed that the true causal lift from onboarding was about half what the raw comparison suggested.
DAGs and Graphical Thinking
To make sense of these relationships, I use Directed Acyclic Graphs (DAGs). These are visual maps of how variables relate.
- Arrows represent causal relationships
- Nodes represent variables
- No cycles allowed
I use DAGs to decide what to control for, what to instrument, and where bias may creep in. I use software like dagitty.net to explore conditional independencies.
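As a rough Python analogue (the graph below is a toy example, not a client model), networkx can encode the same kind of DAG, verify it has no cycles, and enumerate the paths you then reason about by hand; dagitty remains the better tool for computing adjustment sets:

```python
import networkx as nx

# Toy marketing DAG: past engagement confounds email -> purchase,
# while send_time is a candidate instrument (it affects email only).
dag = nx.DiGraph([
    ("engagement", "email"),
    ("engagement", "purchase"),
    ("email", "purchase"),
    ("send_time", "email"),
])

assert nx.is_directed_acyclic_graph(dag)  # no cycles allowed

# Any path between email and purchase other than the direct edge is a
# potential backdoor; here it runs through engagement, so engagement
# is the confounder to control for.
for path in nx.all_simple_paths(dag.to_undirected(), "email", "purchase"):
    print(path)
```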
The Three Key Questions DAGs Answer
| Question | What It Reveals |
|---|---|
| What should I control for? | Confounders that affect both treatment and outcome |
| What should I NOT control for? | Mediators or colliders that introduce bias |
| What could be an instrument? | Variables affecting treatment but not outcome directly |
Why This Matters for Growth Strategy
If you attribute success to the wrong tactic, you will double down on the wrong thing. If you under-measure a channel because it overlaps with other activity, you might cut your best-performing asset.
Causal inference helps you:
- Justify budget and resourcing
- Design smarter interventions
- Forecast more accurately
- Avoid false positives
It is not easy. But it is worth it. I use these techniques not to impress analysts, but to help businesses make better bets.
Final Thought: Move Beyond Correlation
If your analytics are showing big numbers, but you are not sure why they are moving, it may be time to move beyond correlation.
I can help you do that, one counterfactual at a time.