The Math Behind It All: Growth Marketing Mathematics Explained

Psychology gets you interested. Data gets you funded. Every principle in this series ultimately needs to be measured, modelled, and proven. That is where the math comes in. This post is the final chapter: the quantitative foundations that separate gut-feel marketers from data-driven growth practitioners. I have written it for two audiences simultaneously. If you are new to this, each concept includes plain-English explanations, simple examples, and intuition-building analogies. If you are already comfortable with statistics, you will find the actual formulas, the edge cases, and the Python implementations you can drop into production. The math here is not academic theory. It is the working toolkit I use every day to make decisions about acquisition, retention, pricing, and experimentation.

LTV:CAC Ratio: The Unit Economics Foundation

The Plain English Version

Imagine you run a lemonade stand. You spend £10 on flyers to attract a customer. That customer buys lemonade from you over the next year totalling £30 in profit. Your LTV:CAC ratio is 3:1. You spent £10 to get £30. Good deal.

Now imagine you spend £10 on flyers, but the customer only ever buys £8 worth of lemonade. Your LTV:CAC ratio is 0.8:1. You are literally paying money to lose money. Every customer you acquire makes you poorer.

LTV (Lifetime Value): The total profit a customer generates over their entire relationship with you.

CAC (Customer Acquisition Cost): What you spent to acquire that customer.

LTV:CAC Ratio: How many pounds of value you get back for every pound spent acquiring a customer.

The Formula

The basic LTV formula is:

$LTV = ARPU \times Gross\ Margin \times Average\ Customer\ Lifespan$

Where:

ARPU = Average Revenue Per User (monthly or annual)
Gross Margin = Revenue minus cost of goods sold, as a percentage
Average Customer Lifespan = How long customers stay (in the same time unit as ARPU)

For subscription businesses with monthly churn, there is a more elegant formula:

$LTV = \frac{ARPU \times Gross\ Margin}{Churn\ Rate}$

This works because if your monthly churn rate is 5%, your average customer lifespan is 1/0.05 = 20 months.

The ratio itself is simply:

$LTV:CAC\ Ratio = \frac{LTV}{CAC}$

The Benchmarks

Ratio	Interpretation
< 1:1	You are losing money on every customer. Stop acquiring.
1:1 to 2:1	Marginal. You might break even. High risk.
3:1	The minimum healthy ratio. Industry standard target.
5:1+	Strong unit economics. Consider investing more in acquisition.
10:1+	Either underinvesting in growth, or your CAC measurement is wrong.

The Python Implementation

import numpy as np
import pandas as pd

def calculate_ltv_simple(arpu_monthly, gross_margin, churn_rate_monthly):
    """
    Calculate LTV using the standard subscription formula.
    
    Parameters:
    -----------
    arpu_monthly : float
        Average Revenue Per User per month
    gross_margin : float
        Gross margin as decimal (0.7 = 70%)
    churn_rate_monthly : float
        Monthly churn rate as decimal (0.05 = 5%)
    
    Returns:
    --------
    float : Customer Lifetime Value
    """
    if churn_rate_monthly <= 0:
        raise ValueError("Churn rate must be positive")
    
    avg_lifespan_months = 1 / churn_rate_monthly
    ltv = arpu_monthly * gross_margin * avg_lifespan_months
    
    return ltv

def calculate_ltv_cac_ratio(ltv, cac):
    """
    Calculate LTV:CAC ratio with interpretation.
    """
    if cac <= 0:
        raise ValueError("CAC must be positive")
    
    ratio = ltv / cac
    
    if ratio < 1:
        interpretation = "CRITICAL: Losing money on every customer"
    elif ratio < 2:
        interpretation = "WARNING: Marginal unit economics"
    elif ratio < 3:
        interpretation = "CAUTION: Below target, needs improvement"
    elif ratio < 5:
        interpretation = "HEALTHY: Good unit economics"
    else:
        interpretation = "STRONG: Consider increasing acquisition spend"
    
    return ratio, interpretation

# Example: SaaS business
print("LTV:CAC Analysis: Example SaaS Business")
print("=" * 60)

arpu = 99  # £99/month
gross_margin = 0.80  # 80% margin (software)
monthly_churn = 0.03  # 3% monthly churn
cac = 400  # £400 to acquire a customer

ltv = calculate_ltv_simple(arpu, gross_margin, monthly_churn)
ratio, interpretation = calculate_ltv_cac_ratio(ltv, cac)

print(f"\nInputs:")
print(f"  ARPU: £{arpu}/month")
print(f"  Gross Margin: {gross_margin*100:.0f}%")
print(f"  Monthly Churn: {monthly_churn*100:.1f}%")
print(f"  CAC: £{cac}")

print(f"\nCalculations:")
print(f"  Average Lifespan: {1/monthly_churn:.1f} months")
print(f"  LTV: £{ltv:,.2f}")
print(f"  LTV:CAC Ratio: {ratio:.1f}:1")
print(f"  {interpretation}")

# Sensitivity analysis
print("\n" + "=" * 60)
print("Sensitivity Analysis: Impact of Churn on LTV:CAC")
print("-" * 60)

print(f"{'Churn Rate':15} | {'Avg Lifespan':15} | {'LTV':12} | {'Ratio':10}")
print("-" * 60)

for churn in [0.01, 0.02, 0.03, 0.05, 0.08, 0.10]:
    ltv_test = calculate_ltv_simple(arpu, gross_margin, churn)
    ratio_test = ltv_test / cac
    print(f"{churn*100:13.1f}% | {1/churn:13.1f} mo | £{ltv_test:10,.0f} | {ratio_test:8.1f}:1")

print("\nKey Insight: A 1% improvement in churn dramatically improves unit economics")

Advanced: Cohorted LTV

The simple formula assumes all customers are the same. They are not. Advanced practitioners calculate LTV by cohort:

import pandas as pd
import numpy as np

def calculate_cohorted_ltv(cohort_data):
    """
    Calculate LTV by acquisition cohort.
    
    cohort_data should have columns:
    - cohort_month: when the customer was acquired
    - customer_id: unique identifier
    - total_revenue: cumulative revenue from this customer
    - acquisition_cost: what was spent to acquire them
    """
    
    cohort_summary = cohort_data.groupby('cohort_month').agg({
        'customer_id': 'count',
        'total_revenue': 'sum',
        'acquisition_cost': 'sum'
    }).rename(columns={'customer_id': 'customers'})
    
    cohort_summary['avg_ltv'] = cohort_summary['total_revenue'] / cohort_summary['customers']
    cohort_summary['avg_cac'] = cohort_summary['acquisition_cost'] / cohort_summary['customers']
    cohort_summary['ltv_cac_ratio'] = cohort_summary['avg_ltv'] / cohort_summary['avg_cac']
    
    return cohort_summary

# Simulate cohort data
np.random.seed(42)

cohorts = []
for month in pd.date_range('2025-01', '2025-12', freq='MS'):
    n_customers = np.random.randint(80, 150)
    
    # LTV varies by cohort (maybe channel mix changed, or product improved)
    base_ltv = 800 + (month.month - 1) * 50  # Improving over time
    ltv_variation = np.random.normal(base_ltv, 200, n_customers)
    
    # CAC also varies
    base_cac = 350 - (month.month - 1) * 10  # Getting more efficient
    cac_variation = np.random.normal(base_cac, 50, n_customers)
    
    for i in range(n_customers):
        cohorts.append({
            'cohort_month': month.strftime('%Y-%m'),
            'customer_id': f"{month.strftime('%Y%m')}_{i}",
            'total_revenue': max(0, ltv_variation[i]),
            'acquisition_cost': max(50, cac_variation[i])
        })

cohort_df = pd.DataFrame(cohorts)
results = calculate_cohorted_ltv(cohort_df)

print("\nCohorted LTV:CAC Analysis")
print("=" * 70)
print(results[['customers', 'avg_ltv', 'avg_cac', 'ltv_cac_ratio']].round(2).to_string())

print("\nKey Insight: Cohorted analysis reveals trends invisible in aggregate")
print("This example shows improving unit economics over time (product/channel improvements)")

Payback Period: When Do You Get Your Money Back?

The Plain English Version

LTV:CAC tells you if the investment is worth it. Payback period tells you how long you have to wait to see a return.

Imagine you spend £1,000 to acquire a customer who pays you £100/month. Your payback period is 10 months. For 10 months, that customer is underwater (you have not yet recouped your investment). After month 10, they become profitable.

Why does this matter? Cash flow. If your payback period is 24 months, you need to fund 24 months of acquisition costs before seeing returns. That requires capital. A 6 month payback period means your acquisition spend recycles much faster.

The Formula

$Payback\ Period = \frac{CAC}{ARPU \times Gross\ Margin}$

For a business with £500 CAC, £100/month ARPU, and 80% gross margin:

$Payback\ Period = \frac{500}{100 \times 0.80} = \frac{500}{80} = 6.25\ months$

The Benchmarks

Business Type	Target Payback Period
E-commerce	< 6 months (ideally first purchase)
SaaS (SMB)	< 12 months
SaaS (Enterprise)	< 18 months
Marketplace	< 6 months per side
Consumer subscription	< 6 months

The Python Implementation

import numpy as np
import pandas as pd

def calculate_payback_period(cac, arpu_monthly, gross_margin):
    """
    Calculate months until CAC is recovered.
    """
    monthly_contribution = arpu_monthly * gross_margin
    
    if monthly_contribution <= 0:
        return float('inf')
    
    payback_months = cac / monthly_contribution
    
    return payback_months

def payback_analysis(cac, arpu_monthly, gross_margin, churn_rate_monthly):
    """
    Full payback analysis including survival probability.
    
    The key insight: some customers churn before payback.
    We need to account for this.
    """
    
    simple_payback = calculate_payback_period(cac, arpu_monthly, gross_margin)
    
    # Probability of surviving to payback
    # Assuming constant churn rate (exponential survival)
    survival_prob = (1 - churn_rate_monthly) ** simple_payback
    
    # Expected payback considering churn
    # This is more complex - need to integrate over time
    monthly_contribution = arpu_monthly * gross_margin
    
    # Simulate expected cumulative contribution
    months = np.arange(1, 61)  # 5 years
    survival = (1 - churn_rate_monthly) ** months
    cumulative_contribution = np.cumsum(monthly_contribution * survival)
    
    # Find when cumulative contribution exceeds CAC
    if cumulative_contribution[-1] < cac:
        expected_payback = float('inf')
    else:
        expected_payback = months[cumulative_contribution >= cac][0]
    
    return {
        'simple_payback': simple_payback,
        'survival_to_payback': survival_prob,
        'expected_payback': expected_payback
    }

print("Payback Period Analysis")
print("=" * 60)

cac = 500
arpu = 80
gross_margin = 0.75
churn = 0.04

results = payback_analysis(cac, arpu, gross_margin, churn)

print(f"\nInputs:")
print(f"  CAC: £{cac}")
print(f"  ARPU: £{arpu}/month")
print(f"  Gross Margin: {gross_margin*100:.0f}%")
print(f"  Monthly Churn: {churn*100:.0f}%")

print(f"\nResults:")
print(f"  Simple Payback: {results['simple_payback']:.1f} months")
print(f"  Probability of surviving to payback: {results['survival_to_payback']*100:.1f}%")
print(f"  Expected Payback (accounting for churn): {results['expected_payback']} months")

print("\nInterpretation:")
if results['simple_payback'] < 12:
    print("  ✓ Payback under 12 months: healthy for SaaS")
else:
    print("  ✗ Payback over 12 months: consider improving unit economics")

if results['survival_to_payback'] < 0.7:
    print("  ⚠ Warning: >30% of customers churn before payback")
    print("    This means a significant portion of acquisition spend is unrecovered")

Cohort Analysis: Stop Averaging Everything

The Plain English Version

Imagine you have 1,000 customers. Your average monthly revenue per customer is £50. Great, right?

But what if customers acquired in January average £80/month, while customers acquired in June average £20/month? Your "average" hides a critical problem: your acquisition quality is declining. Or your product is getting worse for new users. Or you changed channels and the new channel brings worse customers.

Cohort analysis groups users by when they were acquired (or by some other meaningful characteristic) and tracks them over time. Patterns that are invisible at the aggregate level become obvious.

The Key Cohort Metrics

Retention curves: What percentage of each cohort is still active at month 1, 2, 3, etc.?

Revenue curves: What is the cumulative revenue per customer in each cohort over time?

Behavioural curves: What percentage of each cohort has completed key actions (first purchase, feature adoption, referral)?

The Python Implementation

import numpy as np
import pandas as pd

def create_cohort_retention_table(df, user_col, date_col, cohort_col):
    """
    Create a cohort retention table from transaction/activity data.
    
    Parameters:
    -----------
    df : DataFrame with user activity
    user_col : column name for user ID
    date_col : column name for activity date
    cohort_col : column name for cohort assignment (acquisition date)
    
    Returns:
    --------
    DataFrame with cohort retention percentages
    """
    
    # Get cohort month for each user (first activity)
    df['activity_month'] = pd.to_datetime(df[date_col]).dt.to_period('M')
    df['cohort_month'] = pd.to_datetime(df[cohort_col]).dt.to_period('M')
    
    # Calculate months since acquisition
    df['months_since_acquisition'] = (df['activity_month'] - df['cohort_month']).apply(lambda x: x.n if pd.notna(x) else 0)
    
    # Create cohort table
    cohort_data = df.groupby(['cohort_month', 'months_since_acquisition'])[user_col].nunique().reset_index()
    cohort_data.columns = ['cohort_month', 'months_since', 'users']
    
    # Pivot to get cohort retention matrix
    cohort_pivot = cohort_data.pivot(index='cohort_month', columns='months_since', values='users')
    
    # Convert to percentages (relative to month 0)
    cohort_sizes = cohort_pivot[0]
    cohort_retention = cohort_pivot.div(cohort_sizes, axis=0) * 100
    
    return cohort_retention, cohort_sizes

# Simulate cohort data
np.random.seed(42)

data = []
for cohort_month in pd.date_range('2025-01', '2025-06', freq='MS'):
    # Each cohort starts with 100 users
    n_users = 100
    
    # Simulate different retention curves for different cohorts
    # Later cohorts have worse retention (simulating a problem)
    base_retention = 0.85 - (cohort_month.month - 1) * 0.03
    
    for user_id in range(n_users):
        user_unique_id = f"{cohort_month.strftime('%Y%m')}_{user_id}"
        
        # User's activity months (simulate churn)
        active = True
        current_month = cohort_month
        
        while active and current_month < pd.Timestamp('2025-07-01'):
            data.append({
                'user_id': user_unique_id,
                'activity_date': current_month,
                'cohort_date': cohort_month
            })
            
            current_month = current_month + pd.DateOffset(months=1)
            active = np.random.random() < base_retention

df = pd.DataFrame(data)
retention_table, cohort_sizes = create_cohort_retention_table(df, 'user_id', 'activity_date', 'cohort_date')

print("Cohort Retention Analysis")
print("=" * 70)
print("\nRetention Rate by Cohort and Month (%)")
print(retention_table.round(1).to_string())

print("\n" + "=" * 70)
print("Key Insights:")
print("  • Look DOWN columns: how does Month 3 retention compare across cohorts?")
print("  • Look ACROSS rows: how does retention decay for a single cohort?")
print("  • In this example: later cohorts have worse retention (a warning sign!)")

# Visualise the insight
print("\nMonth 3 Retention by Cohort:")
if 3 in retention_table.columns:
    for cohort in retention_table.index:
        val = retention_table.loc[cohort, 3]
        if pd.notna(val):
            bar = '█' * int(val / 5)
            print(f"  {cohort}: {val:5.1f}% {bar}")

Cohort Analysis for Revenue

import numpy as np
import pandas as pd

def create_revenue_cohort_table(df, user_col, date_col, revenue_col, cohort_col):
    """
    Create cumulative revenue per customer by cohort.
    """
    
    df['activity_month'] = pd.to_datetime(df[date_col]).dt.to_period('M')
    df['cohort_month'] = pd.to_datetime(df[cohort_col]).dt.to_period('M')
    df['months_since'] = (df['activity_month'] - df['cohort_month']).apply(lambda x: x.n if pd.notna(x) else 0)
    
    # Sum revenue by cohort and month
    revenue_data = df.groupby(['cohort_month', 'months_since'])[revenue_col].sum().reset_index()
    
    # Get cohort sizes
    cohort_sizes = df.groupby('cohort_month')[user_col].nunique()
    
    # Calculate per-customer revenue
    revenue_data = revenue_data.merge(
        cohort_sizes.reset_index().rename(columns={user_col: 'cohort_size'}),
        on='cohort_month'
    )
    revenue_data['revenue_per_customer'] = revenue_data[revenue_col] / revenue_data['cohort_size']
    
    # Pivot and calculate cumulative
    revenue_pivot = revenue_data.pivot(index='cohort_month', columns='months_since', values='revenue_per_customer')
    cumulative_revenue = revenue_pivot.cumsum(axis=1)
    
    return cumulative_revenue

print("\nCohort Revenue Analysis:")
print("Cumulative revenue per customer tells you how fast different cohorts monetise")
print("Declining curves = worsening customer quality or product-market fit issues")

Survival Analysis: A Better Way to Model Churn

The Plain English Version

Traditional churn metrics ("5% of customers churned last month") have a problem: they treat all customers the same regardless of how long they have been with you.

But a customer who has been with you for 3 years churns differently than a customer who joined last week. Survival analysis, borrowed from biostatistics (where it is used to model patient survival), accounts for this.

The key insight: survival analysis asks "given that a customer has survived until time T, what is the probability they will survive to time T+1?"

The Key Concepts

Survival function S(t): The probability of surviving (not churning) until time t.

$S(t) = P(T > t)$

Hazard function h(t): The instantaneous risk of churning at time t, given survival until t.

$h(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t | T \geq t)}{\Delta t}$

In plain English: the hazard rate is the "churn risk" at any given moment, accounting for how long the customer has already survived.

Kaplan-Meier estimator: A non-parametric way to estimate the survival function from data.

$\hat{S}(t) = \prod_{t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)$

Where:

$d_i$ = number of churns at time $t_i$
$n_i$ = number of customers still at risk at time $t_i$

Why This Matters for Marketing

Better retention predictions: Survival curves show the actual shape of churn, not just a single number.
Handles censoring: What about customers who are still active? Traditional churn rates cannot handle them properly. Survival analysis does (right-censoring).
Compares cohorts properly: Is January's cohort really worse than February's? Survival curves give you a statistically valid comparison.

The Python Implementation

import numpy as np
import pandas as pd
from collections import defaultdict

def kaplan_meier_estimator(durations, observed):
    """
    Calculate Kaplan-Meier survival estimates.
    
    Parameters:
    -----------
    durations : array-like
        Time to event (churn) or censoring for each customer
    observed : array-like
        1 if the customer churned, 0 if still active (censored)
    
    Returns:
    --------
    DataFrame with survival estimates at each time point
    """
    
    # Combine and sort by duration
    data = pd.DataFrame({
        'duration': durations,
        'observed': observed
    }).sort_values('duration')
    
    # Get unique event times (only where churns happened)
    event_times = sorted(data[data['observed'] == 1]['duration'].unique())
    
    results = []
    n_at_risk = len(data)
    survival_prob = 1.0
    
    for t in event_times:
        # Number who churned at time t
        d_t = len(data[(data['duration'] == t) & (data['observed'] == 1)])
        
        # Number at risk at time t (still active or churning at t)
        n_t = len(data[data['duration'] >= t])
        
        # Kaplan-Meier estimate
        if n_t > 0:
            survival_prob = survival_prob * (1 - d_t / n_t)
        
        results.append({
            'time': t,
            'at_risk': n_t,
            'churned': d_t,
            'survival_prob': survival_prob
        })
    
    return pd.DataFrame(results)

def calculate_median_survival(km_results):
    """
    Calculate median survival time (when 50% have churned).
    """
    below_50 = km_results[km_results['survival_prob'] <= 0.5]
    
    if len(below_50) > 0:
        return below_50.iloc[0]['time']
    else:
        return None  # Median not yet reached

# Simulate customer data
np.random.seed(42)

n_customers = 500

# Generate durations (time to churn or current tenure)
# Using Weibull distribution (common in survival analysis)
# Shape < 1: decreasing hazard (early churn risk, then stabilises)
# Shape = 1: constant hazard (exponential)
# Shape > 1: increasing hazard (wear-out)

shape = 0.8  # Decreasing hazard (high early churn, then survivors stick around)
scale = 24   # Scale parameter

durations = np.random.weibull(shape, n_customers) * scale
durations = np.clip(durations, 1, 36)  # Cap at 36 months

# Some customers are still active (censored)
# Assume we're observing at month 18
observation_time = 18
observed = (durations <= observation_time).astype(int)
durations = np.minimum(durations, observation_time)

# Calculate Kaplan-Meier
km_results = kaplan_meier_estimator(durations, observed)

print("Survival Analysis: Kaplan-Meier Estimation")
print("=" * 70)
print("\nSurvival Curve (Probability of Still Being a Customer)")
print("-" * 70)
print(f"{'Month':8} | {'At Risk':10} | {'Churned':10} | {'Survival Prob':15} | {'Visual':20}")
print("-" * 70)

for _, row in km_results.iterrows():
    bar_length = int(row['survival_prob'] * 20)
    bar = '█' * bar_length + '░' * (20 - bar_length)
    print(f"{row['time']:6.0f} | {row['at_risk']:8.0f} | {row['churned']:8.0f} | {row['survival_prob']*100:12.1f}% | {bar}")

median = calculate_median_survival(km_results)
print(f"\nMedian Survival Time: {median:.1f} months" if median else "\nMedian not reached")

print("\n" + "=" * 70)
print("Interpreting the Survival Curve:")
print("  • The curve drops steeply early = high early churn")
print("  • The curve flattens later = survivors become loyal")
print("  • Compare curves across cohorts to spot problems")
print("  • Use median survival as a single summary metric")

Comparing Cohorts with Log-Rank Test

import numpy as np
from scipy import stats

def log_rank_test(durations1, observed1, durations2, observed2):
    """
    Compare two survival curves using the log-rank test.
    
    Returns chi-square statistic and p-value.
    """
    
    # Combine all unique event times
    all_times = sorted(set(np.concatenate([
        durations1[observed1 == 1],
        durations2[observed2 == 1]
    ])))
    
    observed_sum = 0
    expected_sum = 0
    variance_sum = 0
    
    for t in all_times:
        # Group 1 at time t
        n1 = np.sum(durations1 >= t)
        d1 = np.sum((durations1 == t) & (observed1 == 1))
        
        # Group 2 at time t
        n2 = np.sum(durations2 >= t)
        d2 = np.sum((durations2 == t) & (observed2 == 1))
        
        # Combined
        n = n1 + n2
        d = d1 + d2
        
        if n > 0:
            # Expected events in group 1
            e1 = n1 * d / n
            
            # Variance
            if n > 1:
                v = (n1 * n2 * d * (n - d)) / (n * n * (n - 1))
            else:
                v = 0
            
            observed_sum += d1
            expected_sum += e1
            variance_sum += v
    
    # Chi-square statistic
    if variance_sum > 0:
        chi2 = (observed_sum - expected_sum) ** 2 / variance_sum
        p_value = 1 - stats.chi2.cdf(chi2, df=1)
    else:
        chi2 = 0
        p_value = 1.0
    
    return chi2, p_value

# Simulate two cohorts with different survival characteristics
np.random.seed(42)

# Cohort A: Good retention (higher scale parameter)
durations_a = np.random.weibull(0.8, 200) * 28
durations_a = np.clip(durations_a, 1, 18)
observed_a = (durations_a < 18).astype(int)

# Cohort B: Worse retention (lower scale parameter)
durations_b = np.random.weibull(0.8, 200) * 20
durations_b = np.clip(durations_b, 1, 18)
observed_b = (durations_b < 18).astype(int)

chi2, p_value = log_rank_test(durations_a, observed_a, durations_b, observed_b)

print("\nLog-Rank Test: Comparing Two Cohorts")
print("=" * 60)
print(f"Chi-square statistic: {chi2:.2f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Result: Statistically significant difference in survival curves")
else:
    print("Result: No significant difference detected")

Bayesian A/B Testing: Probability, Not P-Values

The Plain English Version

Traditional (frequentist) A/B testing asks: "If there's really no difference between A and B, how likely is it that I'd see data this extreme?"

Bayesian A/B testing asks a more intuitive question: "Given the data I've seen, what's the probability that B is better than A?"

For a marketer, the Bayesian question is far more useful. You want to know: "If I ship variant B, what's the probability I'm making the right call?"

The Key Concepts

Prior distribution: What you believe before seeing data. Often set to "uninformative" (no strong prior belief).

Likelihood: The probability of seeing your data given a particular parameter value.

Posterior distribution: Your updated belief after seeing data. Combines prior and likelihood via Bayes' theorem.

$P(\theta | data) = \frac{P(data | \theta) \times P(\theta)}{P(data)}$

In words: Posterior = (Likelihood × Prior) / Evidence

Probability of B > A: We can directly calculate this from the posterior distributions.

Why Bayesian Is Better for Low-Traffic Sites

No predetermined sample size required. Frequentist tests need you to calculate sample size upfront and wait. Bayesian lets you peek anytime.
Intuitive interpretation. "87% probability B is better" is easier to act on than "p = 0.034".
Decision-focused. You can calculate expected loss from choosing wrong variant.
Handles early stopping properly. Looking at frequentist tests early inflates false positive rates. Bayesian handles this naturally.

The Python Implementation

import numpy as np
from scipy import stats

def bayesian_ab_test(visitors_a, conversions_a, visitors_b, conversions_b, 
                     prior_alpha=1, prior_beta=1, n_simulations=100000):
    """
    Bayesian A/B test for conversion rates.
    
    Uses Beta distribution as conjugate prior for binomial likelihood.
    Beta(1,1) is uniform prior (uninformative).
    
    Parameters:
    -----------
    visitors_a, conversions_a : Control group data
    visitors_b, conversions_b : Treatment group data
    prior_alpha, prior_beta : Beta prior parameters
    n_simulations : Number of Monte Carlo samples
    
    Returns:
    --------
    dict with probability B > A, expected uplift, and credible intervals
    """
    
    # Posterior parameters (Beta is conjugate prior for binomial)
    # Posterior = Beta(alpha + successes, beta + failures)
    
    alpha_a = prior_alpha + conversions_a
    beta_a = prior_beta + (visitors_a - conversions_a)
    
    alpha_b = prior_alpha + conversions_b
    beta_b = prior_beta + (visitors_b - conversions_b)
    
    # Sample from posteriors
    samples_a = np.random.beta(alpha_a, beta_a, n_simulations)
    samples_b = np.random.beta(alpha_b, beta_b, n_simulations)
    
    # Probability B > A
    prob_b_better = np.mean(samples_b > samples_a)
    
    # Expected uplift (relative improvement)
    uplift_samples = (samples_b - samples_a) / samples_a
    expected_uplift = np.mean(uplift_samples)
    
    # Credible intervals (95%)
    ci_a = np.percentile(samples_a, [2.5, 97.5])
    ci_b = np.percentile(samples_b, [2.5, 97.5])
    ci_uplift = np.percentile(uplift_samples, [2.5, 97.5])
    
    # Risk: Expected loss if we choose B but A is actually better
    loss_if_choose_b = np.mean(np.maximum(samples_a - samples_b, 0))
    loss_if_choose_a = np.mean(np.maximum(samples_b - samples_a, 0))
    
    return {
        'conversion_rate_a': conversions_a / visitors_a,
        'conversion_rate_b': conversions_b / visitors_b,
        'prob_b_better': prob_b_better,
        'expected_uplift': expected_uplift,
        'ci_a': ci_a,
        'ci_b': ci_b,
        'ci_uplift': ci_uplift,
        'expected_loss_if_choose_b': loss_if_choose_b,
        'expected_loss_if_choose_a': loss_if_choose_a
    }

def interpret_bayesian_results(results, risk_threshold=0.001):
    """
    Provide decision recommendation based on Bayesian results.
    """
    
    prob = results['prob_b_better']
    
    if prob >= 0.95:
        decision = "SHIP B: High confidence B is better"
    elif prob <= 0.05:
        decision = "KEEP A: High confidence A is better"
    elif results['expected_loss_if_choose_b'] < risk_threshold:
        decision = "SHIP B: Expected loss is acceptable"
    elif results['expected_loss_if_choose_a'] < risk_threshold:
        decision = "KEEP A: Expected loss is acceptable"
    else:
        decision = "CONTINUE TEST: Not enough data for confident decision"
    
    return decision

# Example: Low-traffic A/B test
print("Bayesian A/B Testing")
print("=" * 70)

# Scenario: Small e-commerce site testing checkout button color
visitors_a = 1200
conversions_a = 48  # 4% conversion

visitors_b = 1180
conversions_b = 59  # 5% conversion

results = bayesian_ab_test(visitors_a, conversions_a, visitors_b, conversions_b)

print(f"\nTest Data:")
print(f"  Control (A): {conversions_a}/{visitors_a} = {results['conversion_rate_a']*100:.2f}%")
print(f"  Variant (B): {conversions_b}/{visitors_b} = {results['conversion_rate_b']*100:.2f}%")

print(f"\nBayesian Results:")
print(f"  Probability B > A: {results['prob_b_better']*100:.1f}%")
print(f"  Expected uplift: {results['expected_uplift']*100:+.1f}%")
print(f"  95% CI for uplift: [{results['ci_uplift'][0]*100:.1f}%, {results['ci_uplift'][1]*100:.1f}%]")

print(f"\nRisk Analysis:")
print(f"  Expected loss if choose B (and A is better): {results['expected_loss_if_choose_b']*100:.3f}%")
print(f"  Expected loss if choose A (and B is better): {results['expected_loss_if_choose_a']*100:.3f}%")

decision = interpret_bayesian_results(results)
print(f"\nDecision: {decision}")

print("\n" + "=" * 70)
print("Comparison to Frequentist Approach:")
print("  Frequentist would calculate p-value and require predetermined sample size")
print("  Bayesian gives probability of being right + expected cost of being wrong")
print("  Bayesian allows checking anytime without inflating false positives")

When to Stop a Bayesian Test

def bayesian_stopping_rule(prob_b_better, expected_loss_b, expected_loss_a,
                           prob_threshold=0.95, loss_threshold=0.001):
    """
    Determine if we have enough data to make a decision.
    
    Two stopping rules:
    1. Probability threshold: P(B>A) >= threshold or <= (1-threshold)
    2. Expected loss threshold: Expected loss of decision < threshold
    """
    
    # Rule 1: High probability
    if prob_b_better >= prob_threshold:
        return True, "STOP: High confidence B is better", "Ship B"
    elif prob_b_better <= (1 - prob_threshold):
        return True, "STOP: High confidence A is better", "Keep A"
    
    # Rule 2: Acceptable loss
    if expected_loss_b < loss_threshold:
        return True, "STOP: Acceptable loss for B", "Ship B"
    elif expected_loss_a < loss_threshold:
        return True, "STOP: Acceptable loss for A", "Keep A"
    
    return False, "CONTINUE: Need more data", None

print("\nBayesian Stopping Rules:")
print("  1. Probability threshold: Stop when P(B>A) > 95% or < 5%")
print("  2. Expected loss threshold: Stop when expected loss < 0.1%")
print("  These can be tuned based on risk tolerance")

Multi-Armed Bandit: Adaptive Testing

The Plain English Version

A/B testing has a problem: opportunity cost. While you test, half your traffic goes to the worse variant. If the test runs for weeks, that is a lot of lost conversions.

Multi-armed bandits solve this by shifting traffic to winning variants during the test. As evidence accumulates that B is better, more traffic goes to B. You "exploit" what you are learning while still "exploring" to gather more data.

The name comes from slot machines ("one-armed bandits"). Imagine you have 3 slot machines with unknown payout rates. How do you maximise your winnings? You need to balance trying new machines (exploration) with playing the best one you have found so far (exploitation).

The Key Algorithm: Thompson Sampling

Thompson Sampling is the standard implementation for multi-armed bandits. It is elegantly simple:

For each variant, maintain a Beta distribution over its conversion rate.
To assign a user, sample from each distribution.
Assign the user to whichever variant had the highest sample.
Update the distributions based on outcome.

The beauty: variants with uncertain (but potentially high) conversion rates still get traffic to resolve the uncertainty. But variants with clearly lower rates quickly get less traffic.

The Math

For each variant $i$ , we maintain:

$\alpha_i$ = number of successes + 1
$\beta_i$ = number of failures + 1

The conversion rate follows: $\theta_i \sim Beta(\alpha_i, \beta_i)$

To select a variant for user $n$ :

Sample $\tilde{\theta}_i \sim Beta(\alpha_i, \beta_i)$ for each variant
Select variant $j = \arg\max_i \tilde{\theta}_i$

After observing outcome:

If conversion: $\alpha_j \leftarrow \alpha_j + 1$
If no conversion: $\beta_j \leftarrow \beta_j + 1$

The Python Implementation

import numpy as np

class ThompsonSamplingBandit:
    """
    Multi-armed bandit using Thompson Sampling.
    """
    
    def __init__(self, n_variants):
        """
        Initialize with uninformative Beta(1,1) priors.
        """
        self.n_variants = n_variants
        self.alpha = np.ones(n_variants)  # Successes + 1
        self.beta = np.ones(n_variants)   # Failures + 1
        
        self.total_visitors = np.zeros(n_variants)
        self.total_conversions = np.zeros(n_variants)
    
    def select_variant(self):
        """
        Select a variant for the next user using Thompson Sampling.
        """
        # Sample from each variant's Beta distribution
        samples = np.random.beta(self.alpha, self.beta)
        
        # Return variant with highest sample
        return np.argmax(samples)
    
    def update(self, variant, converted):
        """
        Update beliefs based on observed outcome.
        
        variant: which variant was shown
        converted: 1 if converted, 0 otherwise
        """
        self.total_visitors[variant] += 1
        
        if converted:
            self.alpha[variant] += 1
            self.total_conversions[variant] += 1
        else:
            self.beta[variant] += 1
    
    def get_stats(self):
        """
        Get current statistics for all variants.
        """
        stats = []
        
        for i in range(self.n_variants):
            # Expected conversion rate (mean of Beta distribution)
            expected_rate = self.alpha[i] / (self.alpha[i] + self.beta[i])
            
            # Observed conversion rate
            if self.total_visitors[i] > 0:
                observed_rate = self.total_conversions[i] / self.total_visitors[i]
            else:
                observed_rate = 0
            
            # Traffic share
            total_traffic = np.sum(self.total_visitors)
            traffic_share = self.total_visitors[i] / total_traffic if total_traffic > 0 else 0
            
            stats.append({
                'variant': i,
                'visitors': self.total_visitors[i],
                'conversions': self.total_conversions[i],
                'observed_rate': observed_rate,
                'expected_rate': expected_rate,
                'traffic_share': traffic_share
            })
        
        return stats

def simulate_bandit_vs_ab(true_rates, n_visitors):
    """
    Compare Thompson Sampling bandit vs traditional A/B test.
    
    true_rates: actual conversion rates for each variant
    n_visitors: total number of visitors
    """
    
    n_variants = len(true_rates)
    
    # Thompson Sampling bandit
    bandit = ThompsonSamplingBandit(n_variants)
    bandit_conversions = 0
    
    # A/B test (equal split)
    ab_visitors = np.zeros(n_variants)
    ab_conversions = np.zeros(n_variants)
    
    for visitor in range(n_visitors):
        # Bandit: adaptive assignment
        bandit_variant = bandit.select_variant()
        bandit_converted = np.random.random() < true_rates[bandit_variant]
        bandit.update(bandit_variant, int(bandit_converted))
        if bandit_converted:
            bandit_conversions += 1
        
        # A/B: equal assignment
        ab_variant = visitor % n_variants
        ab_converted = np.random.random() < true_rates[ab_variant]
        ab_visitors[ab_variant] += 1
        if ab_converted:
            ab_conversions[ab_variant] += 1
    
    total_ab_conversions = np.sum(ab_conversions)
    
    return {
        'bandit_conversions': bandit_conversions,
        'ab_conversions': total_ab_conversions,
        'bandit_stats': bandit.get_stats(),
        'ab_visitors': ab_visitors,
        'ab_conversions_by_variant': ab_conversions
    }

# Simulate
np.random.seed(42)

print("Multi-Armed Bandit: Thompson Sampling")
print("=" * 70)

# True conversion rates (unknown to the algorithm)
true_rates = [0.03, 0.05, 0.04]  # Variant B is best
n_visitors = 10000

results = simulate_bandit_vs_ab(true_rates, n_visitors)

print(f"\nTrue conversion rates: {[f'{r*100:.1f}%' for r in true_rates]}")
print(f"Best variant: {np.argmax(true_rates)} ({true_rates[np.argmax(true_rates)]*100:.1f}%)")

print(f"\nResults after {n_visitors:,} visitors:")
print("-" * 70)

print(f"\nThompson Sampling Bandit:")
print(f"  Total conversions: {results['bandit_conversions']}")
print(f"  Traffic allocation:")
for stat in results['bandit_stats']:
    print(f"    Variant {stat['variant']}: {stat['visitors']:.0f} visitors ({stat['traffic_share']*100:.1f}%), {stat['observed_rate']*100:.2f}% observed rate")

print(f"\nTraditional A/B Test (equal split):")
print(f"  Total conversions: {results['ab_conversions']:.0f}")
for i, (v, c) in enumerate(zip(results['ab_visitors'], results['ab_conversions_by_variant'])):
    rate = c/v if v > 0 else 0
    print(f"    Variant {i}: {v:.0f} visitors, {rate*100:.2f}% observed rate")

print(f"\nOpportunity Cost:")
print(f"  Bandit extra conversions: {results['bandit_conversions'] - results['ab_conversions']:.0f}")
print(f"  Improvement: {(results['bandit_conversions']/results['ab_conversions'] - 1)*100:.1f}%")

print("\nKey Insight: Bandit shifted traffic to the winning variant during the test")
print("This reduces opportunity cost while still learning")

Marketing Mix Modelling vs Attribution

The Plain English Version

Attribution asks: "Which touchpoint gets credit for this conversion?"

Marketing Mix Modelling (MMM) asks: "How much did each channel contribute to overall conversions?"

They are solving different problems.

Attribution is bottom-up: track individual users, credit touchpoints. MMM is top-down: model aggregate data, estimate channel contribution via regression.

With cookies dying and privacy regulations tightening, attribution is getting harder. MMM is having a renaissance because it does not require user-level tracking.

Attribution Models

import numpy as np
import pandas as pd

def apply_attribution_models(touchpoints, conversion_value):
    """
    Apply different attribution models to a customer journey.
    
    touchpoints: list of (channel, timestamp) tuples
    conversion_value: value of the conversion
    
    Returns dict of channel credits under each model.
    """
    
    channels = [t[0] for t in touchpoints]
    unique_channels = list(set(channels))
    n_touchpoints = len(touchpoints)
    
    # Last-touch attribution
    last_touch = {c: 0 for c in unique_channels}
    last_touch[channels[-1]] = conversion_value
    
    # First-touch attribution
    first_touch = {c: 0 for c in unique_channels}
    first_touch[channels[0]] = conversion_value
    
    # Linear attribution (equal credit)
    linear = {c: 0 for c in unique_channels}
    credit_per_touch = conversion_value / n_touchpoints
    for channel in channels:
        linear[channel] += credit_per_touch
    
    # Time-decay attribution (more recent = more credit)
    time_decay = {c: 0 for c in unique_channels}
    decay_factor = 0.5  # Half-life decay
    weights = [decay_factor ** (n_touchpoints - i - 1) for i in range(n_touchpoints)]
    total_weight = sum(weights)
    
    for i, channel in enumerate(channels):
        time_decay[channel] += conversion_value * weights[i] / total_weight
    
    # Position-based (40% first, 40% last, 20% middle)
    position = {c: 0 for c in unique_channels}
    if n_touchpoints == 1:
        position[channels[0]] = conversion_value
    elif n_touchpoints == 2:
        position[channels[0]] = conversion_value * 0.5
        position[channels[-1]] += conversion_value * 0.5
    else:
        position[channels[0]] = conversion_value * 0.4
        position[channels[-1]] += conversion_value * 0.4
        middle_credit = conversion_value * 0.2 / (n_touchpoints - 2)
        for channel in channels[1:-1]:
            position[channel] += middle_credit
    
    return {
        'last_touch': last_touch,
        'first_touch': first_touch,
        'linear': linear,
        'time_decay': time_decay,
        'position_based': position
    }

# Example customer journey
journey = [
    ('Paid Search', '2026-01-01 10:00'),
    ('Email', '2026-01-03 14:00'),
    ('Social', '2026-01-05 09:00'),
    ('Direct', '2026-01-07 16:00'),
    ('Retargeting', '2026-01-08 11:00'),  # Conversion
]

conversion_value = 100

print("Attribution Models Comparison")
print("=" * 70)
print(f"\nCustomer Journey: {' → '.join([t[0] for t in journey])}")
print(f"Conversion Value: £{conversion_value}")

attribution = apply_attribution_models(journey, conversion_value)

print(f"\n{'Model':20} | {'Paid Search':12} | {'Email':8} | {'Social':8} | {'Direct':8} | {'Retargeting':12}")
print("-" * 80)

for model_name, credits in attribution.items():
    row = f"{model_name:20} |"
    for channel in ['Paid Search', 'Email', 'Social', 'Direct', 'Retargeting']:
        row += f" £{credits.get(channel, 0):10.2f} |"
    print(row)

print("\nKey Insight: Same journey, wildly different channel credit")
print("This is why attribution model choice matters for budget allocation")

Marketing Mix Modelling

import numpy as np
import pandas as pd
from scipy import stats

def simple_marketing_mix_model(data):
    """
    Simplified Marketing Mix Model using linear regression.
    
    In practice, MMM includes:
    - Adstock (carryover effects)
    - Saturation curves (diminishing returns)
    - Seasonality controls
    - External factors (economy, weather, etc.)
    
    This is a simplified version for illustration.
    """
    
    # Prepare feature matrix
    X = data[['tv_spend', 'digital_spend', 'social_spend', 'print_spend']].values
    y = data['revenue'].values
    
    # Add intercept
    X_with_intercept = np.column_stack([np.ones(len(X)), X])
    
    # OLS regression
    # β = (X'X)^-1 X'y
    beta = np.linalg.lstsq(X_with_intercept, y, rcond=None)[0]
    
    # Predictions and R-squared
    y_pred = X_with_intercept @ beta
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r_squared = 1 - (ss_res / ss_tot)
    
    # Contribution (spend × coefficient)
    channels = ['TV', 'Digital', 'Social', 'Print']
    contributions = {}
    
    for i, channel in enumerate(channels):
        avg_spend = np.mean(X[:, i])
        contribution = beta[i+1] * avg_spend
        contributions[channel] = {
            'coefficient': beta[i+1],
            'avg_spend': avg_spend,
            'contribution': contribution,
            'roi': beta[i+1]  # Revenue per £1 spent
        }
    
    base_revenue = beta[0]
    
    return {
        'coefficients': beta,
        'r_squared': r_squared,
        'base_revenue': base_revenue,
        'contributions': contributions
    }

# Simulate weekly data
np.random.seed(42)

n_weeks = 104  # 2 years

# Simulate spend (with some correlation to mimic real patterns)
tv_spend = np.random.uniform(10000, 50000, n_weeks)
digital_spend = np.random.uniform(5000, 30000, n_weeks)
social_spend = np.random.uniform(2000, 15000, n_weeks)
print_spend = np.random.uniform(1000, 8000, n_weeks)

# Simulate revenue (function of spend + noise)
base_revenue = 200000
revenue = (
    base_revenue +
    tv_spend * 2.5 +      # TV: £2.50 return per £1
    digital_spend * 4.0 + # Digital: £4.00 return per £1
    social_spend * 3.0 +  # Social: £3.00 return per £1
    print_spend * 1.5 +   # Print: £1.50 return per £1
    np.random.normal(0, 20000, n_weeks)  # Noise
)

data = pd.DataFrame({
    'week': range(n_weeks),
    'tv_spend': tv_spend,
    'digital_spend': digital_spend,
    'social_spend': social_spend,
    'print_spend': print_spend,
    'revenue': revenue
})

results = simple_marketing_mix_model(data)

print("\nMarketing Mix Modelling")
print("=" * 70)
print(f"\nModel R²: {results['r_squared']:.3f}")
print(f"Base Revenue (no marketing): £{results['base_revenue']:,.0f}/week")

print(f"\n{'Channel':10} | {'ROI':12} | {'Avg Spend':15} | {'Contribution':15}")
print("-" * 60)

for channel, stats in results['contributions'].items():
    print(f"{channel:10} | £{stats['roi']:.2f}/£1 | £{stats['avg_spend']:13,.0f} | £{stats['contribution']:13,.0f}")

print("\nInterpretation:")
print("  ROI tells you revenue generated per £1 spent on each channel")
print("  Digital has highest ROI → consider shifting budget from Print")

print("\nMMM vs Attribution:")
print("  Attribution: Bottom-up, user-level, touchpoint credit")
print("  MMM: Top-down, aggregate, channel contribution via regression")
print("  Both are useful; MMM works without cookies")

Power Law Distributions: The Whale Effect

The Plain English Version

Most business metrics are not normally distributed. Customer value, in particular, follows a power law: a small number of customers drive a disproportionate amount of revenue.

The famous example: 1% of customers often drive 50% of revenue. This has profound implications for acquisition, retention, and product strategy. Your "average customer" is a statistical fiction.

The Math

A power law distribution has the form:

$P(X > x) \propto x^{-\alpha}$

The probability of values greater than x decreases polynomially, not exponentially. This means extreme values are far more likely than in a normal distribution.

The Pareto distribution is a common power law:

$f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}}$ for $x \geq x_m$

Where:

$\alpha$ = shape parameter (smaller = heavier tail)
$x_m$ = minimum value

The Python Implementation

import numpy as np
import pandas as pd
from scipy import stats

def analyse_customer_value_distribution(values):
    """
    Analyse customer value distribution for power law characteristics.
    """
    
    values = np.array(values)
    n_customers = len(values)
    total_value = np.sum(values)
    
    # Sort descending
    sorted_values = np.sort(values)[::-1]
    cumulative_value = np.cumsum(sorted_values)
    cumulative_pct = cumulative_value / total_value
    
    # Find key thresholds
    customer_pcts = np.arange(1, n_customers + 1) / n_customers
    
    # What % of customers drive 50% of value?
    idx_50 = np.searchsorted(cumulative_pct, 0.5)
    pct_customers_for_50 = (idx_50 + 1) / n_customers
    
    # What % of value comes from top 1%, 10%, 20%?
    top_1_pct_idx = max(1, int(n_customers * 0.01))
    top_10_pct_idx = max(1, int(n_customers * 0.10))
    top_20_pct_idx = max(1, int(n_customers * 0.20))
    
    value_from_top_1 = cumulative_pct[top_1_pct_idx - 1] if top_1_pct_idx <= n_customers else 1.0
    value_from_top_10 = cumulative_pct[top_10_pct_idx - 1] if top_10_pct_idx <= n_customers else 1.0
    value_from_top_20 = cumulative_pct[top_20_pct_idx - 1] if top_20_pct_idx <= n_customers else 1.0
    
    # Fit Pareto distribution to estimate alpha
    # Using simple log-log regression
    log_values = np.log(sorted_values[sorted_values > 0])
    log_ranks = np.log(np.arange(1, len(log_values) + 1))
    
    if len(log_values) > 10:
        slope, intercept, r_value, p_value, std_err = stats.linregress(log_ranks, log_values)
        alpha_estimate = -1 / slope if slope != 0 else np.inf
    else:
        alpha_estimate = None
    
    return {
        'n_customers': n_customers,
        'total_value': total_value,
        'mean_value': np.mean(values),
        'median_value': np.median(values),
        'max_value': np.max(values),
        'pct_customers_for_50_value': pct_customers_for_50,
        'value_from_top_1_pct': value_from_top_1,
        'value_from_top_10_pct': value_from_top_10,
        'value_from_top_20_pct': value_from_top_20,
        'pareto_alpha': alpha_estimate,
        'gini_coefficient': gini(values)
    }

def gini(values):
    """
    Calculate Gini coefficient (measure of inequality).
    0 = perfect equality, 1 = perfect inequality
    """
    values = np.array(values)
    n = len(values)
    sorted_values = np.sort(values)
    cumsum = np.cumsum(sorted_values)
    return (2 * np.sum((np.arange(1, n+1) * sorted_values)) / (n * np.sum(values))) - (n + 1) / n

# Simulate realistic customer value distribution (power law)
np.random.seed(42)

# Pareto distribution (power law)
alpha = 1.5  # Shape parameter
xm = 50      # Minimum value

n_customers = 10000
customer_values = (np.random.pareto(alpha, n_customers) + 1) * xm

results = analyse_customer_value_distribution(customer_values)

print("Power Law Distribution: Customer Value Analysis")
print("=" * 70)

print(f"\nBasic Statistics:")
print(f"  Customers: {results['n_customers']:,}")
print(f"  Total Value: £{results['total_value']:,.0f}")
print(f"  Mean Value: £{results['mean_value']:,.2f}")
print(f"  Median Value: £{results['median_value']:,.2f}")
print(f"  Max Value: £{results['max_value']:,.2f}")

print(f"\nConcentration Analysis:")
print(f"  Top 1% of customers drive {results['value_from_top_1_pct']*100:.1f}% of value")
print(f"  Top 10% of customers drive {results['value_from_top_10_pct']*100:.1f}% of value")
print(f"  Top 20% of customers drive {results['value_from_top_20_pct']*100:.1f}% of value")
print(f"  {results['pct_customers_for_50_value']*100:.1f}% of customers drive 50% of value")

print(f"\nDistribution Metrics:")
print(f"  Gini Coefficient: {results['gini_coefficient']:.3f} (higher = more unequal)")
if results['pareto_alpha']:
    print(f"  Estimated Pareto α: {results['pareto_alpha']:.2f}")

print("\nStrategic Implications:")
print("  • Your 'average customer' is a statistical fiction")
print("  • Whales need different treatment than the long tail")
print("  • Retention of top 1% matters more than acquisition of average users")
print("  • CAC benchmarks should be segmented by expected LTV tier")

Price Elasticity: How Demand Responds to Price

The Plain English Version

If you raise prices 10%, do you lose 5% of customers? 20%? 50%? The answer determines whether the price increase makes you money or costs you money.

Price elasticity of demand quantifies this relationship.

Elastic demand (|ε| > 1): Demand changes more than proportionally to price. A 10% price increase causes >10% demand drop. Total revenue falls.

Inelastic demand (|ε| < 1): Demand changes less than proportionally. A 10% price increase causes <10% demand drop. Total revenue rises.

Unit elastic (|ε| = 1): Demand changes proportionally. Revenue stays the same.

The Formula

Price elasticity of demand:

$\varepsilon = \frac{\% \Delta Q}{\% \Delta P} = \frac{\partial Q / Q}{\partial P / P}$

Or using the point elasticity formula:

$\varepsilon = \frac{dQ}{dP} \times \frac{P}{Q}$

For a linear demand curve $Q = a - bP$ :

$\varepsilon = -b \times \frac{P}{Q} = -b \times \frac{P}{a - bP}$

Optimal Pricing with Elasticity

If you know elasticity, the revenue-maximising price change is:

$\frac{\Delta P}{P} = \frac{1}{|\varepsilon| - 1}$ (for |ε| > 1)

And the profit-maximising markup over marginal cost:

$\frac{P - MC}{P} = \frac{1}{|\varepsilon|}$

This is the Lerner Index.

The Python Implementation

import numpy as np
import pandas as pd

def estimate_price_elasticity(prices, quantities):
    """
    Estimate price elasticity from observed price-quantity data.
    Uses log-log regression: log(Q) = α + ε*log(P)
    """
    
    log_p = np.log(prices)
    log_q = np.log(quantities)
    
    # Linear regression
    n = len(prices)
    mean_log_p = np.mean(log_p)
    mean_log_q = np.mean(log_q)
    
    numerator = np.sum((log_p - mean_log_p) * (log_q - mean_log_q))
    denominator = np.sum((log_p - mean_log_p) ** 2)
    
    elasticity = numerator / denominator if denominator != 0 else 0
    
    return elasticity

def analyse_price_change_impact(current_price, current_quantity, elasticity, new_price):
    """
    Analyse impact of price change given elasticity.
    """
    
    pct_price_change = (new_price - current_price) / current_price
    pct_quantity_change = elasticity * pct_price_change
    
    new_quantity = current_quantity * (1 + pct_quantity_change)
    
    current_revenue = current_price * current_quantity
    new_revenue = new_price * new_quantity
    
    return {
        'current_price': current_price,
        'new_price': new_price,
        'pct_price_change': pct_price_change,
        'current_quantity': current_quantity,
        'new_quantity': new_quantity,
        'pct_quantity_change': pct_quantity_change,
        'current_revenue': current_revenue,
        'new_revenue': new_revenue,
        'revenue_change': new_revenue - current_revenue,
        'pct_revenue_change': (new_revenue - current_revenue) / current_revenue
    }

print("Price Elasticity Analysis")
print("=" * 70)

# Example: SaaS product pricing
current_price = 99
current_customers = 1000

# Different elasticity scenarios
elasticities = [
    (-0.5, "Inelastic: Strong brand, few alternatives"),
    (-1.0, "Unit elastic: Break-even point"),
    (-1.5, "Elastic: Competitive market"),
    (-2.5, "Highly elastic: Commodity product"),
]

print(f"\nCurrent state: {current_customers} customers at £{current_price}/mo")
print(f"Testing 10% price increase to £{current_price * 1.1:.0f}/mo")
print("-" * 70)

print(f"{'Elasticity':12} | {'Customer Δ':12} | {'Revenue Δ':12} | {'Interpretation':25}")
print("-" * 70)

for elasticity, desc in elasticities:
    result = analyse_price_change_impact(current_price, current_customers, elasticity, current_price * 1.1)
    print(f"{elasticity:10.1f} | {result['pct_quantity_change']*100:+10.1f}% | {result['pct_revenue_change']*100:+10.1f}% | {desc[:25]}")

print("\nKey Insight:")
print("  If |ε| < 1: Price increase raises revenue")
print("  If |ε| > 1: Price increase lowers revenue")
print("  If |ε| = 1: Revenue stays the same")

Van Westendorp Price Sensitivity Meter

import numpy as np
import pandas as pd

def van_westendorp_analysis(survey_data):
    """
    Van Westendorp Price Sensitivity Meter.
    
    Asks 4 questions:
    1. Too cheap (quality concerns): Below what price?
    2. Cheap (good deal): Below what price?
    3. Expensive (but would consider): Above what price?
    4. Too expensive (won't buy): Above what price?
    
    Plots cumulative curves to find:
    - Point of Marginal Cheapness (PMC): Too cheap = Not cheap
    - Point of Marginal Expensiveness (PME): Too expensive = Not expensive
    - Optimal Price Point (OPP): Too cheap = Too expensive
    - Indifference Price Point (IPP): Cheap = Expensive
    """
    
    prices = np.linspace(10, 200, 1000)
    
    # Cumulative distributions
    too_cheap_cum = np.array([np.mean(survey_data['too_cheap'] >= p) for p in prices])
    cheap_cum = np.array([np.mean(survey_data['cheap'] >= p) for p in prices])
    expensive_cum = np.array([np.mean(survey_data['expensive'] <= p) for p in prices])
    too_expensive_cum = np.array([np.mean(survey_data['too_expensive'] <= p) for p in prices])
    
    # Inverted for "Not" curves
    not_cheap_cum = 1 - cheap_cum
    not_expensive_cum = 1 - expensive_cum
    
    # Find intersection points
    def find_intersection(curve1, curve2, prices):
        diff = curve1 - curve2
        idx = np.argmin(np.abs(diff))
        return prices[idx]
    
    pmc = find_intersection(too_cheap_cum, not_cheap_cum, prices)
    pme = find_intersection(too_expensive_cum, not_expensive_cum, prices)
    opp = find_intersection(too_cheap_cum, too_expensive_cum, prices)
    ipp = find_intersection(cheap_cum, expensive_cum, prices)
    
    return {
        'pmc': pmc,  # Point of Marginal Cheapness
        'pme': pme,  # Point of Marginal Expensiveness
        'opp': opp,  # Optimal Price Point
        'ipp': ipp,  # Indifference Price Point
        'acceptable_range': (pmc, pme)
    }

# Simulate survey data
np.random.seed(42)

n_respondents = 200

survey_data = pd.DataFrame({
    'too_cheap': np.random.normal(30, 10, n_respondents).clip(10, 60),
    'cheap': np.random.normal(50, 15, n_respondents).clip(20, 90),
    'expensive': np.random.normal(80, 20, n_respondents).clip(40, 150),
    'too_expensive': np.random.normal(120, 25, n_respondents).clip(60, 200),
})

results = van_westendorp_analysis(survey_data)

print("\nVan Westendorp Price Sensitivity Meter")
print("=" * 70)

print(f"\nKey Price Points:")
print(f"  Point of Marginal Cheapness (PMC): £{results['pmc']:.0f}")
print(f"    Below this, 'too cheap' concerns outweigh 'good deal' perception")

print(f"\n  Point of Marginal Expensiveness (PME): £{results['pme']:.0f}")
print(f"    Above this, 'too expensive' outweighs 'acceptable'")

print(f"\n  Optimal Price Point (OPP): £{results['opp']:.0f}")
print(f"    Where 'too cheap' = 'too expensive' (minimum resistance)")

print(f"\n  Indifference Price Point (IPP): £{results['ipp']:.0f}")
print(f"    Where 'cheap' = 'expensive' (perceived as fair)")

print(f"\n  Acceptable Price Range: £{results['acceptable_range'][0]:.0f} - £{results['acceptable_range'][1]:.0f}")

print("\nPractical Use:")
print("  1. Survey your target market with the 4 questions")
print("  2. Plot cumulative curves")
print("  3. Find intersection points")
print("  4. Price within acceptable range, near OPP or IPP")

The Retention Math: Why 5% Can Mean 95%

The Plain English Version

Bain & Company's famous statistic: a 5% increase in retention can mean a 25-95% increase in profits.

This seems impossibly large. But the math checks out when you understand the compounding effects:

No reacquisition cost: Retained customers cost £0 to reacquire.
Revenue compounds: Each additional period of retention adds more revenue.
Referrals: Long-term customers refer more.
Price tolerance: Loyal customers accept price increases.
Lower service costs: Experienced customers need less support.

The Formula

The value of improved retention can be modelled as:

$\Delta LTV = \sum_{t=0}^{T} \frac{ARPU_t \times (r_{new}^t - r_{old}^t)}{(1+d)^t}$

Where:

$r_{old}$ = old retention rate
$r_{new}$ = new retention rate (5% better)
$d$ = discount rate
$T$ = time horizon

Or using the simplified formula:

$\Delta LTV = ARPU \times \left(\frac{1}{1-r_{new}} - \frac{1}{1-r_{old}}\right)$

The Python Implementation

import numpy as np
import pandas as pd

def calculate_retention_value_impact(arpu, margin, old_retention, new_retention, 
                                      cac, discount_rate=0.10, horizon_months=60):
    """
    Calculate the profit impact of improved retention.
    
    Accounts for:
    - Revenue over extended lifespan
    - No reacquisition cost
    - Discounted future value
    """
    
    # Monthly contribution
    monthly_contribution = arpu * margin
    monthly_discount = (1 + discount_rate) ** (1/12) - 1
    
    # Calculate LTV under each retention scenario
    def calculate_ltv(retention_rate):
        ltv = 0
        survival_prob = 1.0
        
        for month in range(1, horizon_months + 1):
            survival_prob *= retention_rate
            discounted_contribution = monthly_contribution * survival_prob / ((1 + monthly_discount) ** month)
            ltv += discounted_contribution
        
        return ltv
    
    ltv_old = calculate_ltv(old_retention)
    ltv_new = calculate_ltv(new_retention)
    
    # Profit = LTV - CAC
    profit_old = ltv_old - cac
    profit_new = ltv_new - cac
    
    # Improvement
    ltv_improvement = ltv_new - ltv_old
    profit_improvement = profit_new - profit_old
    profit_improvement_pct = (profit_new / profit_old - 1) * 100 if profit_old > 0 else float('inf')
    
    return {
        'old_retention': old_retention,
        'new_retention': new_retention,
        'retention_improvement_pct': (new_retention / old_retention - 1) * 100,
        'ltv_old': ltv_old,
        'ltv_new': ltv_new,
        'ltv_improvement': ltv_improvement,
        'ltv_improvement_pct': (ltv_new / ltv_old - 1) * 100,
        'profit_old': profit_old,
        'profit_new': profit_new,
        'profit_improvement': profit_improvement,
        'profit_improvement_pct': profit_improvement_pct
    }

print("Retention Math: The Compounding Effect")
print("=" * 70)

# Example: SaaS business
arpu = 100
margin = 0.80
cac = 400
old_retention = 0.95  # 95% monthly retention = 5% churn

# 5% improvement = 95% → 95% × 1.05 = 99.75%? No, that's not how it works.
# 5% REDUCTION in churn: 5% churn → 5% × 0.95 = 4.75% churn → 95.25% retention
# Or 5 percentage points: 95% → 96% (but that's actually 20% reduction in churn!)

# Let's model 5 percentage point improvement in churn rate (the common interpretation)
churn_reduction_scenarios = [
    (0.05, "5% churn → 4.75% (5% reduction in churn rate)"),
    (0.10, "5% churn → 4.5% (10% reduction in churn rate)"),
    (0.20, "5% churn → 4% (20% reduction in churn rate)"),
]

print(f"\nBase case: £{arpu}/mo ARPU, {margin*100:.0f}% margin, £{cac} CAC")
print(f"Current: {(1-old_retention)*100:.0f}% monthly churn ({old_retention*100:.0f}% retention)")
print("-" * 70)

print(f"{'Scenario':50} | {'LTV Δ':10} | {'Profit Δ':12}")
print("-" * 70)

for reduction, desc in churn_reduction_scenarios:
    old_churn = 1 - old_retention
    new_churn = old_churn * (1 - reduction)
    new_retention = 1 - new_churn
    
    result = calculate_retention_value_impact(arpu, margin, old_retention, new_retention, cac)
    print(f"{desc:50} | +{result['ltv_improvement_pct']:6.1f}% | +{result['profit_improvement_pct']:8.1f}%")

print("\n" + "=" * 70)
print("Why Retention Improvements Have Outsized Profit Impact:")
print("-" * 70)

print("\n1. NO REACQUISITION COST")
print(f"   CAC = £{cac}. Every retained customer saves this.")

print("\n2. REVENUE COMPOUNDS")
print("   Each month of extended lifespan adds £{:.0f} contribution".format(arpu * margin))

print("\n3. LEVERAGE ON CAC")
print(f"   CAC is fixed. LTV increases flow directly to profit.")
print(f"   If CAC = £400 and LTV goes from £800 to £1000:")
print(f"   Profit: £400 → £600 = +50% profit from +25% LTV")

print("\n4. THE BAIN STAT EXPLAINED")
print("   '5% retention improvement → 25-95% profit increase'")
print("   The range depends on CAC, margins, and starting retention")
print("   Lower margins and higher CAC = bigger leverage effect")

Modelling Retention Scenarios

import numpy as np
import pandas as pd

def retention_scenario_analysis(arpu, margin, cac, retention_rates, discount_rate=0.10):
    """
    Analyse profitability across different retention scenarios.
    """
    
    results = []
    
    for retention in retention_rates:
        churn = 1 - retention
        avg_lifespan = 1 / churn if churn > 0 else float('inf')
        ltv = (arpu * margin) / churn if churn > 0 else float('inf')
        profit = ltv - cac
        ltv_cac_ratio = ltv / cac if cac > 0 else float('inf')
        
        results.append({
            'retention': retention,
            'churn': churn,
            'avg_lifespan_months': avg_lifespan,
            'ltv': ltv,
            'profit': profit,
            'ltv_cac_ratio': ltv_cac_ratio,
            'profitable': profit > 0
        })
    
    return pd.DataFrame(results)

print("\nRetention Scenario Analysis")
print("=" * 90)

arpu = 100
margin = 0.80
cac = 400

retention_rates = [0.90, 0.92, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]

results = retention_scenario_analysis(arpu, margin, cac, retention_rates)

print(f"\nAssumptions: £{arpu}/mo ARPU, {margin*100:.0f}% margin, £{cac} CAC")
print("-" * 90)
print(f"{'Retention':10} | {'Churn':8} | {'Lifespan':12} | {'LTV':12} | {'Profit':12} | {'LTV:CAC':10}")
print("-" * 90)

for _, row in results.iterrows():
    print(f"{row['retention']*100:8.0f}% | {row['churn']*100:6.0f}% | {row['avg_lifespan_months']:10.1f}mo | £{row['ltv']:10,.0f} | £{row['profit']:10,.0f} | {row['ltv_cac_ratio']:8.1f}:1")

print("\nKey Insight:")
print("  Moving from 95% to 97% retention (2 percentage points)")
print(f"  Changes lifespan from {results[results['retention']==0.95]['avg_lifespan_months'].values[0]:.0f}mo to {results[results['retention']==0.97]['avg_lifespan_months'].values[0]:.0f}mo")
print(f"  Changes LTV from £{results[results['retention']==0.95]['ltv'].values[0]:,.0f} to £{results[results['retention']==0.97]['ltv'].values[0]:,.0f}")
print(f"  That's a {(results[results['retention']==0.97]['ltv'].values[0] / results[results['retention']==0.95]['ltv'].values[0] - 1)*100:.0f}% increase in LTV from 2pp retention improvement")

Conclusion: Math Is Not Optional

Every principle in this series (scarcity, social proof, habit formation, the lot) generates data. That data needs to be measured, modelled, and acted upon.

LTV:CAC tells you if growth is sustainable. Below 3:1, you are building a leaky bucket.

Payback period tells you if you can afford it. Long payback means you need capital.

Cohort analysis reveals what averages hide. Never aggregate when you can segment.

Survival analysis models churn properly. Time matters. Customer tenure matters.

Bayesian testing gives you probability. Stop asking if p < 0.05 and start asking "what's the probability B is better?"

Bandits reduce opportunity cost. Stop splitting traffic 50/50 when you can learn and earn simultaneously.

MMM works when attribution does not. Cookies are dying. Top-down modelling is back.

Power laws mean averages lie. Your 1% of customers are not your average customers.

Elasticity determines if price increases make money. Test, measure, model.

Retention compounds. Small improvements in retention have outsized profit impact.

This is the math that separates data-informed growth from guesswork. Every formula in this post is battle-tested. The code runs. The numbers work.

I have applied every concept here across e-commerce, SaaS, and B2B products. The math is not theoretical. It is the working toolkit I use to make decisions about millions of pounds in marketing spend and product investments. If you want to move from gut-feel to data-driven, start here.

Need help building these models for your business? Or training your team on growth mathematics? I can help you implement the measurement systems and run the analyses. Let us chat.

LTV:CAC Ratio: The Unit Economics Foundation

The Plain English Version

The Formula

The Benchmarks

The Python Implementation

Advanced: Cohorted LTV

Payback Period: When Do You Get Your Money Back?

The Plain English Version

The Formula

The Benchmarks

The Python Implementation

Cohort Analysis: Stop Averaging Everything

The Plain English Version

The Key Cohort Metrics

The Python Implementation

Cohort Analysis for Revenue

Survival Analysis: A Better Way to Model Churn

The Plain English Version

The Key Concepts

Why This Matters for Marketing

The Python Implementation

Comparing Cohorts with Log-Rank Test

Bayesian A/B Testing: Probability, Not P-Values

The Plain English Version

The Key Concepts

Why Bayesian Is Better for Low-Traffic Sites

The Python Implementation

When to Stop a Bayesian Test

Multi-Armed Bandit: Adaptive Testing

The Plain English Version

The Key Algorithm: Thompson Sampling

The Math

The Python Implementation

Marketing Mix Modelling vs Attribution

The Plain English Version

Attribution Models

Marketing Mix Modelling

Power Law Distributions: The Whale Effect

The Plain English Version

The Math

The Python Implementation

Price Elasticity: How Demand Responds to Price

The Plain English Version

The Formula

Optimal Pricing with Elasticity

The Python Implementation

Van Westendorp Price Sensitivity Meter

The Retention Math: Why 5% Can Mean 95%

The Plain English Version

The Formula

The Python Implementation

Modelling Retention Scenarios

Conclusion: Math Is Not Optional

Ready to bring mathematical rigour to your growth strategy? I can help you build LTV models, set up cohort analysis, implement Bayesian testing, and model your retention economics. Get in touch.