AI for Reducing Cart Abandonment and Returns
Two numbers define the leakiest parts of every ecommerce funnel.

First: 70% of shopping carts are abandoned before checkout. That's 7 out of 10 people who wanted something enough to add it to their basket, then walked away.

Second: 17% of all retail sales are returned. In fashion, that number climbs to 24-30%. Returns cost US retailers $890 billion in 2024. Over half of Gen Z shoppers routinely "bracket" — ordering multiple sizes with the explicit plan to send most of them back.

These aren't just conversion problems. They're prediction problems. The data that tells you a shopper is about to abandon is already in your logs. The data that tells you a product will be returned is already in your order history. You just need to ask the right questions before it happens, not after.

The upside is well documented: AI-powered approaches reduce cart abandonment by 18% on average, retailers using AI-powered fit prediction see size-related returns drop by 27%, and proactive chatbot interventions recover 35% of carts that would otherwise be lost.

This post builds two predictive systems from scratch: an abandonment risk scorer that identifies shoppers about to leave (and intervenes), and a return risk predictor that flags sizing issues and high-return products before they ship. Both are implemented in Python for the ML models and Ruby on Rails/Solidus for the ecommerce integration. No client-specific examples — just the patterns, the maths, and the code.
Part 1: Predicting Cart Abandonment
Why People Abandon (and Why It's Predictable)
The top reasons shoppers abandon carts are well-documented: 48% cite unexpected extra costs (shipping, tax), 24% are forced to create an account, 22% find delivery too slow, 18% have concerns about return policies, 17% find the checkout too complicated, and 13% can't find their preferred payment method.
But here's what makes this an ML problem rather than a UX problem: different shoppers abandon for different reasons, and you can predict which reason applies to which shopper based on their behaviour. A shopper who repeatedly toggles between payment methods is having a payment friction problem. A shopper who navigates away to check competitor prices is having a price confidence problem. A shopper who lingers on the delivery information page is having a shipping cost or timeline problem.
The behavioural signals are already in your server logs and analytics events. You just need a model that recognises the patterns.
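To make that concrete, here is a minimal sketch of collapsing a raw analytics event stream into per-session features with pandas. The event names and schema here are assumptions; substitute whatever your own tracker emits.

```python
import pandas as pd

# Hypothetical raw event log: one row per analytics event.
# The column names and event vocabulary are assumptions.
events = pd.DataFrame([
    {'session_id': 's1', 'event': 'page_view',      'page': 'product'},
    {'session_id': 's1', 'event': 'page_view',      'page': 'delivery'},
    {'session_id': 's1', 'event': 'payment_switch', 'page': 'payment'},
    {'session_id': 's1', 'event': 'payment_switch', 'page': 'payment'},
    {'session_id': 's2', 'event': 'page_view',      'page': 'product'},
    {'session_id': 's2', 'event': 'item_removed',   'page': 'cart'},
])

def session_features(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse raw events into one feature row per session."""
    return df.groupby('session_id').agg(
        pages_viewed=('event', lambda e: (e == 'page_view').sum()),
        payment_method_switches=('event',
                                 lambda e: (e == 'payment_switch').sum()),
        items_removed=('event', lambda e: (e == 'item_removed').sum()),
    ).reset_index()

features = session_features(events)
print(features)
```

The same aggregation runs equally well as a nightly batch job over historical logs (to build training data) or over a single in-flight session (to build the live scoring payload).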
Python: Abandonment Risk Scorer
This gradient boosting classifier predicts the probability that a current session will end in abandonment:
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    classification_report, roc_auc_score, precision_recall_curve
)
import joblib


def train_abandonment_model(sessions_csv: str):
    """
    Train an abandonment risk model.

    Sessions CSV features (per checkout session):
    - session_id, user_id, timestamp
    - cart_value, cart_item_count
    - time_on_checkout_seconds
    - payment_method_switches (how many times they changed)
    - items_removed_during_checkout
    - pages_viewed_before_checkout
    - is_returning_customer
    - previous_abandonment_count
    - device_type (mobile=1, desktop=0)
    - hour_of_day
    - delivery_page_time_seconds
    - promo_code_attempted (did they try a code)
    - promo_code_valid (did the code work)
    - competitor_tab_switches (if detectable via visibility API)
    - abandoned (target: 1 = abandoned, 0 = completed)
    """
    df = pd.read_csv(sessions_csv)

    # Engineer additional features
    df['avg_time_per_item'] = (
        df['time_on_checkout_seconds'] /
        df['cart_item_count'].clip(lower=1)
    )
    df['hesitation_score'] = (
        df['payment_method_switches'] +
        df['items_removed_during_checkout'] * 2
    )
    df['promo_frustration'] = (
        (df['promo_code_attempted'] == 1) &
        (df['promo_code_valid'] == 0)
    ).astype(int)

    feature_cols = [
        'cart_value', 'cart_item_count',
        'time_on_checkout_seconds', 'payment_method_switches',
        'items_removed_during_checkout',
        'pages_viewed_before_checkout',
        'is_returning_customer', 'previous_abandonment_count',
        'device_type', 'hour_of_day',
        'delivery_page_time_seconds',
        'promo_frustration',
        'avg_time_per_item', 'hesitation_score'
    ]
    X = df[feature_cols]
    y = df['abandoned']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = GradientBoostingClassifier(
        n_estimators=200,
        max_depth=5,
        learning_rate=0.1,
        subsample=0.8,
        min_samples_leaf=20
    )
    model.fit(X_train, y_train)

    # Evaluate
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    print(f"AUC-ROC: {roc_auc_score(y_test, y_proba):.3f}")

    # Feature importance
    importance = sorted(
        zip(feature_cols, model.feature_importances_),
        key=lambda x: x[1], reverse=True
    )
    print("\nFeature Importance:")
    for feat, imp in importance:
        print(f"  {feat}: {imp:.3f}")

    # Find optimal threshold for intervention.
    # We want high recall (catch most abandoners) with
    # acceptable precision (don't annoy completers).
    precision, recall, thresholds = precision_recall_curve(
        y_test, y_proba
    )
    f1_scores = 2 * (precision * recall) / (precision + recall + 1e-8)
    # precision/recall have one more entry than thresholds,
    # so drop the final point before taking the argmax
    optimal_idx = np.argmax(f1_scores[:-1])
    optimal_threshold = thresholds[optimal_idx]
    print(f"\nOptimal intervention threshold: {optimal_threshold:.3f}")

    joblib.dump(model, 'abandonment_model.pkl')
    return model, optimal_threshold


def score_live_session(model, session_features: dict,
                       threshold: float) -> dict:
    """Score a live checkout session for abandonment risk."""
    features = pd.DataFrame([session_features])
    # Align column order with the order used at training time
    features = features.reindex(columns=model.feature_names_in_)
    probability = model.predict_proba(features)[0][1]

    return {
        'abandonment_probability': round(probability, 3),
        'risk_level': (
            'high' if probability > threshold else
            'medium' if probability > threshold * 0.7 else
            'low'
        ),
        'should_intervene': probability > threshold,
        'suggested_intervention': suggest_intervention(
            session_features, probability
        )
    }


def suggest_intervention(features: dict, probability: float) -> str:
    """Suggest the right intervention based on behaviour signals."""
    if features.get('promo_frustration', 0):
        return 'offer_alternative_discount'
    if features.get('delivery_page_time_seconds', 0) > 30:
        return 'highlight_delivery_options'
    if features.get('payment_method_switches', 0) > 2:
        return 'show_payment_help'
    if features.get('is_returning_customer', 0):
        return 'gentle_reminder'  # Don't discount loyal customers
    if features.get('cart_value', 0) > 100:
        return 'offer_free_shipping'
    return 'exit_intent_popup'
The feature importance output from this model is illuminating. Across most ecommerce datasets, the top predictors tend to be: previous_abandonment_count (serial abandoners are the strongest signal), hesitation_score (adding and removing items, switching payment methods), device_type (mobile abandons at 85% vs desktop at 70%), and time_on_checkout_seconds (too long means friction, too short means browsing).
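Before wiring any of this into production, it's worth smoke-testing the pipeline end to end on synthetic data to confirm the model picks up the signal you expect. This sketch uses a toy assumption (hesitation alone drives abandonment) purely to verify the plumbing:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 2000

# Synthetic sessions; in this toy world hesitation drives abandonment
df = pd.DataFrame({
    'cart_value': rng.uniform(10, 300, n),
    'hesitation_score': rng.poisson(2, n),
    'time_on_checkout_seconds': rng.uniform(30, 600, n),
})
logit = 0.9 * df['hesitation_score'] - 2.0
df['abandoned'] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

cols = ['cart_value', 'hesitation_score', 'time_on_checkout_seconds']
model = GradientBoostingClassifier(n_estimators=50, max_depth=3)
model.fit(df[cols], df['abandoned'])

# A hesitant session should score higher risk than a decisive one
hesitant = pd.DataFrame([{'cart_value': 80, 'hesitation_score': 8,
                          'time_on_checkout_seconds': 300}])
decisive = pd.DataFrame([{'cart_value': 80, 'hesitation_score': 0,
                          'time_on_checkout_seconds': 300}])
p_hesitant = model.predict_proba(hesitant)[0][1]
p_decisive = model.predict_proba(decisive)[0][1]
print(p_hesitant, p_decisive)
```

If the ordering check fails on your real data, the feature engineering (not the model) is usually the first place to look.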
Solidus/Rails: The Intervention Engine
The Python model scores sessions. The Rails side decides what to show and when:
module CartRecovery
  class InterventionEngine
    INTERVENTION_MAP = {
      offer_alternative_discount: {
        type: :popup,
        template: 'checkout/interventions/discount_offer',
        delay_seconds: 0,
        requires_consent: false
      },
      highlight_delivery_options: {
        type: :inline,
        template: 'checkout/interventions/delivery_highlight',
        delay_seconds: 0,
        requires_consent: false
      },
      show_payment_help: {
        type: :chat,
        message: 'Need help with payment? We accept...',
        delay_seconds: 5,
        requires_consent: false
      },
      gentle_reminder: {
        type: :email,
        template: 'cart_recovery/gentle_reminder',
        delay_seconds: 3600, # 1 hour after abandonment
        requires_consent: true
      },
      offer_free_shipping: {
        type: :banner,
        template: 'checkout/interventions/free_shipping',
        delay_seconds: 0,
        requires_consent: false
      },
      exit_intent_popup: {
        type: :exit_intent,
        template: 'checkout/interventions/exit_intent',
        delay_seconds: 0,
        requires_consent: false
      }
    }.freeze

    def evaluate_and_intervene(session:, order:)
      # Build features from live session
      features = SessionFeatureBuilder.build(session, order)

      # Score with Python model
      risk = PythonBridge.score_abandonment(features)
      return unless risk[:should_intervene]

      # Check we haven't already intervened this session
      return if session.intervention_shown?

      # Select and record intervention
      intervention = INTERVENTION_MAP[risk[:suggested_intervention].to_sym]
      record_intervention(session, risk, intervention)
      intervention
    end
  end

  class SessionFeatureBuilder
    def self.build(session, order)
      user = order.user

      {
        cart_value: order.total.to_f,
        cart_item_count: order.line_items.count,
        time_on_checkout_seconds: session.checkout_duration_seconds,
        payment_method_switches: session.payment_switches_count,
        items_removed_during_checkout: session.removals_during_checkout,
        pages_viewed_before_checkout: session.page_count,
        is_returning_customer: user&.orders&.complete&.any? ? 1 : 0,
        previous_abandonment_count: user ? abandoned_count(user) : 0,
        device_type: session.mobile? ? 1 : 0,
        hour_of_day: Time.current.hour,
        delivery_page_time_seconds: session.time_on_page(:delivery),
        promo_frustration: session.failed_promo_attempt? ? 1 : 0,
        avg_time_per_item: avg_time_per_item(session, order),
        hesitation_score: hesitation_score(session)
      }
    end
  end
end
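`PythonBridge` above is assumed to wrap a call into the Python scoring service (it is not shown in full here). Whatever transport you choose, the handler on the Python side only needs to decode the feature payload, align feature order with training, and return the risk verdict. A framework-agnostic sketch of that contract, with a `DummyModel` standing in for the trained classifier:

```python
import json

class DummyModel:
    """Stand-in for the trained GradientBoostingClassifier."""
    feature_names_in_ = ['cart_value', 'hesitation_score']

    def predict_proba(self, rows):
        # Toy scoring: more hesitation means higher abandonment risk
        return [[1 - min(r[1] / 10, 1), min(r[1] / 10, 1)] for r in rows]

def handle_score_request(model, body: str, threshold: float = 0.5) -> str:
    """Decode the JSON feature payload, align feature order, score."""
    features = json.loads(body)
    # Order matters: rebuild the row in the model's training order,
    # not the order the keys happen to arrive in
    row = [features[name] for name in model.feature_names_in_]
    probability = model.predict_proba([row])[0][1]
    return json.dumps({
        'abandonment_probability': round(probability, 3),
        'should_intervene': probability > threshold,
    })

payload = json.dumps({'hesitation_score': 7, 'cart_value': 120.0})
print(handle_score_request(DummyModel(), payload))
```

Wrapping `handle_score_request` in Flask, FastAPI, or a job queue is then a few lines; the important part is that the JSON keys and the model's feature order are reconciled in exactly one place.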
Post-Abandonment Recovery Sequencing
When a cart is abandoned despite intervention, the recovery email sequence should be timed and personalised based on the model's signals:
module CartRecovery
  class EmailSequencer
    SEQUENCES = {
      price_sensitive: [
        { delay: 1.hour, template: :reminder_with_urgency },
        { delay: 24.hours, template: :discount_offer_5pct },
        { delay: 72.hours, template: :last_chance_10pct }
      ],
      delivery_concerned: [
        { delay: 1.hour, template: :delivery_reassurance },
        { delay: 24.hours, template: :express_delivery_offer }
      ],
      returning_customer: [
        { delay: 2.hours, template: :gentle_nudge },
        { delay: 48.hours, template: :product_back_in_stock }
      ],
      default: [
        { delay: 1.hour, template: :simple_reminder },
        { delay: 24.hours, template: :social_proof },
        { delay: 72.hours, template: :small_incentive }
      ]
    }.freeze

    def schedule_recovery(order:, abandonment_context:)
      return unless order.user&.email_consent?

      sequence_key = determine_sequence(abandonment_context)
      sequence = SEQUENCES[sequence_key]

      sequence.each_with_index do |step, index|
        RecoveryEmailJob.set(wait: step[:delay]).perform_later(
          order_id: order.id,
          template: step[:template],
          sequence_position: index + 1,
          sequence_key: sequence_key
        )
      end
    end

    private

    def determine_sequence(context)
      if context[:promo_frustration] == 1
        :price_sensitive
      elsif context[:delivery_page_time_seconds].to_i > 30
        :delivery_concerned
      elsif context[:is_returning_customer] == 1
        :returning_customer
      else
        :default
      end
    end
  end
end
Part 2: Predicting and Preventing Returns
Why Returns Happen (and Which Ones Are Preventable)
Not all returns are equal. Some are healthy — a customer buying a gift that doesn't fit is a natural part of commerce. But many returns are preventable if you catch the signals early enough.
Fit and sizing issues cause roughly 70% of fashion returns. This is the biggest lever. If you can help customers order the right size the first time, you eliminate the majority of preventable returns.
Expectation mismatches ("it looked different online") cause another significant chunk. Better product data helps here — and connects directly to the GEO optimisation work in the product discovery post.
Bracketing — deliberately ordering multiple sizes to try at home — is rising, with over 51% of Gen Z shoppers admitting to it. This is harder to prevent but can be detected and mitigated.
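Detecting bracketing at order-placement time is a simple grouping exercise. A minimal sketch, assuming line items arrive as (product_id, size) pairs:

```python
from collections import defaultdict

def detect_bracketing(line_items):
    """
    Flag products ordered in more than one size within a single order.
    line_items: iterable of (product_id, size) tuples (assumed shape).
    Returns the set of product_ids being bracketed.
    """
    sizes_by_product = defaultdict(set)
    for product_id, size in line_items:
        sizes_by_product[product_id].add(size)
    return {pid for pid, sizes in sizes_by_product.items()
            if len(sizes) > 1}

order = [(101, 'M'), (101, 'L'), (202, 'S')]
print(detect_bracketing(order))  # product 101 ordered in M and L
```

Once flagged, you can respond gently at checkout (a fit-confidence nudge or free-exchange reminder) rather than punitively after the fact.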
Python: Return Risk Prediction Model
This model predicts the probability that a specific order will be returned, scored at the point of order placement:
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
import joblib


def train_return_risk_model(orders_csv: str):
    """
    Train a return risk model.

    Orders CSV features (per order line item):
    - order_id, user_id, product_id, variant_id
    - size_ordered, most_common_size_for_user
    - size_deviation (ordered size - usual size in numeric)
    - product_category, product_subcategory
    - product_avg_return_rate (historical return rate for this product)
    - product_review_mentions_sizing_issue (count)
    - customer_return_rate (this customer's historical return rate)
    - customer_order_count
    - order_value, order_item_count
    - same_product_multiple_sizes (bracketing indicator: 1/0)
    - days_since_last_order
    - is_sale_item
    - payment_method (BNPL tends to correlate with higher returns)
    - was_returned (target: 1 = returned, 0 = kept)
    """
    df = pd.read_csv(orders_csv)

    # Engineer features
    df['is_bracketing'] = df['same_product_multiple_sizes']
    df['size_risk'] = df['size_deviation'].abs()
    df['high_return_product'] = (
        df['product_avg_return_rate'] > 0.25
    ).astype(int)
    df['high_return_customer'] = (
        df['customer_return_rate'] > 0.30
    ).astype(int)
    df['review_sizing_flag'] = (
        df['product_review_mentions_sizing_issue'] > 3
    ).astype(int)
    df['bnpl_payment'] = (
        df['payment_method'] == 'bnpl'
    ).astype(int)

    feature_cols = [
        'size_risk', 'product_avg_return_rate',
        'customer_return_rate', 'customer_order_count',
        'order_value', 'order_item_count',
        'is_bracketing', 'high_return_product',
        'high_return_customer', 'review_sizing_flag',
        'days_since_last_order', 'is_sale_item',
        'bnpl_payment'
    ]
    X = df[feature_cols]
    y = df['was_returned']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = GradientBoostingClassifier(
        n_estimators=200,
        max_depth=5,
        learning_rate=0.1,
        subsample=0.8
    )
    model.fit(X_train, y_train)

    y_proba = model.predict_proba(X_test)[:, 1]
    print(f"AUC-ROC: {roc_auc_score(y_test, y_proba):.3f}")
    print(classification_report(y_test, model.predict(X_test)))

    importance = sorted(
        zip(feature_cols, model.feature_importances_),
        key=lambda x: x[1], reverse=True
    )
    print("\nReturn Risk Feature Importance:")
    for feat, imp in importance:
        print(f"  {feat}: {imp:.3f}")

    joblib.dump(model, 'return_risk_model.pkl')
    return model
The top features across most fashion datasets: customer_return_rate (serial returners are the strongest signal), is_bracketing (multiple sizes of the same product is a near-certain partial return), size_risk (ordering an unusual size for this customer), and product_avg_return_rate (some products just have inherent sizing issues).
Python: Collaborative Filtering for Size Recommendation
This is where it gets genuinely interesting. The same collaborative filtering that powers "customers who bought X also bought Y" can predict the right size:
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors


def build_size_recommender(purchase_history_csv: str):
    """
    Build a size recommender using collaborative filtering.

    Purchase history CSV:
    - user_id, product_id, size_ordered, was_returned,
      kept_size (size they kept if they exchanged)
    - user_height_cm, user_weight_kg (if available from profile)
    - user_typical_size_tops, user_typical_size_bottoms
      (numerically encoded, e.g. XS=1 ... XL=5)
    """
    df = pd.read_csv(purchase_history_csv)

    # Build a "success" dataset: orders that were NOT returned
    kept = df[df['was_returned'] == 0].copy()

    # For each product, build a profile of who kept which size
    def recommend_size(product_id: int,
                       user_profile: dict,
                       n_neighbors: int = 20) -> dict:
        """Recommend a size for a user based on similar users."""
        product_data = kept[kept['product_id'] == product_id]

        if len(product_data) < 10:
            return {'recommendation': None,
                    'confidence': 'insufficient_data'}

        # Build feature matrix from users who bought this product
        profile_features = ['user_height_cm', 'user_weight_kg',
                            'user_typical_size_tops',
                            'user_typical_size_bottoms']

        # Filter to users with profile data
        with_profile = product_data.dropna(subset=profile_features)

        if len(with_profile) < 5:
            # Fall back to most popular kept size
            most_common = (
                product_data['size_ordered']
                .value_counts()
                .index[0]
            )
            return {'recommendation': most_common,
                    'confidence': 'low',
                    'method': 'popularity'}

        # Note: these features are on different scales (cm, kg, size
        # codes); standardising them first usually improves neighbour
        # quality
        X = with_profile[profile_features].values
        user_vector = np.array([[user_profile.get(f, 0)
                                 for f in profile_features]])

        # Find similar users
        nn = NearestNeighbors(n_neighbors=min(n_neighbors, len(X)))
        nn.fit(X)
        distances, indices = nn.kneighbors(user_vector)

        # Weight by inverse distance
        similar_purchases = with_profile.iloc[indices[0]]
        weights = 1 / (distances[0] + 1e-6)

        # Weighted vote for size
        size_votes = {}
        for size, weight in zip(
            similar_purchases['size_ordered'], weights
        ):
            size_votes[size] = size_votes.get(size, 0) + weight

        recommended = max(size_votes, key=size_votes.get)
        total_weight = sum(size_votes.values())
        confidence = size_votes[recommended] / total_weight

        return {
            'recommendation': recommended,
            'confidence': (
                'high' if confidence > 0.6
                else 'medium' if confidence > 0.4
                else 'low'
            ),
            'confidence_score': round(confidence, 3),
            'method': 'collaborative_filtering',
            'similar_users_count': len(similar_purchases),
            'size_distribution': {
                k: round(v / total_weight, 2)
                for k, v in sorted(size_votes.items())
            }
        }

    return recommend_size
The beauty of this approach: it gets smarter with every purchase. Each successful (non-returned) order teaches the model what size works for what body type. Over time, the recommendations become highly accurate — retailers using similar approaches report size-related return reductions of 27%.
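The inverse-distance weighted vote at the heart of `recommend_size` can be illustrated in isolation with synthetic numbers:

```python
import numpy as np

def weighted_size_vote(sizes, distances):
    """
    Inverse-distance weighted vote, as used inside recommend_size.
    sizes: size label kept by each similar user
    distances: that user's distance from the shopper's profile
    Returns (winning size, share of the total vote it received).
    """
    weights = 1 / (np.asarray(distances, dtype=float) + 1e-6)
    votes = {}
    for size, w in zip(sizes, weights):
        votes[size] = votes.get(size, 0.0) + w
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())

# Three near neighbours kept M; one distant neighbour kept L
size, confidence = weighted_size_vote(
    ['M', 'M', 'M', 'L'], [1.0, 1.5, 2.0, 10.0]
)
print(size, round(confidence, 2))
```

The distant L voter barely moves the result, which is exactly the behaviour you want: outlier body profiles shouldn't drag the recommendation.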
Solidus/Rails: The Return Prevention Pipeline
Tie the Python models into your Solidus checkout flow:
module ReturnPrevention
  class Pipeline
    def evaluate_order(order)
      results = order.line_items.map do |item|
        risk = score_return_risk(item, order)
        size_rec = check_sizing(item, order.user)

        LineItemRisk.new(
          line_item: item,
          return_probability: risk[:probability],
          risk_level: risk[:risk_level],
          size_recommendation: size_rec,
          interventions: determine_interventions(risk, size_rec)
        )
      end

      OrderRiskAssessment.new(
        order: order,
        line_item_risks: results,
        overall_risk: aggregate_risk(results),
        bracketing_detected: detect_bracketing(order)
      )
    end

    private

    def check_sizing(line_item, user)
      return nil unless line_item.variant.option_values
                                 .any? { |ov| ov.option_type.name == 'size' }

      ordered_size = line_item.variant.option_values
                              .find { |ov| ov.option_type.name == 'size' }
                              &.name

      recommendation = PythonBridge.recommend_size(
        product_id: line_item.product.id,
        user_profile: build_user_profile(user)
      )

      if recommendation[:recommendation] &&
         recommendation[:recommendation] != ordered_size &&
         recommendation[:confidence] != 'low'
        {
          ordered: ordered_size,
          recommended: recommendation[:recommendation],
          confidence: recommendation[:confidence],
          mismatch: true,
          message: sizing_message(ordered_size, recommendation)
        }
      else
        { ordered: ordered_size, mismatch: false }
      end
    end

    def determine_interventions(risk, size_rec)
      interventions = []

      if size_rec&.dig(:mismatch)
        interventions << {
          type: :size_suggestion,
          priority: :high,
          message: size_rec[:message]
        }
      end

      if risk[:probability] > 0.6
        interventions << {
          type: :fit_confirmation,
          priority: :medium,
          message: 'Check our size guide for this item — '\
                   'it runs differently to similar products'
        }
      end

      if risk[:bracketing_signal]
        interventions << {
          type: :exchange_nudge,
          priority: :low,
          message: 'Not sure about sizing? Our free exchange '\
                   'policy means you can swap sizes easily'
        }
      end

      interventions
    end

    def detect_bracketing(order)
      # Same product ordered in more than one variant (e.g. two sizes)
      order.line_items
           .group_by { |li| li.product.id }
           .any? { |_, items| items.map(&:variant_id).uniq.length > 1 }
    end
  end
end
Product-Level Return Analytics
Beyond individual order scoring, aggregate return data at the product level to fix systemic issues:
module ReturnPrevention
  class ProductAnalytics
    def analyse(product)
      returns = Spree::ReturnItem
                .joins(return_authorization: :order)
                .joins(:inventory_unit)
                .where(inventory_units: {
                         variant_id: product.variant_ids
                       })

      total_sold = product.line_items
                          .joins(:order)
                          .where(orders: { state: 'complete' })
                          .sum(:quantity)

      return_rate = total_sold.positive? ?
                      (returns.count.to_f / total_sold * 100).round(1) : 0

      {
        return_rate: return_rate,
        total_sold: total_sold,
        total_returned: returns.count,
        return_reasons: reason_breakdown(returns),
        sizing_analysis: sizing_analysis(product, returns),
        review_sentiment: review_sizing_sentiment(product),
        recommendations: generate_recommendations(
          return_rate, returns, product
        )
      }
    end

    private

    def sizing_analysis(product, returns)
      size_option_type_id = Spree::OptionType.find_by(name: 'size')&.id

      size_returns = returns.joins(
        inventory_unit: { variant: :option_values }
      ).where(
        option_values: { option_type_id: size_option_type_id }
      ).group('spree_option_values.name').count

      size_sales = product.line_items
                          .joins(variant: :option_values)
                          .where(option_values: {
                                   option_type_id: size_option_type_id
                                 })
                          .group('spree_option_values.name')
                          .sum(:quantity)

      size_sales.map do |size, sold|
        returned = size_returns[size] || 0
        rate = sold.positive? ? (returned.to_f / sold * 100).round(1) : 0

        {
          size: size,
          sold: sold,
          returned: returned,
          return_rate: rate,
          flag: rate > 30 ? :investigate : :normal
        }
      end
    end

    def generate_recommendations(return_rate, returns, product)
      recs = []

      if return_rate > 25
        recs << 'High return rate. Review product description '\
                'and imagery for expectation mismatches.'
      end

      sizing = sizing_analysis(product, returns)
      problem_sizes = sizing.select { |s| s[:flag] == :investigate }
      if problem_sizes.any?
        sizes = problem_sizes.map { |s| s[:size] }.join(', ')
        recs << "Sizes #{sizes} have return rates above 30%. "\
                "Review size guide accuracy for these sizes."
      end

      reasons = reason_breakdown(returns)
      if reasons['too_small'].to_i > reasons['too_large'].to_i * 2
        recs << 'Product consistently runs small. Consider '\
                'adding "runs small" note to description or '\
                'adjusting size chart.'
      end

      recs
    end
  end
end
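Before committing to the Rails implementation, the size-level breakdown is easy to prototype in pandas against an export of your sales and returns data. Here the numbers are synthetic, and the 30% investigation threshold mirrors the one used above:

```python
import pandas as pd

# Synthetic per-size sales and returns for one product
sales = pd.DataFrame({
    'size': ['S', 'M', 'L'],
    'sold': [120, 300, 90],
    'returned': [18, 40, 35],
})
sales['return_rate'] = (sales['returned'] / sales['sold'] * 100).round(1)
sales['flag'] = sales['return_rate'].apply(
    lambda r: 'investigate' if r > 30 else 'normal'
)
print(sales)
```

In this example size L returns at roughly 39%, so it gets flagged while S and M do not; on real data a flag like this usually points at a size-chart inaccuracy for that specific size rather than a problem with the whole product.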
The Economics of Prevention
Let's make the business case concrete with round numbers.
A Solidus store doing £1M monthly revenue with a 70% cart abandonment rate is losing approximately £2.33M in potential revenue every month (the £1M represents only 30% of the value that enters checkout). Reducing abandonment by just five percentage points (from 70% to 65%) recovers roughly £166,000 in monthly revenue. Even if your interventions capture only a fifth of that opportunity, that's still £33,000/month from a model that costs almost nothing to run once trained.
On the returns side: a store with a 25% return rate on £1M revenue processes £250,000 in returned merchandise monthly, or £3M a year. On top of the refunded revenue, each return costs £10-15 in shipping, handling, inspection, and restocking, not counting the margin loss on items that can't be resold at full price. Reducing the return rate by three percentage points (from 25% to 22%) keeps roughly £30,000 of merchandise a month from coming back, and eliminates the per-return processing cost on every one of those avoided returns.
These aren't speculative numbers. They're basic arithmetic applied to industry-standard rates. The ML models don't need to be perfect — they need to be better than doing nothing.
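The abandonment arithmetic is easy to reproduce for your own numbers. This helper (a hypothetical name, not part of the models above) implements the worked example from the text:

```python
def abandonment_economics(monthly_revenue: float,
                          abandonment_rate: float,
                          target_rate: float) -> dict:
    """Revenue locked up in abandoned carts, and the value of
    shaving the abandonment rate down to target_rate."""
    completion = 1 - abandonment_rate
    # Completed revenue is only `completion` share of initiated value
    initiated_value = monthly_revenue / completion
    lost = initiated_value - monthly_revenue
    recovered = initiated_value * (1 - target_rate) - monthly_revenue
    return {
        'checkout_value_initiated': round(initiated_value),
        'lost_to_abandonment': round(lost),
        'recovered_by_reduction': round(recovered),
    }

# The worked example from the text: GBP 1M monthly, 70% -> 65%
print(abandonment_economics(1_000_000, 0.70, 0.65))
```

Plugging in your own revenue and rates gives a defensible ceiling for what an intervention programme is worth before you build anything.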
Getting Started
- Instrument your checkout. Track every interaction: page times, payment switches, item removals, promo code attempts, delivery page engagement. These are your training features.
- Export your return data. You need: order, product, size, return reason, customer history. Tag returns with structured reason codes, not free-text.
- Train the abandonment model first. It's simpler, the feedback loop is faster (you see results in days, not weeks), and the revenue impact is immediate.
- Add size recommendation once you have enough return data. You need at least a few hundred returns per product category to train meaningful size models.
- Close the feedback loop. Every prevented abandonment and every prevented return feeds back into the model, making it more accurate over time. This is the compounding advantage of ML — it gets better the more data it sees.