Building Custom AI Recommendations in Solidus
Solidus is an open-source ecommerce framework built on Ruby on Rails. It powers stores like [MeUndies](https://meundies.com), [Ace & Tate](https://aceandtate.com), and [Bonobos](https://bonobos.com), and it's the foundation I build on for custom ecommerce projects across the DACH market.

One thing Solidus doesn't give you out of the box: product recommendations. There's no "customers who bought this also bought" engine. No "recommended for you" section. No "frequently bought together" feature. You get a solid ecommerce core — products, orders, payments, shipping — but the intelligence layer is yours to build.

That's actually a good thing. The generic recommendation plugins available for most platforms are mediocre at best. They use simple co-occurrence counting ("people who bought A also bought B") without understanding why, and they fall apart for stores with fewer than 10,000 orders or highly seasonal catalogues. Building your own means you understand what it does, you control the data, and you can tune it for your specific business.

This post is a complete walkthrough. I'll cover three approaches to recommendation — collaborative filtering, content-based filtering, and a hybrid that combines both — with full Python ML scripts and the Solidus/Rails code that wires them into your store. I'll be honest about the limitations, walk through the cold start problem, show you how to A/B test different strategies, and flag the GDPR implications you need to handle.

The goal: by the end, you'll have everything you need to add a genuinely useful "recommended for you" feature to any Solidus store. Not a toy demo. A production-ready system with proper fallbacks, monitoring, and the ability to improve over time. Let's build it.
The Three Approaches (and When Each One Works)
Before writing any code, you need to understand the three fundamental approaches to product recommendations and their trade-offs.
Collaborative filtering says: "Users who behaved similarly to you in the past liked these products." It doesn't care about product attributes — it only looks at patterns in user behaviour. If users who bought Product A also tend to buy Product B, and you just bought Product A, the system recommends Product B. This is the approach Amazon popularised and it's genuinely powerful. But it has a fatal weakness: it can't recommend anything to a user with no history (new user cold start) or recommend a product nobody has interacted with yet (new item cold start).
Content-based filtering says: "Based on the attributes of products you've liked, here are similar products." It looks at product descriptions, categories, tags, prices, and other metadata to find similarity. If you've been browsing blue cotton T-shirts, it recommends other blue cotton T-shirts. It works from day one for new users (as soon as they view one product) and for new products (as soon as they're catalogued). But it creates filter bubbles — it will never recommend something surprising or cross-category.
Hybrid combines both, typically weighting collaborative filtering higher when enough data exists and falling back to content-based when it doesn't. This is what you should build, but understanding the components individually is essential for debugging and tuning.
Step 1: Event Tracking in Solidus
Recommendations are only as good as the signals you feed them. Before building any ML model, you need to capture user interactions. Solidus doesn't track product views or search queries by default — only completed orders. We need to add event tracking.
module Recommendations
class EventTracker
# Event types with implicit rating weights
# Higher weight = stronger signal of interest
WEIGHTS = {
purchase: 5.0,
add_to_cart: 3.0,
wishlist_add: 2.5,
product_view: 1.0,
search_click: 1.5,
category_browse: 0.5
}.freeze
def track(user_or_session:, event_type:, product:, metadata: {})
InteractionEvent.create!(
        user_id: user_or_session.is_a?(Spree.user_class) ? user_or_session.id : nil,
        session_id: user_or_session.is_a?(String) ? user_or_session : nil,
event_type: event_type,
product_id: product.id,
weight: WEIGHTS.fetch(event_type, 1.0),
metadata: metadata,
created_at: Time.current
)
end
def interaction_matrix_export(since: 6.months.ago)
# Export for Python model training
InteractionEvent
.where('created_at > ?', since)
.where.not(user_id: nil)
.group(:user_id, :product_id)
.select(
'user_id',
'product_id',
'SUM(weight) as total_weight',
'COUNT(*) as interaction_count',
'MAX(created_at) as last_interaction'
)
.map do |row|
{
user_id: row.user_id,
product_id: row.product_id,
weight: row.total_weight,
interactions: row.interaction_count
}
end
end
end
end
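The tracker writes to an InteractionEvent table that Solidus doesn't ship with. Here's a minimal sketch of the migration and model, assuming PostgreSQL for the jsonb column (the migration version and timestamp are illustrative); adjust column types and indexes to your database:

# db/migrate/20240101000000_create_interaction_events.rb
class CreateInteractionEvents < ActiveRecord::Migration[7.0]
  def change
    create_table :interaction_events do |t|
      t.references :user, index: true               # nil for guest sessions
      t.string     :session_id, index: true         # anonymous fallback
      t.string     :event_type, null: false
      t.references :product, null: false, index: true
      t.float      :weight, null: false, default: 1.0
      t.jsonb      :metadata, null: false, default: {}
      t.datetime   :created_at, null: false
    end

    add_index :interaction_events, [:user_id, :product_id]
    add_index :interaction_events, :created_at
  end
end

# app/models/interaction_event.rb
class InteractionEvent < ApplicationRecord
  belongs_to :user, class_name: Spree.user_class.to_s, optional: true
  belongs_to :product, class_name: 'Spree::Product'

  validates :event_type, :weight, presence: true
end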
Hook this into your Solidus controllers. Product views go in Spree::ProductsController, cart adds go in the OrdersController populate action (a sketch follows the decorator below), and purchases are captured via an order_finalized event subscriber:
# app/controllers/spree/products_controller_decorator.rb
module Spree
module ProductsControllerDecorator
def show
super
Recommendations::EventTracker.new.track(
user_or_session: current_user_or_session,
event_type: :product_view,
product: @product,
metadata: {
referrer: request.referrer,
source: params[:source]
}
)
end
private
def current_user_or_session
spree_current_user || session.id.to_s
end
end
end
Spree::ProductsController.prepend(
Spree::ProductsControllerDecorator
)
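And the cart-add hook, as a decorator on the classic solidus_frontend OrdersController. This is a sketch: newer starter-frontend apps route cart adds through a different controller, so adapt the hook point to your frontend.

# app/controllers/spree/orders_controller_decorator.rb
module Spree
  module OrdersControllerDecorator
    def populate
      super

      variant = Spree::Variant.find_by(id: params[:variant_id])
      return unless variant

      Recommendations::EventTracker.new.track(
        user_or_session: spree_current_user || session.id.to_s,
        event_type: :add_to_cart,
        product: variant.product,
        metadata: { quantity: params[:quantity] }
      )
    end
  end
end

Spree::OrdersController.prepend(Spree::OrdersControllerDecorator)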
For order completion, use a Solidus event subscriber:
# app/subscribers/recommendations/order_subscriber.rb
module Recommendations
class OrderSubscriber
include Omnes::Subscriber
handle :order_finalized, with: :track_purchases
def track_purchases(event)
order = event.payload[:order]
tracker = EventTracker.new
order.line_items.each do |item|
tracker.track(
user_or_session: order.user || order.number,
event_type: :purchase,
product: item.product,
metadata: {
quantity: item.quantity,
price: item.price.to_f,
order_id: order.id
}
)
end
end
end
end
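Omnes subscribers aren't picked up automatically; register this one against Solidus' event bus (Spree::Bus on Solidus 3.2+) in an initializer:

# config/initializers/recommendations_subscribers.rb
Rails.application.config.to_prepare do
  Recommendations::OrderSubscriber.new.subscribe_to(Spree::Bus)
end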
Step 2: Collaborative Filtering with Python
Once you have interaction data (aim for at least 500-1,000 orders across 50+ products before training), you can build the collaborative filtering model. I use the implicit library, which implements Alternating Least Squares (ALS), a standard way to fit the matrix factorisation models the Netflix Prize made famous.
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from implicit.als import AlternatingLeastSquares
import json
import pickle
def train_collaborative_model(interactions_csv: str,
factors: int = 50,
iterations: int = 30,
regularization: float = 0.01):
"""
Train an ALS collaborative filtering model.
interactions_csv: Export from Solidus EventTracker
Columns: user_id, product_id, weight
"""
df = pd.read_csv(interactions_csv)
# Create user and item indices
user_ids = df['user_id'].unique()
product_ids = df['product_id'].unique()
user_to_idx = {uid: i for i, uid in enumerate(user_ids)}
product_to_idx = {pid: i for i, pid in enumerate(product_ids)}
idx_to_product = {i: pid for pid, i in product_to_idx.items()}
idx_to_user = {i: uid for uid, i in user_to_idx.items()}
# Build sparse interaction matrix (users x products)
rows = df['user_id'].map(user_to_idx).values
cols = df['product_id'].map(product_to_idx).values
weights = df['weight'].values
interaction_matrix = csr_matrix(
(weights, (rows, cols)),
shape=(len(user_ids), len(product_ids))
)
# Train ALS model
model = AlternatingLeastSquares(
factors=factors,
iterations=iterations,
regularization=regularization,
use_gpu=False # Set True if you have CUDA
)
    # implicit >= 0.5 expects the user-item matrix directly
    # (only older versions wanted items x users)
    model.fit(interaction_matrix)
print(f"Trained on {len(user_ids)} users, "
f"{len(product_ids)} products, "
f"{len(df)} interactions")
# Save model and mappings
with open('collab_model.pkl', 'wb') as f:
pickle.dump({
'model': model,
'user_to_idx': user_to_idx,
'product_to_idx': product_to_idx,
'idx_to_product': idx_to_product,
'interaction_matrix': interaction_matrix
}, f)
return model, user_to_idx, product_to_idx, idx_to_product
def get_user_recommendations(model_path: str,
user_id: int,
n: int = 10,
exclude_purchased: bool = True):
"""Get recommendations for a specific user."""
with open(model_path, 'rb') as f:
data = pickle.load(f)
model = data['model']
user_to_idx = data['user_to_idx']
idx_to_product = data['idx_to_product']
matrix = data['interaction_matrix']
if user_id not in user_to_idx:
return [] # Cold start: user not in training data
user_idx = user_to_idx[user_id]
# Get recommendations
# filter_already_liked_items removes products user interacted with
product_indices, scores = model.recommend(
user_idx,
matrix[user_idx],
N=n,
filter_already_liked_items=exclude_purchased
)
recommendations = []
for idx, score in zip(product_indices, scores):
recommendations.append({
'product_id': int(idx_to_product[idx]),
'score': float(score),
'method': 'collaborative_filtering'
})
return recommendations
def get_similar_products(model_path: str,
product_id: int,
n: int = 10):
"""Find products similar to a given product (item-based CF)."""
with open(model_path, 'rb') as f:
data = pickle.load(f)
model = data['model']
product_to_idx = data['product_to_idx']
idx_to_product = data['idx_to_product']
if product_id not in product_to_idx:
return [] # Cold start: product not in training data
item_idx = product_to_idx[product_id]
similar_indices, scores = model.similar_items(
item_idx, N=n + 1 # +1 because it returns itself
)
results = []
for idx, score in zip(similar_indices, scores):
pid = int(idx_to_product[idx])
if pid != product_id: # Skip itself
results.append({
'product_id': pid,
'score': float(score),
'method': 'item_similarity'
})
return results[:n]
Why ALS and not SVD or deep learning? ALS works well with implicit feedback (views, clicks, purchases) rather than explicit ratings (1-5 stars). Ecommerce data is almost entirely implicit — people don't rate products, they buy or don't buy. ALS also handles sparse matrices well, which is critical because any given user has interacted with only a tiny fraction of your catalogue. Deep learning approaches (like neural collaborative filtering) can outperform ALS but require significantly more data and compute — overkill for most Solidus stores.
Step 3: Content-Based Filtering with Python
Content-based filtering uses product attributes to find similar items. It's the fallback for cold start situations and a valuable signal in its own right:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pickle
def train_content_model(products_csv: str):
"""
Build a content-based similarity model.
products_csv: Export from Solidus products table
Columns: product_id, name, description, category,
taxon_path, brand, price, tags
"""
df = pd.read_csv(products_csv)
# Combine text features into a single document per product
df['combined_text'] = (
df['name'].fillna('') + ' ' +
df['description'].fillna('') + ' ' +
df['category'].fillna('') + ' ' +
df['taxon_path'].fillna('').str.replace('>', ' ') + ' ' +
df['brand'].fillna('') + ' ' +
df['tags'].fillna('').str.replace(',', ' ')
)
# TF-IDF vectorisation
tfidf = TfidfVectorizer(
max_features=5000,
stop_words='english',
ngram_range=(1, 2), # Unigrams and bigrams
min_df=2, # Ignore very rare terms
max_df=0.95 # Ignore very common terms
)
tfidf_matrix = tfidf.fit_transform(df['combined_text'])
# Compute cosine similarity between all products
# For large catalogues (10k+), compute on-demand instead
if len(df) < 10000:
similarity_matrix = cosine_similarity(tfidf_matrix)
else:
similarity_matrix = None # Too large to precompute
product_ids = df['product_id'].values
product_to_idx = {pid: i for i, pid in enumerate(product_ids)}
model_data = {
'tfidf': tfidf,
'tfidf_matrix': tfidf_matrix,
'similarity_matrix': similarity_matrix,
'product_ids': product_ids,
'product_to_idx': product_to_idx
}
with open('content_model.pkl', 'wb') as f:
pickle.dump(model_data, f)
print(f"Content model built for {len(df)} products, "
f"{tfidf_matrix.shape[1]} features")
return model_data
def get_content_recommendations(model_path: str,
product_id: int,
n: int = 10):
"""Get content-similar products."""
with open(model_path, 'rb') as f:
data = pickle.load(f)
product_to_idx = data['product_to_idx']
if product_id not in product_to_idx:
return []
idx = product_to_idx[product_id]
if data['similarity_matrix'] is not None:
# Use precomputed similarity
similarities = data['similarity_matrix'][idx]
else:
# Compute on-demand for large catalogues
product_vector = data['tfidf_matrix'][idx]
similarities = cosine_similarity(
product_vector, data['tfidf_matrix']
).flatten()
# Get top-n similar (excluding self)
similar_indices = similarities.argsort()[::-1][1:n+1]
results = []
for sim_idx in similar_indices:
results.append({
'product_id': int(data['product_ids'][sim_idx]),
'score': float(similarities[sim_idx]),
'method': 'content_similarity'
})
return results
def get_content_recs_for_user(model_path: str,
viewed_product_ids: list,
n: int = 10):
"""
Recommend based on a user's browsing history.
Useful for users with views but no purchases.
"""
all_recs = {}
for pid in viewed_product_ids[-10:]: # Last 10 views
recs = get_content_recommendations(model_path, pid, n=5)
for rec in recs:
rid = rec['product_id']
if rid not in viewed_product_ids:
if rid in all_recs:
all_recs[rid]['score'] += rec['score']
all_recs[rid]['from_products'].append(pid)
else:
all_recs[rid] = {
**rec,
'from_products': [pid]
}
# Sort by accumulated score and return top-n
sorted_recs = sorted(
all_recs.values(),
key=lambda x: x['score'],
reverse=True
)[:n]
return sorted_recs
Limitation: Content-based filtering is only as good as your product data. If your Solidus store has product descriptions like "Blue T-Shirt. Made of cotton. Machine washable." — the model can't distinguish between products very well. The richer your text data (styling notes, materials, occasions, fit descriptions), the better the recommendations. This connects directly to the GEO optimisation work — enriching product data for AI assistants also makes your content-based recommendations better.
Step 4: The Hybrid Combiner
The hybrid approach weights and merges results from both models:
def hybrid_recommendations(user_id: int,
viewed_products: list,
collab_model_path: str,
content_model_path: str,
n: int = 10,
collab_weight: float = 0.6,
content_weight: float = 0.4):
"""
Combine collaborative and content-based recommendations.
Weights shift based on data availability.
"""
collab_recs = get_user_recommendations(
collab_model_path, user_id, n=n*2
)
content_recs = get_content_recs_for_user(
content_model_path, viewed_products, n=n*2
)
# If no collaborative data, shift weight to content
if not collab_recs:
collab_weight = 0.0
content_weight = 1.0
elif len(collab_recs) < 5:
collab_weight = 0.3
content_weight = 0.7
# Normalise scores within each method
collab_recs = normalise_scores(collab_recs)
content_recs = normalise_scores(content_recs)
# Merge
combined = {}
for rec in collab_recs:
pid = rec['product_id']
combined[pid] = {
'product_id': pid,
'score': rec['score'] * collab_weight,
'methods': ['collaborative']
}
for rec in content_recs:
pid = rec['product_id']
if pid in combined:
combined[pid]['score'] += rec['score'] * content_weight
combined[pid]['methods'].append('content')
else:
combined[pid] = {
'product_id': pid,
'score': rec['score'] * content_weight,
'methods': ['content']
}
results = sorted(
combined.values(),
key=lambda x: x['score'],
reverse=True
)[:n]
for rec in results:
rec['method'] = 'hybrid'
return results
def normalise_scores(recs: list) -> list:
"""Normalise scores to 0-1 range."""
if not recs:
return recs
max_score = max(r['score'] for r in recs)
if max_score == 0:
return recs
for r in recs:
r['score'] = r['score'] / max_score
return recs
Step 5: The Solidus Recommendation Engine
Now bring it all together in Ruby. The engine orchestrates model selection, handles cold start fallbacks, manages caching, and exposes a clean interface for views:
module Recommendations
class Engine
STRATEGIES = {
product_page: :similar_products,
cart_page: :complementary_products,
homepage: :personalised,
email: :personalised,
category_page: :category_popular
}.freeze
def for_user(user:, context:, limit: 8)
strategy = STRATEGIES.fetch(context, :personalised)
recs = cache_fetch(user, context) do
case strategy
when :personalised
personalised_recommendations(user, limit)
when :category_popular
category_popular(user, limit)
else
popularity_fallback(limit)
end
end
tag_for_tracking(recs, strategy)
end
def for_product(product:, user: nil, limit: 8)
recs = cache_fetch(product, :similar) do
similar = PythonBridge.similar_products(
product_id: product.id, n: limit
)
if similar.length < limit / 2
# Not enough collaborative data, add content-based
content = PythonBridge.content_recommendations(
product_id: product.id, n: limit
)
merge_and_deduplicate(similar, content, limit)
else
similar
end
end
# Boost if we know the user
if user
recs = personalise_product_recs(recs, user)
end
tag_for_tracking(recs, :similar_products)
end
def for_cart(order:, limit: 8)
product_ids = order.line_items.map(&:product_id)
      # Key the cache on the cart's contents so recommendations refresh
      # whenever items are added or removed
      cart_key = "cart-#{Digest::MD5.hexdigest(product_ids.sort.join('-'))}"
      recs = cache_fetch(order, cart_key) do
complementary = []
product_ids.each do |pid|
similar = PythonBridge.similar_products(
product_id: pid, n: 4
)
complementary += similar
end
# Remove products already in cart, deduplicate
complementary
.reject { |r| product_ids.include?(r['product_id']) }
.uniq { |r| r['product_id'] }
.sort_by { |r| -r['score'] }
.first(limit)
end
tag_for_tracking(recs, :cart_complementary)
end
private
def personalised_recommendations(user, limit)
# Try hybrid first
viewed = recent_views(user)
hybrid = PythonBridge.hybrid_recommendations(
user_id: user.id,
viewed_products: viewed,
n: limit
)
return hybrid if hybrid.length >= limit / 2
# Fall back through cold start hierarchy
cold_start_fallback(user, viewed, limit)
end
def cold_start_fallback(user, viewed, limit)
# Level 1: Content-based from browsing history
if viewed.any?
content = PythonBridge.content_recs_for_user(
viewed_products: viewed, n: limit
)
return content if content.length >= limit / 2
end
# Level 2: Category-aware popularity
if user.orders.complete.any?
return category_popular(user, limit)
end
# Level 3: Global popularity
popularity_fallback(limit)
end
    # Public despite sitting below `private`: the GDPR fallback (and an A/B
    # test control arm) call it directly for non-personalised results
    public def popularity_fallback(limit)
# Most purchased products in last 30 days
Spree::Product
.joins(variants: { line_items: :order })
.where(orders: { state: 'complete',
completed_at: 30.days.ago.. })
.group('spree_products.id')
.order('COUNT(spree_line_items.id) DESC')
.limit(limit)
.pluck(:id)
.map do |pid|
{ 'product_id' => pid, 'score' => 0.0,
'method' => 'popularity' }
end
end
def cache_fetch(entity, context, ttl: 1.hour, &block)
key = "recs:#{entity.class.name}:#{entity.id}:#{context}"
Rails.cache.fetch(key, expires_in: ttl, &block)
end
def tag_for_tracking(recs, strategy)
recs.each do |rec|
rec['strategy'] = strategy.to_s
rec['model_version'] = current_model_version
rec['served_at'] = Time.current.iso8601
end
recs
end
end
end
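The engine leans on a PythonBridge that isn't shown above. Here's a minimal sketch, assuming the trained models are exposed through a small HTTP service (for example the FastAPI microservice discussed in the deployment section); the base URL environment variable and the endpoint paths are assumptions, not part of Solidus or the Python scripts.

# app/services/recommendations/python_bridge.rb
require 'net/http'
require 'json'

module Recommendations
  class PythonBridge
    BASE_URL = ENV.fetch('RECS_SERVICE_URL', 'http://localhost:8000')
    TIMEOUT  = 0.5 # seconds; fail fast so pages never block on recommendations

    class << self
      def similar_products(product_id:, n: 10)
        post('/similar', product_id: product_id, n: n)
      end

      def content_recommendations(product_id:, n: 10)
        post('/content/similar', product_id: product_id, n: n)
      end

      def hybrid_recommendations(user_id:, viewed_products:, n: 10)
        post('/hybrid', user_id: user_id, viewed_products: viewed_products, n: n)
      end

      def content_recs_for_user(viewed_products:, n: 10)
        post('/content/user', viewed_products: viewed_products, n: n)
      end

      private

      def post(path, payload)
        uri = URI.join(BASE_URL, path)
        http = Net::HTTP.new(uri.host, uri.port)
        http.open_timeout = TIMEOUT
        http.read_timeout = TIMEOUT

        response = http.post(uri.path, payload.to_json,
                             'Content-Type' => 'application/json')
        return [] unless response.is_a?(Net::HTTPSuccess)

        JSON.parse(response.body)
      rescue StandardError => e
        Rails.logger.warn("PythonBridge #{path} failed: #{e.message}")
        [] # An empty result lets the engine fall through to its fallbacks
      end
    end
  end
end

Returning an empty array on any failure is deliberate: the engine treats short or empty results as a cue to fall back, so a dead model service degrades to popularity-based recommendations instead of breaking product pages. The engine also assumes a handful of small private helpers not shown here (recent_views, category_popular, merge_and_deduplicate, personalise_product_recs, current_model_version); they're straightforward to implement against your own data.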
Step 6: Rendering Recommendations in Views
Solidus uses standard Rails views (ERB or your templating engine of choice). Here's a view helper and partial for rendering recommendations:
module Recommendations
module ViewHelper
def product_recommendations(context:, limit: 8)
engine = Recommendations::Engine.new
recs = case context
when :product_page
engine.for_product(
product: @product,
user: spree_current_user
)
when :cart_page
engine.for_cart(order: current_order)
when :homepage, :email
return [] unless spree_current_user
engine.for_user(
user: spree_current_user,
context: context
)
end
return [] if recs.blank?
# Load actual Spree::Product objects
product_ids = recs.map { |r| r['product_id'] }
products = Spree::Product
.where(id: product_ids)
.includes(:master, images: { attachment_attachment: :blob })
.index_by(&:id)
recs.filter_map do |rec|
product = products[rec['product_id']]
next unless product&.available?
{
product: product,
score: rec['score'],
method: rec['method'],
strategy: rec['strategy'],
tracking_data: rec.slice(
'strategy', 'model_version', 'served_at'
)
}
end
end
end
end
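The helper isn't available to storefront views until it's registered on the storefront base controller. A minimal sketch (the decorator name and path are illustrative; Spree::StoreController is the common base class for storefront controllers):

# app/controllers/spree/store_controller_decorator.rb
module Spree
  module StoreControllerDecorator
    def self.prepended(base)
      base.helper Recommendations::ViewHelper
    end
  end
end

Spree::StoreController.prepend(Spree::StoreControllerDecorator)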
The partial:
<%# app/views/spree/shared/_recommendations.html.erb %>
<% recommendations = product_recommendations(context: context) %>
<% if recommendations.any? %>
<section class="recommendations"
data-strategy="<%= recommendations.first[:strategy] %>"
data-model-version="<%= recommendations.first.dig(:tracking_data, 'model_version') %>">
<h2><%= title %></h2>
<div class="recommendation-grid">
<% recommendations.each do |rec| %>
<div class="recommendation-card"
data-product-id="<%= rec[:product].id %>"
data-method="<%= rec[:method] %>"
data-score="<%= rec[:score] %>">
<%= link_to spree.product_path(rec[:product],
source: 'recommendation',
rec_strategy: rec[:strategy]) do %>
<%= image_tag rec[:product].images.first&.url(:small),
alt: rec[:product].name %>
<span class="product-name"><%= rec[:product].name %></span>
<span class="product-price"><%= rec[:product].display_price %></span>
<% end %>
</div>
<% end %>
</div>
</section>
<% end %>
Step 7: A/B Testing Recommendations
You should never deploy recommendations without measuring their impact. Build A/B testing in from day one:
module Recommendations
class ABTest
EXPERIMENTS = {
homepage_recs: {
control: { strategy: :popularity, weight: 0.5 },
variant_a: { strategy: :collaborative, weight: 0.25 },
variant_b: { strategy: :hybrid, weight: 0.25 }
}
}.freeze
def assign_variant(user:, experiment:)
config = EXPERIMENTS[experiment]
return :control unless config
# Deterministic assignment based on user ID
# Ensures same user always sees same variant
hash = Digest::MD5.hexdigest(
"#{user.id}-#{experiment}"
).to_i(16)
cumulative = 0.0
config.each do |variant, settings|
cumulative += settings[:weight]
return variant if (hash % 1000) / 1000.0 < cumulative
end
:control
end
def track_conversion(user:, experiment:, variant:,
event:, value: nil)
ABTestEvent.create!(
user_id: user.id,
experiment: experiment,
variant: variant,
event: event, # :impression, :click, :add_to_cart, :purchase
value: value,
created_at: Time.current
)
end
def results(experiment:, since: 30.days.ago)
events = ABTestEvent
.where(experiment: experiment)
.where('created_at > ?', since)
.group(:variant, :event)
.count
# Calculate conversion rates per variant
EXPERIMENTS[experiment].keys.map do |variant|
impressions = events[[variant.to_s, 'impression']] || 0
clicks = events[[variant.to_s, 'click']] || 0
purchases = events[[variant.to_s, 'purchase']] || 0
{
variant: variant,
impressions: impressions,
click_rate: safe_divide(clicks, impressions),
purchase_rate: safe_divide(purchases, impressions),
clicks: clicks,
purchases: purchases
}
end
    end

    private

    def safe_divide(numerator, denominator)
      return 0.0 if denominator.zero?

      numerator.to_f / denominator
    end
  end
end
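Wiring the experiment into serving is then a matter of assigning a variant before choosing a strategy and logging an impression afterwards. A sketch; the method name and placement are illustrative (it could sit in the view helper above), and a fuller version would route :variant_a to a collaborative-only path:

def homepage_recommendations_for(user, limit: 8)
  ab_test = Recommendations::ABTest.new
  variant = ab_test.assign_variant(user: user, experiment: :homepage_recs)
  engine  = Recommendations::Engine.new

  recs =
    case variant
    when :control
      engine.popularity_fallback(limit)
    else
      engine.for_user(user: user, context: :homepage, limit: limit)
    end

  ab_test.track_conversion(
    user: user, experiment: :homepage_recs,
    variant: variant, event: :impression
  )

  recs
end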
Step 8: Model Training Pipeline
Models need regular retraining as new data arrives. A Rake task run via cron:
namespace :recommendations do
desc 'Retrain recommendation models'
task retrain: :environment do
puts "Exporting interaction data..."
interactions = Recommendations::EventTracker.new
.interaction_matrix_export(since: 6.months.ago)
File.write(
'tmp/interactions.csv',
interactions.map(&:values).map { |r| r.join(',') }
.unshift('user_id,product_id,weight,interactions')
.join("\n")
)
puts "Exporting product data..."
Recommendations::ProductExporter.export_to_csv(
'tmp/products.csv'
)
puts "Training collaborative model..."
system('python3 train_collaborative.py tmp/interactions.csv')
puts "Training content model..."
system('python3 train_content.py tmp/products.csv')
puts "Invalidating cache..."
Rails.cache.delete_matched('recs:*')
puts "Updating model version..."
Rails.cache.write(
'recs:model_version',
Time.current.strftime('%Y%m%d%H%M')
)
puts "Done. Models retrained at #{Time.current}"
end
end
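The task calls a ProductExporter that isn't shown above. A minimal sketch that produces the columns the content model expects; how you derive "category", "brand" and "tags" depends entirely on your taxonomy setup, so the taxon-based mapping below is an assumption:

# app/services/recommendations/product_exporter.rb
require 'csv'

module Recommendations
  class ProductExporter
    HEADERS = %w[product_id name description category
                 taxon_path brand price tags].freeze

    def self.export_to_csv(path)
      CSV.open(path, 'w', write_headers: true, headers: HEADERS) do |csv|
        Spree::Product.available.includes(:taxons).find_each do |product|
          taxons = product.taxons
          csv << [
            product.id,
            product.name,
            product.description,
            taxons.first&.root&.name,
            taxons.map(&:pretty_name).join(' > '),
            product.try(:brand)&.to_s,          # if you store a brand attribute
            product.price.to_f,
            taxons.map(&:name).join(',')
          ]
        end
      end
    end
  end
end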
Run weekly for most stores. Daily if you have high order volume and rapidly changing inventory.
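If you'd rather keep the schedule inside the app than in system cron, a sidekiq-cron sketch; the gem, job class and cron expression here are assumptions:

# app/jobs/recommendations/retrain_job.rb
module Recommendations
  class RetrainJob < ApplicationJob
    queue_as :low

    def perform
      # Shelling out keeps the job trivial; the Rake task above does the work
      system('bin/rails recommendations:retrain') ||
        raise('Recommendation model retraining failed')
    end
  end
end

# config/initializers/sidekiq_cron.rb (requires the sidekiq-cron gem)
Sidekiq::Cron::Job.create(
  name:  'recommendations-retrain',
  cron:  '0 3 * * 0', # Sundays at 03:00
  class: 'Recommendations::RetrainJob'
)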
The Cold Start Problem (Honestly)
This is the single hardest problem in recommendation systems and I want to be direct about it.
New user, no history: The hybrid model has no collaborative data and no browsing history. You're flying blind. The fallback hierarchy is: (1) if they came from a specific category or search, recommend popular items in that category, (2) if they came from a campaign, use the campaign's product context, (3) otherwise, show globally popular products. None of these are personalised. They're just better than showing random products.
New product, no interactions: Collaborative filtering literally cannot recommend a product nobody has interacted with — it doesn't exist in the interaction matrix. Content-based filtering can, because it uses product attributes. This is why the hybrid matters: content-based recommendations give new products a chance to surface.
Small catalogue (under 100 products): Collaborative filtering needs diversity. If you have 50 products, the interaction matrix is too small for ALS to find meaningful patterns. Content-based filtering still works, but honestly, for very small catalogues, hand-curated recommendations might outperform ML. Don't use a machine learning cannon to crack a nut.
Seasonal catalogues: If your products change every season (fashion, for example), historical interaction data from last season's products is useless for this season's recommendations. Content-based filtering handles this better (new products get recommended by attribute similarity), but collaborative patterns ("this customer likes floral prints") need to be extracted at a higher level than individual product IDs.
Pitfalls and Limitations
The popularity bias trap. Collaborative filtering naturally recommends popular products more, because they appear in more interaction patterns. This creates a rich-get-richer effect where popular products get recommended more, get more sales, get recommended more. Actively monitor recommendation diversity and consider adding an exploration component that surfaces less-popular items.
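One lightweight mitigation: reserve a slot or two in each result set for in-stock products outside your best-sellers, so new and niche items still get impressions. A sketch of a re-ranking helper you might add to the Engine; the 100-product best-seller cut-off and the two exploration slots are arbitrary choices:

def with_exploration(recs, slots: 2)
  return recs if recs.size <= slots

  top_sellers = Rails.cache.fetch('recs:top_sellers', expires_in: 1.day) do
    Spree::Product
      .joins(variants: { line_items: :order })
      .where(orders: { state: 'complete', completed_at: 30.days.ago.. })
      .group('spree_products.id')
      .order('COUNT(spree_line_items.id) DESC')
      .limit(100)
      .pluck(:id)
  end

  exploration = Spree::Product.available
    .where.not(id: top_sellers + recs.map { |r| r['product_id'] })
    .order(Arel.sql('RANDOM()')) # PostgreSQL; use RAND() on MySQL
    .limit(slots)
    .pluck(:id)
    .map { |pid| { 'product_id' => pid, 'score' => 0.0, 'method' => 'exploration' } }

  recs.first(recs.size - exploration.size) + exploration
end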
The filter bubble. Content-based filtering only recommends products similar to what a user has already seen. A customer who buys running shoes will only ever see more running shoes — never the hiking boots they might love. The hybrid approach mitigates this, but doesn't eliminate it.
Latency. Calling a Python model per page load adds latency. The solution: pre-compute recommendations for active users on a schedule (e.g., nightly) and cache aggressively. Only compute in real-time for cart recommendations where the input (cart contents) changes frequently.
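A sketch of that pre-computation, as a nightly job that warms the cache for recently active users. Note that the Engine's default cache TTL above is one hour, so lengthen expires_in for any context you warm overnight:

# app/jobs/recommendations/warm_cache_job.rb
module Recommendations
  class WarmCacheJob < ApplicationJob
    queue_as :low

    def perform
      engine = Engine.new
      user_ids = InteractionEvent
        .where('created_at > ?', 7.days.ago)
        .where.not(user_id: nil)
        .distinct
        .pluck(:user_id)

      Spree.user_class.where(id: user_ids).find_each do |user|
        begin
          # for_user writes through cache_fetch, so this is purely a warm-up
          engine.for_user(user: user, context: :homepage)
        rescue StandardError => e
          Rails.logger.warn("Warm-up failed for user #{user.id}: #{e.message}")
        end
      end
    end
  end
end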
The "everyone who bought nappies also bought beer" problem. Collaborative filtering finds statistical correlations, not causal relationships. Sometimes the correlations are spurious or nonsensical. Always have a human review sample recommendations periodically.
Evaluation is hard. Offline metrics (precision, recall, NDCG) don't always correlate with online business metrics (click-through rate, add-to-cart rate, revenue per recommendation). The A/B testing framework above is essential — don't trust your model until you've measured its impact in production.
GDPR Considerations
Recommendation systems based on user behaviour constitute profiling as defined in GDPR Article 4(4). For Solidus stores serving EU customers (which includes the entire DACH market), you need:
Consent. Personalised recommendations based on browsing history require user consent under the ePrivacy Directive (for tracking cookies) and potentially under GDPR (for profiling). Non-personalised recommendations (popularity-based) don't require specific consent. This connects directly to the privacy-first architecture and DACH compliance framework — your consent management platform needs a "personalisation" purpose that maps to recommendation tracking.
Transparency. Users have the right to know they're being profiled. A clear statement like "We personalise product recommendations based on your browsing and purchase history" in your privacy policy is the minimum. Consider a visible indicator on recommendation sections.
Right to object. Users must be able to opt out of profiling. In Solidus, this means a toggle in account settings that, when disabled, switches from personalised to popularity-based recommendations.
Data minimisation. Don't store interaction data forever. The 6-month training window in the export function above isn't just a technical choice — it's a data minimisation practice. Old interaction data should be aggregated (for model training) or deleted.
module Recommendations
class GDPRCompliance
def consent_granted?(user)
# Check consent management system
ConsentState.for_user(user).granted?(:personalisation)
end
def recommendations_for(user:, context:, limit: 8)
engine = Engine.new
if user && consent_granted?(user)
engine.for_user(user: user, context: context, limit: limit)
else
engine.popularity_fallback(limit)
end
end
def erase_user_data(user)
# Right to erasure: delete all interaction data
InteractionEvent.where(user_id: user.id).delete_all
ABTestEvent.where(user_id: user.id).delete_all
Rails.cache.delete_matched("recs:Spree::User:#{user.id}:*")
end
end
end
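To make the retention window operational rather than aspirational, pair it with a scheduled purge. A sketch; the cut-off mirrors the 6-month training window, and you'd schedule it alongside retraining:

# app/jobs/recommendations/retention_job.rb
module Recommendations
  class RetentionJob < ApplicationJob
    def perform
      # Raw events older than the training window are no longer needed
      InteractionEvent.where('created_at < ?', 6.months.ago)
                      .in_batches(of: 10_000)
                      .delete_all

      # A/B test events only matter while an experiment is still being read
      ABTestEvent.where('created_at < ?', 6.months.ago).delete_all
    end
  end
end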
Deployment Architecture
For a typical Solidus store:
- Python models live in a separate service (a Flask/FastAPI microservice) or are called via system commands from Ruby. For stores under 10,000 products, system commands work fine. For larger stores or real-time requirements, use a microservice with an HTTP API.
- Model artefacts (the .pkl files) are stored on shared storage accessible to both the training pipeline and the serving layer. S3 or equivalent.
- Caching uses Redis (which you likely already have for Sidekiq). Pre-computed recommendations per user, per product, and per category. Invalidated on model retrain and on significant user events (purchase completes).
- Training runs as a scheduled job (Sidekiq-cron or system cron). Weekly for most stores, daily for high-volume.
- Monitoring tracks: recommendation coverage (what percentage of product page views include recommendations), click-through rate, add-to-cart rate from recommendations, and model freshness (time since last retrain).
What I'd Build First
If you're starting from zero on a Solidus store:
- Week 1: Deploy the event tracker. Start collecting interaction data. You can't build recommendations without data, and every day you wait is data you don't have.
- Week 2: Build the content-based model. It works immediately with your existing product catalogue. Deploy "similar products" on product pages.
- Week 4-6: Once you have a few hundred orders with tracking, train the collaborative filtering model. Deploy the hybrid on product pages and homepage.
- Week 8: Add A/B testing. Compare hybrid recommendations against popularity-based. Measure revenue impact.
- Ongoing: Retrain weekly. Monitor metrics. Tune the collaborative/content weight ratio based on A/B test results. Add cart recommendations. Add email recommendations.
Every Solidus store is different. A fashion store with 5,000 SKUs and high returns needs different tuning than a specialty food store with 200 products and high repeat purchase rates. The framework above handles both — you just tune the parameters, the training window, and the fallback logic for your specific situation.