The £35,000 Lesson: When AI Generated Code Meets Production
Over the past month, I have watched two separate projects descend into absolute chaos because third party vendors decided that AI generated code was good enough for production. The result? Over £35,000 in direct costs and counting, countless hours debugging someone else's hallucinated logic, and the motivation to build ai-code-detector, a tool that has gained serious traction in the last couple of days. Spoiler alert: I am not the only one dealing with this nonsense.
This is not some theoretical waffle about whether AI can write code. This is a proper post mortem of what happens when organisations treat AI generated code as a shortcut rather than a starting point. Buckle up, because it is not pretty.
Two Projects, Same Problem, Different Flavours of Disaster
I am currently working on two major ecommerce integrations: one for a US based jewellery retailer and one for a UK fashion platform. Both have third party vendors providing critical APIs and services. Both vendors, as it turns out, have decided that AI assisted development means "let the AI write everything and hope for the best."
The symptoms were different. The underlying disease was identical.
Project One: The US Jewellery Platform (Yes, I Am Talking To You)
This one is a scaling project. As I grow the integration, they need to grow with me. They are using what they call a "Rapid Development Framework" which, as far as I can tell, translates to "Claude or ChatGPT writes the code and we ship it without looking."
Here is what I have encountered over the past four weeks:
Nothing Is Tested. Literally Nothing.
I do not mean "the test coverage is a bit low." I mean there are zero tests. Zilch. Nada. The AI generated the implementation, someone glanced at it for about three seconds, and it went straight to production.
When I asked about their testing strategy, I got a response about how "AI generated code is inherently more reliable because it is based on patterns from millions of codebases." Mate, that is absolutely not how any of this works.
The practical impact? Every single integration I build has to account for the fact that their API might return completely unexpected data structures. I have seen product endpoints return prices as strings, integers, floats, and on one memorable occasion, a nested object with the price buried three levels deep. All from the same endpoint. All in the same bloody week.
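My workaround: coerce whatever shape arrives into a single Decimal before it touches any order logic. A sketch of that defensive coercion (the nested key names are guesses at shapes I have seen, not their actual schema):

```python
from decimal import Decimal, InvalidOperation

def normalise_price(value):
    """Coerce a price that may arrive as a string, int, float,
    or nested object into a Decimal, or raise ValueError."""
    # Nested object: walk down until we hit something scalar.
    # The "amount" / "value" / "price" keys are illustrative guesses.
    while isinstance(value, dict):
        for key in ("amount", "value", "price"):
            if key in value:
                value = value[key]
                break
        else:
            raise ValueError(f"no price field in {value!r}")
    try:
        return Decimal(str(value))
    except InvalidOperation:
        raise ValueError(f"unparseable price: {value!r}")
```

Going via `str()` before `Decimal` avoids the float representation noise you get from `Decimal(12.99)` directly.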
Functionality Changes At Random
This is the telltale sign of AI generated code that is being continuously regenerated rather than properly maintained.
Monday: the /api/products/{id} endpoint returns { "name": "Diamond Ring", "price": 1299 }
Wednesday: the exact same endpoint returns { "product_name": "Diamond Ring", "price_cents": 129900 }
Friday: now it is { "title": "Diamond Ring", "pricing": { "amount": 1299, "currency": "USD" } }
This is not versioning. There is no deprecation notice. The schema just changes because someone asked the AI to "improve" the endpoint and it decided that meant completely restructuring the response.
My integration code now has to handle all three formats because I genuinely have no idea which one I will get on any given day. It is like playing API roulette except everyone loses.
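In practice that means a normaliser that recognises all three shapes and maps them onto one internal model. Roughly what mine looks like:

```python
def normalise_product(payload):
    """Map the three observed response shapes onto one internal model.
    Prices in the second shape arrive in cents (price_cents)."""
    if "price_cents" in payload:            # Wednesday's shape
        return {"name": payload["product_name"],
                "price": payload["price_cents"] / 100}
    if "pricing" in payload:                # Friday's shape
        return {"name": payload["title"],
                "price": payload["pricing"]["amount"]}
    return {"name": payload["name"],        # Monday's shape
            "price": payload["price"]}
```

Every new "improvement" to the endpoint means another branch in this function. That is the real cost of API roulette.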
The API Is Not Secure (And This Keeps Me Up At Night)
The authentication system has all the classic hallmarks of AI generated security code:
API keys are validated client side in some endpoints and server side in others.
There is an endpoint that accepts a user_id parameter and returns that user's full order history, including payment details, with absolutely no authentication check whatsoever.
The "admin" endpoints are protected by checking if the request header contains X-Admin: true. That is it. That is the entire security model.
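To be clear about why that last one is useless: any caller on the internet can set that header. A sketch of the difference (illustrative code on both sides, not their actual implementation):

```python
import hmac

# The vendor's model, roughly: trust a client-supplied header.
def is_admin_vendor_style(headers):
    # Any caller can simply send "X-Admin: true", so this grants
    # admin access to the entire internet.
    return headers.get("X-Admin") == "true"

# A minimal server-side check instead: the secret lives on the
# server and is compared in constant time to resist timing attacks.
ADMIN_API_KEY = "load-this-from-a-secrets-store"  # illustrative placeholder

def is_admin(headers):
    supplied = headers.get("X-Api-Key", "")
    return hmac.compare_digest(supplied, ADMIN_API_KEY)
```

`hmac.compare_digest` exists precisely so string comparison time does not leak how much of the key an attacker has guessed.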
I have flagged these issues multiple times. The response? "We will have the AI review the security implementation." The same AI that wrote the insecure code in the first place. Brilliant.
Endpoints Change Without Any Warning
Last Tuesday, half my integration broke because /api/v2/inventory became /api/inventory/v2. No redirect. No deprecation period. No notification. Just broken.
When I asked why, I was told they "refactored the routing for better organisation." The AI suggested it, apparently. Cheers for that.
I now have monitoring that checks endpoint availability every five minutes because I genuinely cannot trust that the URLs I am calling today will exist tomorrow.
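The monitor itself is nothing clever. A simplified sketch, with the HTTP probe injected as a callable so the check is testable without a live API (URLs are placeholders):

```python
def check_endpoints(urls, probe):
    """Return the subset of urls that no longer respond with 2xx.
    `probe` is any callable mapping a URL to an HTTP status code
    (in production, a thin wrapper around urllib or requests);
    injecting it keeps the check testable without a live API."""
    broken = []
    for url in urls:
        try:
            status = probe(url)
        except OSError:
            status = None
        if status is None or not 200 <= status < 300:
            broken.append(url)
    return broken
```

Run it from a scheduler every five minutes and alert whenever the returned list is non-empty.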
Project Two: The UK Fashion Platform
This one is different. Scaling is not the immediate problem. The problem is that I cannot rely on any input or output because the underlying logic changes at random.
Same disease, different symptoms:
Logic That Changes Between Deploys
The discount calculation endpoint is my absolute favourite example. Over three weeks, I documented seven different discount calculation methods:
Week 1: Percentage off list price
Week 1 (later that same week): Percentage off after VAT
Week 2: Fixed amount off, rounded down
Week 2 (later): Fixed amount off, rounded to nearest penny
Week 3: Percentage off, but capped at £50
Week 3 (later): Percentage off, but minimum discount of £5
Week 3 (even later): Back to percentage off list price, but now VAT is calculated differently
Each change came with no documentation, no changelog, no notification. The code just changed. Because someone asked the AI to "fix the discount calculation" and it interpreted that as "rewrite from scratch based on what seems reasonable today."
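The only defence I have found is characterisation tests: pin the rule as last documented, replay known inputs against the live endpoint, and alarm on divergence. A sketch (percent_off_list_price is my own reimplementation of their week 1 rule, not their code):

```python
def percent_off_list_price(list_price, percent):
    """My reimplementation of the discount rule as last documented
    (week 1: plain percentage off list price)."""
    return round(list_price * (1 - percent / 100), 2)

def detect_drift(golden_cases, vendor_discount):
    """Replay known (list_price, percent) inputs and report any case
    where the vendor's answer diverges from the pinned rule.
    `vendor_discount` is a callable wrapping the vendor API."""
    return [case for case in golden_cases
            if vendor_discount(*case) != percent_off_list_price(*case)]
```

When this list stops being empty, the logic has been silently regenerated again, and I find out from my monitoring rather than from an angry customer.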
Inconsistent Data Validation
The same input that worked yesterday throws a 400 error today. The same malformed input that should throw an error gets accepted and corrupts the database.
I have seen:
Email validation that accepts not-an-email on Monday but rejects a perfectly valid address on Tuesday because someone added a regex that requires a two letter TLD
Price fields that accept negative numbers, leading to orders where the customer literally gets paid to buy products
Date parsing that interprets 01/02/2025 as January 2nd sometimes and February 1st other times, depending on which AI generated parsing function happens to be deployed that day
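The date one at least has a cheap fix on my side: parse strictly with an explicit day first format and fail loudly, rather than let any library guess:

```python
from datetime import datetime

def parse_order_date(raw):
    """Parse dates strictly as UK day-first format (DD/MM/YYYY).
    An explicit format string means 01/02/2025 is always 1 February,
    and anything that does not match raises ValueError instead of
    being silently reinterpreted."""
    return datetime.strptime(raw, "%d/%m/%Y").date()
```

An ambiguous parser that "helpfully" guesses the format is exactly how the same string becomes two different dates on two different days.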
The Classic AI Code Smells
Both codebases exhibit the patterns that motivated me to build ai-code-detector:
Overly verbose comments that restate the blindingly obvious:
```python
# This function calculates the total price by adding up all the item prices
def calculate_total_price(items):
    # Initialize the total to zero
    total = 0
    # Loop through each item in the items list
    for item in items:
        # Add the item's price to the total
        total += item.price
    # Return the calculated total
    return total
```
Cheers for that. I would never have figured out what calculate_total_price does without those comments.
Every function has a perfect docstring, even trivial ones:
```python
def add(a, b):
    """
    Adds two numbers together.

    Args:
        a: The first number to add.
        b: The second number to add.

    Returns:
        The sum of a and b.

    Raises:
        TypeError: If a or b are not numeric types.
    """
    return a + b
```
Real developers do not write documentation like this for a two line function. They just do not.
Overdefensive error handling that catches everything and handles nothing:
```python
try:
    result = process_order(order)
except ValueError:
    pass  # Handle value error appropriately
except TypeError:
    pass  # Handle type error appropriately
except Exception:
    pass  # Handle any other errors appropriately
```
Ah yes, the classic "handle appropriately by doing absolutely nothing" pattern.
The TODO above working code:
```python
# TODO: Implement order processing
def process_order(order):
    # Full implementation follows
    validate_order(order)
    calculate_totals(order)
    process_payment(order)
    send_confirmation(order)
    return order
```
The TODO says implement it. The code below implements it. Make it make sense.
The Real Cost: £35,000 And Counting
Let me break down what this has actually cost my clients over the past month:
Direct Development Time: £18,500
My time spent debugging, building workarounds, implementing defensive code, and handling production incidents caused by upstream changes. At commercial rates, this adds up terrifyingly fast.
Production Incidents: £8,200
Three significant outages caused by unexpected API changes. Each required emergency response, customer communication, and manual order processing while I scrambled to fix integrations that broke through absolutely no fault of my own.
Data Reconciliation: £4,800
The inconsistent discount calculations and price handling meant order totals in my system did not match order totals in theirs. Reconciling a month's worth of transactions took two people three full days.
Customer Compensation: £2,100
Orders that were priced incorrectly, deliveries that failed because address validation changed mid week, customers who received wrong products because SKU handling logic mutated randomly.
Security Audit (Ongoing): £1,400 so far
After discovering the authentication issues, I engaged a third party to assess exposure. That work is still ongoing.
Total: £35,000+ and climbing every single day
This does not include the opportunity cost of features I could not build because I was firefighting someone else's AI generated mess.
Why I Built ai-code-detector
After the third week of this absolute chaos, I needed a way to quickly assess whether new code drops from these vendors were likely to be AI generated. Not because AI generated code is inherently bad, but because AI generated code that has not been reviewed, tested, or understood by a human is a liability waiting to explode.
ai-code-detector is a heuristic tool. It looks for the tells:
Comment style (verbose, restating the obvious, hedging language like "you may want to consider")
Docstring saturation (every function perfectly documented, even trivial ones)
Naming patterns (AI loves its userAuthenticationToken where a human would write auth_tok)
Error handling (overdefensive patterns, empty except blocks with "handle appropriately" comments)
Structural uniformity (every function looks identical because they were all generated the same way)
Dead giveaways (TODO comments sitting right above fully working implementations)
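Under the hood it is nothing magical. A stripped down sketch of the kind of heuristic it runs (the weights and patterns here are illustrative, not the tool's actual tuning):

```python
import re

def ai_smell_score(source):
    """Crude heuristic in the spirit of the tool: count a few tells
    in a source string and return a score. Higher means the code
    deserves extra scrutiny, not that it is definitely AI generated."""
    lines = source.splitlines()
    code_lines = [l for l in lines if l.strip() and not l.strip().startswith("#")]
    comment_lines = [l for l in lines if l.strip().startswith("#")]
    score = 0
    # Comment density at or near 1:1 with code is a verbosity tell.
    if code_lines and len(comment_lines) >= len(code_lines):
        score += 1
    # Empty except blocks that swallow errors with a bare pass.
    if re.search(r"except[^\n]*:\s*\n\s*pass", source):
        score += 1
    # A TODO comment sitting directly above a function definition.
    if re.search(r"#\s*TODO[^\n]*\n\s*def ", source):
        score += 1
    return score
```

Each individual tell is weak on its own; it is the accumulation across a file that makes the flag worth raising.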
It is not perfect. It cannot catch AI code that has been carefully edited by a human. Short snippets do not give it much to work with. But it has been genuinely useful for flagging code that needs extra scrutiny before I trust it anywhere near production.
The response has been mental. In two days, it has gained more attention than I ever expected. Turns out, I am absolutely not the only one dealing with this problem.
The Risks Nobody Is Talking About
Beyond the immediate costs, there are systemic risks that genuinely concern me:
Technical Debt Multiplication
Every time these vendors regenerate code with AI, they are not maintaining a codebase. They are generating a completely new one. There is no institutional knowledge. There is no understanding of why things are the way they are. When something breaks, the "fix" is to regenerate, which might fix the immediate issue but introduces new inconsistencies elsewhere.
Security As An Afterthought
AI models are trained on public code, which includes an absolute mountain of insecure code. When you ask an AI to "add authentication," it might give you something that looks like authentication but has fundamental flaws. Without human review by someone who actually understands security, these flaws go straight to production.
The Knowledge Gap
The teams maintaining these systems do not understand them. They cannot, because they did not write them. When something goes wrong, they cannot debug it. They can only regenerate and hope. This creates a dependency on the AI that becomes increasingly dangerous as the systems become more complex.
Vendor Lock In To Chaos
Once your systems are integrated with a vendor who operates this way, you are locked in to their chaos. Migrating away is expensive. Staying is expensive. There is no good option, just varying degrees of bad.
What Needs To Change
I am not anti AI. I use AI tools in my own development every single day. But there is a massive difference between using AI as a tool and using AI as a replacement for understanding.
For vendors:
AI generated code is a first draft, not a finished product. It needs review, testing, and most importantly, actual understanding. If the humans maintaining the code cannot explain why it works, it should not be anywhere near production.
For organisations integrating with third parties:
Ask about their development practices. Include code quality and stability requirements in contracts. Build defensive integrations that assume upstream instability. Monitor for unexpected changes constantly.
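Defensive, in practice, means checking every response against the contract you originally integrated with and alerting on drift before it corrupts anything downstream. A minimal sketch (the expected key set is a placeholder for whatever schema you agreed):

```python
EXPECTED_KEYS = {"name", "price"}  # placeholder: the contract we integrated against

def check_contract(payload):
    """Return a list of human-readable deviations between a live
    response and the key set we originally integrated against,
    so an upstream schema change triggers an alert instead of a
    silent data corruption."""
    problems = []
    for key in sorted(EXPECTED_KEYS - set(payload)):
        problems.append(f"missing key: {key}")
    for key in sorted(set(payload) - EXPECTED_KEYS):
        problems.append(f"unexpected key: {key}")
    return problems
```

A proper JSON Schema validator does this more thoroughly, but even a key-set diff like this would have caught every schema mutation described above.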
For the industry:
We need proper standards around AI assisted development. Not restrictions, but expectations. Code reviews are still necessary. Tests are still necessary. Documentation that explains intent (not just what the code does, but why it does it) is still necessary.
What Is Next
I am planning to expand ai-code-detector with some additional capabilities:
A web interface so you do not need to clone the repo and faff about in a terminal
GitHub integration to scan entire repositories
Git history analysis, because sudden changes in code style or commit patterns are themselves a massive tell
Per file reporting to identify which parts of a codebase are most suspicious
The goal is not to eliminate AI from development. It is to identify where AI generated code might need extra human attention before it causes the kind of disaster I have been dealing with.
In the meantime, I will be over here, maintaining defensive code against two different APIs that might change their behaviour at literally any moment, while watching the costs tick steadily upward.
£35,000 in one month. For code that nobody at those organisations actually understands.
Welcome to the future of software development. It is expensive.
The projects described are real. Identifying details have been changed or omitted. The costs are accurate as of the date of writing and continue to accumulate.
If you are dealing with similar issues, check out ai-code-detector. And if you are a vendor reading this: yes, I am talking to you. Please, for everyone's sake, have a human review your code before shipping it.