TDD and BDD in Ruby on Rails: How to Ship Without Fear, and How AI Is Changing the Game

Let me tell you about the most expensive line of code I ever saw. It wasn't complicated. It wasn't clever. It was a rounding error in a currency conversion function. Three decimal places instead of two. In a test environment, nobody noticed. In production, processing thousands of transactions per day for a fintech client, it created a discrepancy that took two weeks to untangle. The fix took four minutes. The cleanup took four hundred hours.

That experience changed how I write software forever. Not because I learned to be more careful (although that too), but because I learned that the way most software gets built is fundamentally broken. Write the code, then test it, then fix the bugs, then test again, then find more bugs, then ship it and pray. That's not a development methodology. That's gambling.

There is a better way. Actually, there are two better ways: Test Driven Development (TDD) and Behaviour Driven Development (BDD). They flip the entire process on its head. Instead of writing code and hoping it works, you write the tests first and then write code that makes them pass. It sounds counterintuitive. It feels slow at first. And yet it will save you thousands of hours and potentially millions in costs over the lifetime of a project.

I use both approaches every day when building [Auto-Prammer.at](https://auto-prammer.at) on Ruby on Rails and Solidus, and when developing the fintech SaaS for [Regios](https://regios.at) using modules from [GrowCentric.ai](https://growcentric.ai). Let me show you exactly how this works, why it matters, and how AI is making it even more powerful.

The Old Way: Code First, Pray Later

For decades, the dominant approach to building software went something like this. A product manager writes a specification. Developers read it (sometimes). Developers write code. Someone tests it, usually manually, usually under time pressure. Bugs get found. Bugs get fixed. New bugs get introduced by the fixes. More testing. More fixing. Eventually, someone says "good enough" and it ships.

The numbers tell the story of just how badly this approach fails. Poor software quality cost the United States alone an estimated $2.41 trillion in 2022 according to the Consortium for Information and Software Quality (CISQ). Developers typically introduce 100 to 150 errors per thousand lines of code. A quarter of all developers spend half their working time fixing bugs rather than building features. And here's the kicker: fixing a bug in production costs up to 100 times more than catching it during development.

Some of those bugs have been catastrophic. In 2012, Knight Capital Group lost $440 million in 45 minutes because of a faulty trading algorithm that made it to production. In 2015, an Airbus A400M crashed during a test flight, killing four crew members, because of software errors in the engine control system. In July 2024, a flawed CrowdStrike update caused a global IT outage affecting airlines, banks, and healthcare services.

These aren't edge cases. They're the logical consequence of a development methodology that treats testing as an afterthought.

Enter TDD: Write the Test First

Test Driven Development flips the script entirely. Instead of writing code and then testing it, you write the test first, watch it fail, write the minimum code needed to make it pass, and then refactor. This is the famous Red Green Refactor cycle.

Red: Write a test that describes the behaviour you want. Run it. It fails (because the code doesn't exist yet). This is your failing test, your red light.

Green: Write the simplest possible code that makes the test pass. Nothing more. Run the test again. It passes. Green light.

Refactor: Now clean up the code. Improve the implementation. Make it elegant. Run the tests again. Still green? Good. Move on.

This cycle is deliberately small. You're not writing an entire feature and then testing it. You're writing one tiny behaviour at a time. Each cycle might take five minutes. Over a day, you'll do dozens of these cycles, and at the end, you'll have a feature that works correctly, documented by a comprehensive test suite that proves it.

Here's what that looks like in practice with RSpec in a Rails application. Say I'm building a currency conversion service for Regios, which handles regional digital currencies.

# Step 1: RED - Write the test first
RSpec.describe CurrencyConverter do
  describe '#convert' do
    it 'converts EUR to Regios at the current exchange rate' do
      converter = CurrencyConverter.new(rate: 1.0)
      result = converter.convert(amount: 100.00, from: 'EUR', to: 'REGIOS')
      expect(result).to eq(100.00)
    end

    it 'rounds to exactly two decimal places' do
      converter = CurrencyConverter.new(rate: 1.0735)
      result = converter.convert(amount: 33.33, from: 'EUR', to: 'REGIOS')
      expect(result).to eq(35.78)
    end
  end
end

That test fails because CurrencyConverter doesn't exist yet. That's the point. Now I write the implementation:

# Step 2: GREEN - Write the minimum code to pass
class CurrencyConverter
  def initialize(rate:)
    @rate = rate
  end

  def convert(amount:, from:, to:)
    (amount * @rate).round(2)
  end
end

Tests pass. Green. Now I can refactor, add more tests for edge cases (negative amounts, zero rates, unsupported currencies), and build up the feature incrementally. And that rounding test? That's exactly the kind of test that would have caught the three decimal place bug I mentioned in the introduction. TDD doesn't just find bugs. It prevents them from existing in the first place.
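
Those edge case tests might look like this. A sketch: UnsupportedCurrencyError and the guard clauses it implies are assumptions that the next Red Green Refactor cycles would drive out:

# Step 3: REFACTOR and extend - each edge case starts a new red/green cycle
RSpec.describe CurrencyConverter do
  describe '#convert' do
    it 'rejects negative amounts' do
      converter = CurrencyConverter.new(rate: 1.0)
      expect {
        converter.convert(amount: -10.00, from: 'EUR', to: 'REGIOS')
      }.to raise_error(ArgumentError)
    end

    it 'rejects unsupported target currencies' do
      converter = CurrencyConverter.new(rate: 1.0)
      expect {
        converter.convert(amount: 10.00, from: 'EUR', to: 'DOGE')
      }.to raise_error(UnsupportedCurrencyError) # hypothetical domain error
    end
  end
end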

BDD: Testing from the User's Perspective

Behaviour Driven Development takes TDD a step further. Where TDD focuses on testing individual units of code (does this method return the right value?), BDD focuses on testing system behaviour from the user's perspective (can the user complete this workflow successfully?).

BDD was created by Dan North in 2003 as a response to the confusion many developers felt about TDD. The terminology of TDD (tests, assertions, units) is developer centric. BDD reframes everything in terms of behaviour, using language that business stakeholders, product managers, and designers can actually understand.

The result is tests written in something close to plain English, using the Given/When/Then format. Here's what that looks like using Cucumber for Auto-Prammer.at, my automotive marketplace built on Rails and Solidus:

Feature: Vehicle listing search
  As a potential car buyer
  I want to search for vehicles by make, model, and price range
  So that I can find a car that fits my budget

  Scenario: Searching for a specific make and model
    Given there are 10 BMW 3 Series listings
    And there are 5 Audi A4 listings
    When I search for "BMW 3 Series"
    Then I should see 10 results
    And I should not see any Audi listings

  Scenario: Filtering by price range
    Given there is a BMW 320i listed at 25000 euros
    And there is a BMW 330i listed at 45000 euros
    When I search with a maximum price of 30000 euros
    Then I should see the BMW 320i
    And I should not see the BMW 330i

Read that out loud. A product manager can understand it. A designer can understand it. Your non technical co-founder can understand it. That's the power of BDD. The tests serve as living documentation of how the system is supposed to behave, written in a language everyone on the team can read.

Behind those Cucumber scenarios are step definitions that wire the plain English to actual test code:

Given('there are {int} BMW 3 Series listings') do |count|
  create_list(:vehicle_listing, count, make: 'BMW', model: '3 Series')
end

When('I search for {string}') do |query|
  visit vehicles_path
  fill_in 'Search', with: query
  click_button 'Find Vehicles'
end

Then('I should see {int} results') do |count|
  expect(page).to have_css('.vehicle-card', count: count)
end
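
The remaining steps follow the same pattern; the negative assertion, for instance, might be wired like this:

Then('I should not see any Audi listings') do
  expect(page).not_to have_content('Audi')
end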

RSpec vs Minitest vs Cucumber: The Rails Testing Trinity

Ruby on Rails has one of the richest testing ecosystems of any web framework. Rails was the first major framework to integrate a testing framework from day one. But which framework should you use? Let me compare the three main options.

RSpec is the dominant testing framework in the Rails world. According to a 2024 Stack Overflow survey, RSpec is preferred by over 59% of Ruby on Rails developers. It uses a descriptive, BDD inspired syntax with describe, context, and it blocks that read like specifications. RSpec is ideal for medium to large projects, for teams that practise BDD, and for codebases where test readability matters (which is all of them, in my opinion).

RSpec.describe Vehicle, type: :model do
  describe 'validations' do
    it { is_expected.to validate_presence_of(:make) }
    it { is_expected.to validate_presence_of(:model) }
    it { is_expected.to validate_numericality_of(:price).is_greater_than(0) }
  end

  describe '#full_title' do
    it 'returns the year, make, and model combined' do
      vehicle = build(:vehicle, make: 'BMW', model: '320i', year: 2024)
      expect(vehicle.full_title).to eq('2024 BMW 320i')
    end
  end
end

Minitest ships with Ruby and Rails by default. It's lightweight, fast, and feels like writing pure Ruby code. There's no DSL to learn; if you know Ruby, you know Minitest. It supports both TDD style (assert_equal) and a spec style similar to RSpec. Minitest is ideal for smaller projects, for developers who value simplicity, and for situations where test speed is critical.

class VehicleTest < ActiveSupport::TestCase
  test 'full title combines year, make, and model' do
    vehicle = vehicles(:bmw_320i)
    assert_equal '2024 BMW 320i', vehicle.full_title
  end

  test 'price must be positive' do
    vehicle = Vehicle.new(price: -1000)
    assert_not vehicle.valid?
    assert_includes vehicle.errors[:price], 'must be greater than 0'
  end
end
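
And for comparison, the same expectation in Minitest's spec style, which reads almost like RSpec (a minimal standalone sketch):

require 'minitest/autorun'

describe Vehicle do
  it 'combines year, make, and model into the full title' do
    vehicle = Vehicle.new(make: 'BMW', model: '320i', year: 2024)
    _(vehicle.full_title).must_equal '2024 BMW 320i'
  end
end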

Cucumber is the BDD tool that uses Gherkin syntax (Given/When/Then) for acceptance tests. It bridges the gap between technical and non technical stakeholders by writing tests in plain English. Cucumber is ideal for acceptance testing, for projects with non technical stakeholders who need to validate behaviour, and for documenting business requirements as executable specifications.

Here's my honest comparison:

Readability: RSpec wins. Its descriptive blocks create test files that read like documentation. Minitest is clean but terse. Cucumber is the most readable to non developers.

Speed: Minitest wins by a significant margin in raw benchmarks. RSpec carries more overhead. Cucumber is the slowest because it parses Gherkin and runs through step definitions. In practice, for most real world applications, the speed difference between RSpec and Minitest is marginal (often under 10%).

Ecosystem: RSpec wins. The ecosystem of matchers, plugins, and integrations (shoulda-matchers, factory_bot, capybara, vcr, webmock) is enormous. Minitest has fewer plugins but benefits from the Rails default integrations. Cucumber integrates with both RSpec and Minitest.

Learning curve: Minitest wins. It's just Ruby. RSpec introduces a DSL that takes time to learn. Cucumber adds Gherkin syntax on top.

My recommendation: Use RSpec for unit and integration tests, and Cucumber for critical user journey acceptance tests. That's the approach I use on both Auto-Prammer.at and the Regios SaaS platform. RSpec handles the bulk of the testing (model specs, controller specs, service specs, request specs), while Cucumber covers the end to end flows that matter most to the business.

Industries Where Bugs Are Not an Option

Not all bugs are created equal. A misaligned button on a marketing page is annoying. A rounding error in a financial transaction is a compliance violation. A software fault in a medical device can kill someone.

In certain industries, comprehensive testing isn't a nice to have. It's a legal and ethical requirement.

Fintech and banking. Financial software operates under strict regulations including PSD2, anti money laundering directives, and GDPR. Every transaction must be accurate to the penny. Every audit trail must be complete. Every security control must work. The Knight Capital disaster, where a bug caused $440 million in losses in under an hour, is the cautionary tale the entire industry lives by. This is exactly why my work with Regios on their digital currency platform involves exhaustive test suites covering every financial calculation, every transaction state, and every regulatory reporting pathway.

Healthcare. Medical software is regulated by frameworks like the EU Medical Device Regulation (MDR). Software that controls diagnostic equipment, manages patient records, or calculates drug dosages must be virtually defect free. Testing isn't just about functionality; it's about patient safety.

Aviation and aerospace. Avionics software follows standards like DO-178C, which requires formal verification and testing at multiple levels. The Airbus A400M crash in 2015 was traced to software configuration errors. In aviation, untested software literally kills people.

eCommerce. While not as life threatening, ecommerce bugs have massive financial implications. A broken checkout flow during a sale event can cost millions in lost revenue. Incorrect pricing calculations can create legal liability. And with the Cyber Resilience Act now requiring secure by design practices for products with digital elements, testing is becoming a regulatory obligation for ecommerce platforms too.

The Critical Path: What Must Be Tested Before Every Release

You can't test everything. Well, you can try, but you'll never ship. The art of effective testing is identifying the critical path: the set of functionalities that absolutely must work correctly for the application to be usable, secure, and compliant.

For any web application, the critical path typically includes:

Authentication and authorisation. Can users log in? Can they only access what they're supposed to? Are admin functions properly restricted? A failure here is a security breach.

Core business logic. Whatever your application actually does, the central calculations, workflows, and data transformations must be correct. For Regios, that's currency conversion and transaction processing. For Auto-Prammer.at, that's vehicle listing, search, and the checkout flow.

Payment processing. If money changes hands, it must be tested exhaustively. Every payment state (pending, processing, completed, failed, refunded), every edge case (network timeout during payment, duplicate submissions, partial refunds), every currency calculation. A sketch of exhaustive state coverage follows below.

Data integrity. Database constraints, validations, and data migration paths must be verified. If data can get corrupted, it will.

Regulatory compliance features. GDPR consent flows, data export/deletion, audit logging, anti fraud checks. If a regulator audits your system, these must work flawlessly.

API contracts. If your application exposes or consumes APIs, the contracts must be tested. A breaking change in an API can cascade through every system that depends on it.

Every one of these must be covered by automated tests that run before every single deployment. No exceptions. No "we'll test it manually this time." No "it worked on my machine." The CI/CD pipeline should be the gatekeeper: if any critical path test fails, the deployment stops.
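
To make that concrete, here's the payment state sketch promised above. It assumes a hypothetical Payment model and factory; the point is that every documented state gets an explicit test, plus one for states that must never exist:

RSpec.describe Payment do
  # One test per state the checkout flow can legitimately produce.
  %w[pending processing completed failed refunded].each do |state|
    it "accepts the #{state} state" do
      expect(build(:payment, state: state)).to be_valid
    end
  end

  it 'rejects states outside the documented lifecycle' do
    expect(build(:payment, state: 'teleported')).not_to be_valid
  end
end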

The Regios Example: Testing a Fintech SaaS Under GDPR

Let me get specific about how this works in practice. Regios is a fintech client I'm working with that operates a regional digital currency platform in Austria. The platform handles real money, real transactions, and real personal data under some of the strictest privacy regulations in the world.

The SaaS platform uses modules from GrowCentric.ai, my upcoming SaaS product (launching publicly in June 2026), which provides the analytics and campaign optimisation engine. This means the testing strategy needs to cover not just the Regios specific business logic, but also the shared GrowCentric modules that power analytics, user segmentation, and marketing automation.

Here's what the critical test suite looks like:

GDPR compliance tests. These verify that personal data is handled correctly at every stage.

RSpec.describe 'GDPR Compliance', type: :request do
  describe 'data export' do
    it 'exports all personal data for a user in machine readable format' do
      user = create(:user, :with_transactions, :with_profile)
      get api_v1_user_data_export_path(user), headers: auth_headers(user)
      expect(response).to have_http_status(:ok)
      data = JSON.parse(response.body)
      expect(data).to include('personal_info', 'transactions', 'consent_records')
    end
  end

  describe 'right to deletion' do
    it 'anonymises user data while preserving transaction records' do
      user = create(:user, :with_transactions)
      admin = create(:user, :admin)
      delete api_v1_user_path(user), headers: auth_headers(admin)
      user.reload
      expect(user.email).to match(/anonymised/)
      expect(user.transactions).to all(have_attributes(user_id: user.id))
    end
  end

  describe 'consent management' do
    it 'records consent with timestamp and purpose' do
      user = create(:user)
      post api_v1_consents_path,
        params: { purpose: 'marketing', granted: true },
        headers: auth_headers(user)
      consent = user.consents.last
      expect(consent.purpose).to eq('marketing')
      expect(consent.granted_at).to be_within(1.second).of(Time.current)
    end
  end
end

Transaction integrity tests. These ensure every financial operation is accurate and atomic.

RSpec.describe TransactionService do
  describe '#process_transfer' do
    it 'debits sender and credits receiver atomically' do
      sender = create(:wallet, balance: 100.00)
      receiver = create(:wallet, balance: 50.00)

      TransactionService.new.process_transfer(
        from: sender, to: receiver, amount: 30.00
      )

      expect(sender.reload.balance).to eq(70.00)
      expect(receiver.reload.balance).to eq(80.00)
    end

    it 'rolls back both sides if the transfer fails midway' do
      sender = create(:wallet, balance: 100.00)
      receiver = create(:wallet, balance: 50.00)

      allow(receiver).to receive(:credit!).and_raise(ActiveRecord::RecordInvalid)

      expect {
        TransactionService.new.process_transfer(
          from: sender, to: receiver, amount: 30.00
        )
      }.to raise_error(ActiveRecord::RecordInvalid)

      expect(sender.reload.balance).to eq(100.00)
      expect(receiver.reload.balance).to eq(50.00)
    end

    it 'prevents negative balance transfers' do
      sender = create(:wallet, balance: 20.00)
      receiver = create(:wallet, balance: 50.00)

      expect {
        TransactionService.new.process_transfer(
          from: sender, to: receiver, amount: 50.00
        )
      }.to raise_error(InsufficientFundsError)
    end
  end
end

Audit trail tests. For regulatory compliance, every significant action must be logged.

RSpec.describe AuditLog do
  it 'logs every transaction with actor, action, and timestamp' do
    user = create(:user)
    expect {
      TransactionService.new.process_transfer(
        from: user.wallet, to: create(:wallet), amount: 10.00
      )
    }.to change(AuditLog, :count).by_at_least(1)

    log = AuditLog.last
    expect(log.actor_id).to eq(user.id)
    expect(log.action).to eq('transfer')
    expect(log.metadata).to include('amount' => '10.0')
  end
end

These tests run on every pull request, every merge to main, and before every deployment. The CI pipeline (GitHub Actions) blocks deployment if any test fails. There is no manual override. In fintech, you don't ship untested code. Ever.

The Auto-Prammer.at Example: Testing an eCommerce Platform on Solidus

The testing strategy for Auto-Prammer.at is different in character but equally rigorous. Auto-Prammer is a car rental, garage service, and dealership web application built on Ruby on Rails with Solidus for the ecommerce components.

The critical path here includes vehicle search and filtering, booking and reservation flows, payment processing, and the admin interface that dealers and garage operators use to manage their listings.

Here's a Cucumber scenario for the booking critical path:

Feature: Vehicle rental booking
  As a customer browsing auto-prammer.at
  I want to book a rental vehicle for specific dates
  So that I have confirmed transport when I need it

  Scenario: Successful rental booking
    Given I am a registered customer
    And there is a BMW 320i available for rental
    And the daily rate is 65 euros
    When I select the BMW 320i
    And I choose rental dates from 2 March to 6 March 2026
    Then I should see a total of 260 euros for 4 days
    When I proceed to checkout
    And I complete payment with a valid card
    Then I should receive a booking confirmation
    And the vehicle should be marked as unavailable for those dates

  Scenario: Preventing double bookings
    Given the BMW 320i is already booked from 2 March to 6 March 2026
    When another customer tries to book it for 4 March to 8 March 2026
    Then they should see a message that the vehicle is not available
    And no payment should be processed

And the RSpec unit tests that back up the business logic:

RSpec.describe RentalPricingService do
  describe '#calculate_total' do
    it 'calculates price based on daily rate and duration' do
      service = RentalPricingService.new(
        daily_rate: 65.00,
        start_date: Date.new(2026, 3, 2), # Monday
        end_date: Date.new(2026, 3, 6)    # Friday return: four weekdays, no surcharge
      )
      expect(service.calculate_total).to eq(260.00)
    end

    it 'applies weekend surcharge on Saturday and Sunday' do
      service = RentalPricingService.new(
        daily_rate: 65.00,
        start_date: Date.new(2026, 2, 27), # Friday pick-up
        end_date: Date.new(2026, 3, 3)     # Tuesday return
      )
      # 4 rental days: Fri 65 + Sat 65*1.15 + Sun 65*1.15 + Mon 65 = 279.50
      expect(service.calculate_total).to eq(279.50)
    end

    it 'handles single day rentals' do
      service = RentalPricingService.new(
        daily_rate: 65.00,
        start_date: Date.new(2026, 3, 2), # Monday pick-up
        end_date: Date.new(2026, 3, 3)    # next-day return
      )
      expect(service.calculate_total).to eq(65.00)
    end
  end
end

Notice how the BDD scenario describes the full user journey while the RSpec tests drill into the precise calculations. They work together. The Cucumber test tells you whether the system behaves correctly from the user's perspective. The RSpec test tells you exactly where things go wrong if a calculation breaks.

Using AI to Supercharge Your Testing (Without Losing Control)

Now here's where things get really interesting. Writing tests is essential but it's also tedious. A comprehensive test suite for a Rails application can involve hundreds or thousands of individual test cases. Writing all of them by hand is time consuming, and there's always the risk of missing edge cases that a human wouldn't think of.

This is where AI tools like Claude genuinely shine. But, and this is a critical but, there's a right way and a wrong way to use AI for testing.

The wrong way: "Hey AI, write tests for my application."

If you paste your code into Claude and say "write tests for this," you'll get tests. They might even pass. But they'll likely be superficial, miss domain specific edge cases, test implementation details rather than behaviour, and give you a false sense of security. AI generated tests that you haven't reviewed are worse than no tests at all, because they make you think you're covered when you're not.

Here's why this is dangerous. AI doesn't understand your business domain. It doesn't know that Regios needs to handle the edge case where a user tries to convert Regios back to EUR during a weekend when the exchange rate service is down. It doesn't know that Auto-Prammer needs to handle the scenario where a vehicle's MOT expires during a rental period. It generates tests based on code patterns, not business knowledge.

The right way: human directed, AI assisted testing.

The approach I've developed over the past year works like this:

Step 1: Think through the test strategy yourself. Before involving AI, decide what needs to be tested and why. What are the critical paths? What are the business rules? What are the regulatory requirements? This is human work. No AI can replace your understanding of the domain.

Step 2: Use AI to discover edge cases you haven't considered. This is where Claude is genuinely brilliant. Present your business logic and ask: "What edge cases should I consider for this currency conversion service?" Claude will suggest scenarios like handling floating point precision errors, dealing with rates that change during a transaction, handling concurrent conversions, timezone issues with rate snapshots, and more. Many of these you would have thought of eventually. Some you wouldn't have.

Here's a real example from my GrowCentric.ai work:

Prompt to Claude:
"I have a campaign budget allocation service that distributes
a daily marketing budget across multiple campaigns based on
performance scores. What edge cases should my test suite cover?"

Claude's suggestions (summarised):
- All campaigns have identical performance scores (equal split)
- One campaign has zero performance score
- All campaigns have zero performance scores
- Budget is less than number of campaigns (penny allocation)
- Budget amount is zero
- Negative performance scores (should they be normalised?)
- Performance scores change during allocation
- Rounding means allocated amounts don't sum to exact budget
- Single campaign scenario
- Maximum number of campaigns (performance at scale)
- Currency precision (should budget be in cents internally?)

That list contains at least three edge cases (the rounding sum discrepancy, the penny allocation problem, and the internal cents representation) that I hadn't explicitly planned to test. That's the value.
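
The rounding discrepancy alone was worth the prompt. It became a test along these lines (a sketch; BudgetAllocator and its cents based interface are hypothetical names):

RSpec.describe BudgetAllocator do
  it 'allocates amounts that sum back to exactly the daily budget' do
    # Scores chosen so a naive proportional split cannot divide evenly.
    campaigns = [0.37, 0.21, 0.42].map do |score|
      build(:campaign, performance_score: score)
    end

    allocations = BudgetAllocator.new(budget_cents: 10_000).allocate(campaigns)

    expect(allocations.values.sum).to eq(10_000)
  end
end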

Step 3: Use AI to generate test scaffolding. Once you know what to test, use Claude to generate the boilerplate. Give it your test style, your factory definitions, your naming conventions, and ask it to scaffold the test cases. Review every single one. Modify them. Delete the ones that don't make sense. Add the domain specific assertions that only you understand.
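
For example, I might hand Claude an existing factory as a style reference and let it draft the next one, then review the result line by line before it enters the suite (the attributes here are illustrative):

FactoryBot.define do
  factory :wallet do
    association :user
    balance { 0.00 }

    trait :funded do
      balance { 100.00 }
    end
  end
end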

Step 4: Never trust, always verify. Every AI generated test should be treated as a first draft. Run it. Read it line by line. Ask yourself: does this test actually verify what I think it verifies? Is it testing behaviour or implementation? Could this test pass even if the code was broken? Would this test fail if the code was broken?
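
Here's the kind of trap that step catches. A hypothetical AI drafted test for the currency converter from earlier:

RSpec.describe CurrencyConverter do
  # Looks like a test, proves almost nothing: it passes whether or not
  # the rate maths is correct.
  it 'converts EUR to Regios' do
    converter = CurrencyConverter.new(rate: 1.0735)
    result = converter.convert(amount: 33.33, from: 'EUR', to: 'REGIOS')
    expect(result).to be_a(Numeric)
  end
end

The assertion holds for any numeric output, broken or not. The fix is the exact value assertion from the earlier rounding test.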

I have a personal rule: if I can't explain exactly what a test does and why it matters in one sentence, it gets rewritten or deleted.

Step 5: Use AI for test review and improvement. After writing your tests (whether manually or AI assisted), paste them back into Claude and ask: "Are there any logical flaws in these tests? Are any tests redundant? Are any assertions too loose?" This second pass catches subtle issues like tests that pass for the wrong reasons, or assertions that would pass even with incorrect output.

The Best Practice Framework: AI Assisted Testing Done Right

Let me distil all of this into a practical framework.

You own the strategy. The human decides what to test, why to test it, and what constitutes a passing test. AI never makes these decisions.

AI discovers the unknowns. Use AI to brainstorm edge cases, suggest boundary conditions, and identify scenarios you might have overlooked. This is AI as a thinking partner, not a replacement for thinking.

AI generates the scaffolding. Let AI write the boilerplate, the factory definitions, the repetitive setup code. This saves enormous amounts of time without sacrificing control.

You review everything. Every test gets read, understood, and approved by a human. No test enters the suite that a human hasn't verified.

AI provides a second opinion. After writing and reviewing your tests, use AI to critique them. Ask for logical flaws, missed edge cases, and redundancies. This is AI as a code reviewer.

The suite is your responsibility. When a test fails, you need to understand why. When a bug slips through, you need to understand what test was missing. AI can help you write tests, but the test suite is your product, your responsibility, and your safety net.

This framework has proven itself on both Auto-Prammer.at and the Regios SaaS. For Auto-Prammer, using this approach, I achieved comprehensive test coverage across the vehicle search, booking, and payment flows in roughly half the time it would have taken to write every test manually. More importantly, the AI assisted edge case discovery caught three scenarios that would have been real production bugs: a date calculation error when bookings spanned daylight saving time changes, a race condition in the availability checking for simultaneous bookings, and a Solidus checkout flow edge case where applying a discount code after adding a service package produced an incorrect total.

For the Regios SaaS, the AI assisted approach was particularly valuable for the GDPR compliance tests. The regulations are complex, and Claude's ability to reason about data handling requirements helped me construct a more thorough consent management test suite than I would have built on my own. But every single test was reviewed against the actual regulatory text before being committed.

Wrapping Up: Testing Is Not a Tax, It's a Superpower

I've been building software for long enough to remember when testing was considered optional. When "it works on my machine" was an acceptable deployment criterion. When shipping bugs was just part of the process and hotfixes were the norm.

That era is over. The stakes are too high, the regulations are too strict, and the tools are too good for anyone to justify shipping untested code. TDD and BDD aren't just methodologies. They're a fundamentally different way of thinking about software. You don't write code and hope it works. You define exactly what "works" means, write a test that proves it, and then write code that passes.

The testing ecosystem in Ruby on Rails, with RSpec, Minitest, and Cucumber, gives you everything you need to implement this at every level of your application. And AI tools like Claude make the most tedious parts of testing dramatically faster, without sacrificing the human judgement that makes tests genuinely valuable.

Whether you're building a fintech platform under GDPR scrutiny like Regios, an automotive marketplace on Solidus like Auto-Prammer.at, or a SaaS product like GrowCentric.ai that's heading for public launch, the investment in testing pays for itself many times over. Fewer bugs, faster development, more confidence, and the ability to ship whenever you're ready, knowing that your test suite has your back.

Want to build software that ships without fear? Whether you need help implementing TDD and BDD in your Ruby on Rails project, setting up a comprehensive test suite for your Solidus ecommerce platform, building a CI/CD pipeline that gatekeeps every deployment, or designing a testing strategy that satisfies fintech regulators and GDPR requirements, that's exactly what I do. Let's talk about making your codebase bulletproof.