Spec-Driven Development for Brownfield Enterprise Codebases

Mar 19, 2026 · Last updated: Jun 2, 2026
Molisha Shah

Spec-driven development in brownfield enterprise codebases is most effective when teams write change-level specifications rather than full-system specs, because undocumented dependencies and scale make comprehensive upfront specifications impractical.

TL;DR

Brownfield codebases break greenfield SDD assumptions because legacy behavior, dependencies, and contracts are rarely documented. The practical approach is to write specs only for the change being made, incrementally grow coverage, and verify against existing tests and production-observed behavior.

Martin Fowler's evaluation of SDD tools covers Kiro, spec-kit, and Tessl but never quantifies the effort to introduce them into existing codebases. That is the exact problem facing teams that maintain repositories with hundreds of thousands of files, 10-15 years of technical debt, and little surviving architectural documentation.

The problem is structural. SDD demos usually assume blank-slate requirements analysis, but brownfield systems already exist, their contracts are often implicit, and their dependencies are rarely fully documented. Teams need a workflow that begins with understanding the current system and writes narrow specs only for the intended change. The payoff is no longer hypothetical: Salesforce's engineering team cut a legacy migration it had estimated at two years down to four months by leading with dependency analysis rather than file-by-file rewriting.

Augment Cosmos is a unified cloud platform for running coordinated AI agents across the software development lifecycle, with shared context and memory that compound across a team. Built on the Context Engine, it gives teams working in large codebases the architectural understanding to map dependencies across 400,000+ files through semantic dependency graph analysis before they draft a change-level spec.

What Teams Need Before Starting Brownfield SDD

Brownfield SDD has a lower barrier to entry than greenfield approaches because it starts with an existing codebase rather than a blank page. Three conditions make adoption practical.

The first condition is a semantic analysis tool capable of building dependency maps across the existing repository. Manual code reading does not scale past a few hundred files, and AI discovery without a dedicated context layer produces incomplete maps and downstream specification gaps.

The second condition is at least one engineer who understands the existing architecture well enough to review and correct the dependency maps the tool produces, since tribal knowledge can be structured and captured but not fully automated away.

The third condition is a baseline for two DORA metrics: current lead time for changes and change failure rate. Without that baseline, the team cannot tell whether SDD is improving merge speed and reducing regressions.
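
As a minimal sketch of that baseline, assume deployment records can be exported with a commit timestamp, a deploy timestamp, and a failure flag. The Deployment record shape and its field names below are hypothetical; with them, the two metrics reduce to a median and a ratio:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime   # first commit of the change
    deployed_at: datetime    # when the change reached production
    caused_failure: bool     # needed a rollback, hotfix, or incident


def dora_baseline(deployments: list[Deployment]) -> tuple[timedelta, float]:
    """Return (median lead time for changes, change failure rate)."""
    if not deployments:
        raise ValueError("need at least one deployment to form a baseline")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = sum(d.caused_failure for d in deployments)
    return median(lead_times), failures / len(deployments)
```

Capturing these two numbers before the first change-level spec is written gives later comparisons something to measure against.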

Why Spec-Driven Development Breaks in Brownfield Codebases

Spec-driven development breaks in brownfield environments because five interconnected failure modes invalidate the assumptions built into greenfield SDD workflows.

Comprehensive Specification Is Impractical at Scale

Enterprise codebases with 100,000+ files cannot be comprehensively specified without exceeding human review capacity and retrieval limits. InfoQ's analysis puts it directly: for large existing applications, generating full specifications either blows past context limits or produces specs too large to review. The fix is scope reduction, keeping granular specs closest to the area of change.

Tribal Knowledge Silos Block Specification Authoring

Thoughtworks documents the knowledge-loss problem in a whitepaper: with no architecture decision records and little to no test coverage, incremental feature development has no safety net. The engineers who understood architectural intent have departed, and many current engineers are not trained on legacy technologies. Writing specifications requires understanding what the system does, and that understanding has evaporated. Salesforce ran into the same wall internally: engineers traced a two-year migration estimate largely to undocumented legacy patterns no current team member could fully explain.

Undocumented Dependencies Create Specification Blind Spots

Even dedicated boundary-enforcement tooling struggles to keep dependencies legible. GitHub Engineering describes the technical debt that builds up in large legacy codebases, and a Packwerk review found the debt from privacy checks still far from paid off, despite explicit tooling designed to enforce dependency boundaries.

Cosmos's Context Engine maps these cross-module relationships across the repository and narrows the blind spots that leave boundary specs incomplete during legacy modernization.

Implicit Behavioral Contracts Resist Formalization

Brownfield systems contain behavioral expectations between components that were never documented: shared timing assumptions, ordering dependencies, and undocumented error-handling behaviors. These implicit contracts must be discovered before they can be encoded in specs. InfoQ warns that gaps between specification and actual system behavior compound over time, resurfacing in different forms whenever code is regenerated based on an incomplete spec.

AI Performance Degrades in Unhealthy Code

Fowler cites Adam Tornhill's research showing that LLMs carry a 30% higher defect risk in less-healthy code, and notes that the study's less-healthy code was nowhere near as bad as much legacy code. Kent Beck sharpens the critique via Fowler's blog: writing whole specifications upfront assumes teams learn nothing during implementation that would change the spec. In a brownfield codebase, every implementation can reveal hidden coupling that invalidates upfront specs.

| Failure Mode | Greenfield Impact | Brownfield Impact |
| --- | --- | --- |
| Scale of specification | Manageable: new system, defined scope | Impractical: 100K+ files exceed review capacity |
| Knowledge availability | The developer defines intent directly | Tribal knowledge lost; original architects departed |
| Dependency visibility | Defined at design time | Undocumented; accumulated over 10-15 years |
| Behavioral contracts | Specified before implementation | Implicit; must be reverse-engineered from production |
| AI code quality | Clean code, lower defect risk | Higher defect risk in unhealthy codebases |

Five Steps Teams Can Use to Apply a Spec-Driven Workflow to Brownfield Codebases

The brownfield SDD workflow differs from greenfield in a fundamental way: teams build an architectural understanding of the existing code first, then write narrow specifications scoped to the intended change. The greenfield habit of writing a comprehensive spec and generating code from it does not survive contact with an undocumented legacy system.

Step 1: Build Semantic Understanding Across the Existing Codebase

Brownfield SDD begins with understanding the codebase before any specification authoring. For repositories spanning hundreds of thousands of files, this requires semantic dependency analysis that maps relationships between components, identifies architectural patterns, and surfaces implicit contracts.

Cosmos's Context Engine builds this architectural understanding across codebases spanning 400,000+ files, which is the foundation change-level specification depends on. Red Hat sequences legacy modernization the same way with its agent mesh, where reasoning agents handle dependency mapping and migration planning before any coding agent touches a file. Without this foundation, specifications are written against an incomplete model of the system, leading to the integration failures Fowler warned about: just because context windows are larger does not mean AI will properly capture everything inside them.

The RPI (Research, Plan, Implement) Loop formalizes this: in the research phase, an agent scans the codebase and produces a compact summary of only the relevant state without writing code. Keeping research and implementation in separate phases prevents context contamination.
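
To make the idea of a dependency map concrete, the sketch below builds a static import graph and answers the question a change-level spec needs first: what depends on the module being changed? It is a toy under stated assumptions. It only sees static Python imports, it takes source text as an in-memory dictionary, and it says nothing about how the Context Engine or any other product works. The function names are illustrative.

```python
import ast
from collections import defaultdict


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the modules it imports.

    `sources` maps module name -> Python source text (hypothetical input;
    a real tool would read these from the repository).
    """
    graph: dict[str, set[str]] = {}
    for module, text in sources.items():
        deps: set[str] = set()
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[module] = deps
    return graph


def dependents_of(graph: dict[str, set[str]], target: str) -> set[str]:
    """Modules that directly or transitively import `target`."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for module, deps in graph.items():
        for dep in deps:
            reverse[dep].add(module)
    seen: set[str] = set()
    stack = [target]
    while stack:
        for parent in reverse[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

The set returned by dependents_of is a starting list of candidates for the invariants section of a spec. Implicit contracts such as ordering or timing assumptions do not show up in an import graph and still need review by someone who knows the system.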

Step 2: Scope Each Spec to the Change Itself

The second step is the paradigm shift that makes brownfield SDD viable: each specification covers only the delta of the intended change. Rather than retroactively documenting an entire legacy codebase, teams write narrow specs covering what the current change touches. Coverage grows organically with each modification, concentrated where it provides the most value: the modules under active development.

A change-level spec defines four elements:

  1. Current behavior: what the system does today, if discoverable from tests or production traffic
  2. Target behavior: the precise delta from the current state
  3. Invariants: what must not change in adjacent systems
  4. Scope boundary: what is explicitly excluded from this change
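
The four elements map naturally onto a small record type. The sketch below is one hypothetical way to hold them and to flag a spec that is not ready for implementation. The class name, field names, and readiness rules are assumptions, not a format prescribed by any SDD tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSpec:
    """A change-level spec holding the four elements listed above."""
    title: str
    current_behavior: str   # what the system does today
    target_behavior: str    # the delta from current behavior
    invariants: list[str] = field(default_factory=list)    # must not change
    out_of_scope: list[str] = field(default_factory=list)  # explicitly excluded

    def problems(self) -> list[str]:
        """Return reasons this spec is not ready for implementation."""
        issues = []
        if not self.current_behavior.strip():
            issues.append("current behavior is blank (write 'unknown' if undiscoverable)")
        if not self.target_behavior.strip():
            issues.append("target behavior is missing")
        if not self.invariants:
            issues.append("no invariants listed for adjacent systems")
        if not self.out_of_scope:
            issues.append("scope boundary is not stated")
        return issues
```

A spec whose problems() list is empty is not necessarily correct, only complete enough to review.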

Cosmos anchors these narrow specs to dependency evidence from the live codebase through its shared context layer, rather than relying on tribal memory or documentation that may be years out of date.

Step 3: Decompose Against Existing Architecture

Teams must decompose the change-level spec into implementation tasks that respect the existing architectural structure. Decomposition in brownfield differs from greenfield because the architecture already exists, constraints are real, and deviations from established patterns create a maintenance burden.

Stripe Minions validates this at enterprise scale: guidance is applied at a scoped or subdirectory level to avoid a global rules file that would exceed the model's context.
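
One way to picture scoped guidance is a lookup that collects rules files from the changed file's directory upward, nearest first. This is a hypothetical sketch: the AGENT_RULES.md filename and the precedence order are assumptions for illustration and do not describe Stripe's implementation.

```python
from pathlib import PurePosixPath


def applicable_rules(changed_file: str, rule_files: set[str],
                     rule_name: str = "AGENT_RULES.md") -> list[str]:
    """Return rule files that apply to `changed_file`, nearest directory first.

    `rule_files` is the set of repo-relative paths where a rules file exists.
    `AGENT_RULES.md` is a hypothetical filename, not a tool convention.
    """
    found = []
    for directory in PurePosixPath(changed_file).parents:
        candidate = str(directory / rule_name)
        if candidate in rule_files:
            found.append(candidate)
    return found
```

Only the rules on the path to the changed file are loaded, so guidance for unrelated subsystems never enters the agent's context.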

Step 4: Execute in Isolated Worktrees

Implementation tasks execute in parallel using Git worktrees, which give each agent its own directory, branch, and filesystem state while sharing the underlying repository data.

Boris Cherny, creator and head of Claude Code at Anthropic, describes running many parallel Claude Code sessions and using separate git checkouts for each local session when working on large batch changes, such as codebase-wide migrations. Google patterns further formalize this by assigning specific roles to individual agents, creating systems that are more modular, testable, and reliable.

Red Hat's agent mesh applies the same principle to legacy estates: specialized agents each own a slice of the migration and share state as the work proceeds, which lets a small team oversee a modernization that would take years by hand. Cosmos provisions isolated cloud Environments for this kind of parallel execution. They maintain architectural context across concurrent work streams and stay durable across long-running runs. Resource considerations are real: multiple concurrent worktrees multiply disk usage and can require per-worktree database instances and worktree-indexed Docker volume names at enterprise scale.
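
A small planning helper shows how the per-task worktree, branch, and volume names could be derived. The git worktree add command with -b is standard Git; the agent/ branch prefix, the ../worktrees directory, and the pgdata- volume prefix are assumed conventions chosen for this sketch.

```python
import re


def slug(task: str) -> str:
    """Lowercase a task name into a branch- and volume-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")


def worktree_plan(tasks: list[str], root: str = "../worktrees") -> list[dict]:
    """Build one `git worktree add` command and one volume name per task."""
    plan = []
    for task in tasks:
        name = slug(task)
        plan.append({
            "branch": f"agent/{name}",
            # `git worktree add -b <branch> <path>` creates the branch
            # and checks it out into its own directory.
            "command": ["git", "worktree", "add", "-b",
                        f"agent/{name}", f"{root}/{name}"],
            # Hypothetical naming scheme: one database volume per worktree.
            "volume": f"pgdata-{name}",
        })
    return plan
```

Each command list can be passed to subprocess.run with check=True from the repository root. Keeping the plan as data first makes it easy to review disk and database cost before anything is created.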

Step 5: Verify Against Spec and Existing Tests

Verification in brownfield serves two purposes: confirming the change matches the spec, and confirming it does not break existing behavior. That second purpose, protecting behavior that already exists, is what separates brownfield from greenfield. Compare the implementation against both the change-level spec and the existing test suite. The dual check catches the technical debt injection Fowler's team identified: AI-generated code that adds unrequested features and must integrate with systems the AI does not fully understand.
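
The second purpose can be approximated with a characterization check: run the legacy and changed implementations over the same inputs and report any difference the spec did not call for. The sketch below assumes both versions can be called side by side as pure functions, which legacy systems often do not allow without extra harness work. The function and parameter names are illustrative.

```python
from typing import Any, Callable, Iterable


def regression_diff(legacy: Callable[[Any], Any],
                    changed: Callable[[Any], Any],
                    inputs: Iterable[Any],
                    intended: set) -> list:
    """Return inputs where behavior changed outside the spec's intended delta.

    `intended` holds the inputs the change-level spec says should differ.
    """
    unexpected = []
    for value in inputs:
        if value in intended:
            continue  # covered by the spec; checked by the spec's own tests
        if legacy(value) != changed(value):
            unexpected.append(value)
    return unexpected
```

An empty result supports the claim that the change stayed inside its scope boundary for the sampled inputs. It does not prove the absence of regressions on inputs that were never sampled.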

A machine-checkable contract validation step further strengthens verification. The pattern below uses an OpenAPI contract as an example of turning a prose spec into a CI-enforceable artifact:

```yaml
# openapi.yaml
openapi: 3.1.0
info:
  title: Payment Validation API
  version: 1.0.0
paths:
  /payments/validate:
    post:
      summary: Queue payment for async fraud validation
      responses:
        '202':
          description: Accepted for async validation
```

Runnable validation: Redocly CLI, the maintained replacement for the deprecated swagger-cli, validates this contract and fails fast if required OpenAPI fields are missing.

```shell
npx @redocly/cli lint openapi.yaml
# Exits 0 when no rule reports an error, non-zero otherwise.
```

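Common failure mode: if responses are omitted, the linter reports a problem identifying the offending path. OpenAPI 3.0 requires the field, while OpenAPI 3.1 makes it optional, so on a 3.1 contract whether the omission fails the build depends on the configured rules. In brownfield teams, this turns a spec from informal prose into a machine-checkable contract that can run in CI before implementation merges.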

Cosmos connects verification back to the same architectural map used during discovery, drawing on shared context and tenant memory to confirm that the implemented delta stayed within the intended boundary.

Three Specification Patterns for Brownfield Codebases

Brownfield specification patterns differ from greenfield because they acknowledge the reality IEEE documented decades ago: legacy code itself is often the only reliable documentation.

Pattern 1: Change Specs (Delta Specifications)

Change specs capture only the behavioral delta of the intended modification. Every bug fix, feature addition, and refactoring becomes an opportunity to add specifications for the code being touched. The discipline requirement from InfoQ: every AI-assisted change must update the spec alongside the code. Skipping that update widens the specification gap, which resurfaces as non-deterministic AI generation failures later.
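
That discipline can be enforced mechanically. The sketch below assumes a hypothetical repository layout with code under src/ and specs under specs/, and flags a diff that touches code without touching any spec.

```python
def changes_missing_spec(changed_files: list[str],
                         code_prefix: str = "src/",
                         spec_prefix: str = "specs/") -> list[str]:
    """Return code files changed without any spec change in the same diff.

    The `src/` and `specs/` layout is an assumed convention for illustration.
    """
    code = [f for f in changed_files if f.startswith(code_prefix)]
    spec_touched = any(f.startswith(spec_prefix) for f in changed_files)
    return [] if spec_touched else code
```

The input could come from git diff --name-only against the merge base, with a non-empty result failing the CI job. The check only proves that a spec file changed, not that the spec change matches the code change, so it complements review rather than replacing it.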

Pattern 2: Dependency Boundary Specs (Service Contract Specifications)

Dependency boundary specs formalize implicit contracts at integration points between legacy and modern systems, the same contracts that come under pressure during a monolith-to-microservices migration. Required components include machine-readable artifacts such as OpenAPI for REST and Avro or Protobuf for events, plus non-functional concerns such as failure modes, SLOs, and versioning, all tied to a shared vocabulary across teams.

Anti-corruption layers implement these boundary specs. Fowler describes a Backend for Frontend as an Anti-Corruption Layer that holds the frontend's domain model while translating to legacy interfaces. GitHub's SAML hardening shows the pattern in practice: bootstrap schemas from production traffic, A/B test with the Scientist framework, and converge on a minimal schema validated against millions of requests.
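
The compare-in-production idea behind Scientist can be sketched in a few lines. Scientist itself is a Ruby library; the function below is a simplified Python analogue with an invented signature, not its API. It runs the legacy path and the candidate path, records disagreements, and always returns the legacy result.

```python
from typing import Any, Callable


def experiment(control: Callable[[], Any],
               candidate: Callable[[], Any],
               mismatches: list) -> Any:
    """Run both code paths, record disagreements, always return the control."""
    expected = control()
    try:
        observed = candidate()
    except Exception as exc:  # candidate failures must never reach callers
        mismatches.append((expected, repr(exc)))
        return expected
    if observed != expected:
        mismatches.append((expected, observed))
    return expected
```

Because callers only ever see the control result, the candidate can run against real traffic while the mismatch log shows where the boundary spec and the legacy behavior still disagree.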

Pattern 3: Migration Specs (Incremental Modernization Specifications)

Migration specs define the target state and the incremental steps to reach it from the current state, and are designed to be executed without stopping feature delivery. Per CircleCI's analysis, three components are required:

  1. Target state vision: specific enough to validate intermediate steps
  2. Incremental steps: each is individually deployable and independently valuable
  3. Integration layer design: the facade that mediates between old and new during transition

Shopify's implementation of the Strangler Fig pattern validates this approach: build a facade, identify independent, extractable modules, migrate incrementally, and continuously monitor. Salesforce's engineering team executed this pattern when it modernized a third-party managed package into native multi-tenant Java, ordering the work by dependency graph and shipping in stages rather than attempting a single rewrite. Peer-reviewed research confirms that direct rewrites are rarely feasible in enterprise environments due to risks of functional regression and loss of institutional domain knowledge.
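
A facade in this sense can be very small. The sketch below routes each operation to the legacy handler until a modern handler is registered for it. The class and operation names are hypothetical, and the code is not drawn from Shopify's or Salesforce's implementations.

```python
from typing import Callable

Handler = Callable[[dict], dict]


class StranglerFacade:
    """Route each operation to the new implementation once it is migrated."""

    def __init__(self, legacy: Handler):
        self._legacy = legacy
        self._migrated: dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        """Cut one operation over to its modern handler."""
        self._migrated[operation] = handler

    def handle(self, operation: str, request: dict) -> dict:
        """Dispatch to the modern handler if one exists, else to legacy."""
        handler = self._migrated.get(operation, self._legacy)
        return handler(request)
```

Each call to migrate corresponds to one incremental step in the migration spec: it is deployable on its own, and removing the registration falls back to the legacy path.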

| Pattern | Scope | When to Use | Key Discipline |
| --- | --- | --- | --- |
| Change spec | Single modification delta | Bug fixes, feature additions, refactoring | Update spec with every AI-assisted change |
| Dependency boundary spec | Integration point contract | Service extraction, monolith decomposition | Validate against production traffic, not docs |
| Migration spec | Multi-phase architectural change | System modernization, database migration | Each step must be independently deployable |

What Does Not Work: Full-Pipeline SDD for Brownfield Changes

AWS Kiro's mandatory three-phase pipeline creates structural friction for brownfield codebases. Kiro's own product team acknowledged this: not everyone starts from requirements, especially when working on existing brownfield apps where the technical architecture is already mapped out.

Three limitations make full-pipeline SDD impractical for routine brownfield changes. First, spec generation and full agent hook execution add per-task overhead that a single-line bug fix cannot justify. Second, the agent starts from scratch each session, relearning the codebase every time. Third, AWS's own case study demonstrated the approach only on a small codebase.

Cosmos addresses these limitations directly. Shared context and tenant memory persist architectural understanding across sessions, so agents do not relearn the codebase on every task. For teams working on incremental legacy changes, that continuity beats forcing every task through a fresh requirements-first pipeline.

Full-pipeline SDD still earns its place for large greenfield features inside brownfield codebases: a new service, a new API surface, a new subsystem. The distinction is between specifying new work, where upfront specification adds value, and modifying existing work, where change-level specs fit better.

How Teams Should Measure Brownfield SDD Effectiveness

DORA has acknowledged the topic but published no SDD-specific findings, and peer-reviewed literature offers no standardized metrics yet. Teams adopting brownfield SDD in 2026 are establishing baselines rather than following mature standards. Four metrics adapted from adjacent research provide a starting framework.

Drift Rate

InfoQ identifies drift as the natural state that must be continuously governed: divergence between specs and actual system behavior over time. Teams can instrument it immediately through schema validation failures per sprint, contract-test failures that signal implementation divergence, and spec revision frequency.
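
As a rough sketch, those three signals can be normalized by the number of merged changes so sprints of different sizes stay comparable. Dividing by merged changes is an assumed choice, not a standard definition of drift rate, and the function and key names are hypothetical.

```python
def drift_signals(schema_failures: int, contract_failures: int,
                  spec_revisions: int, merged_changes: int) -> dict[str, float]:
    """Normalize one sprint's drift signals by merged changes."""
    if merged_changes <= 0:
        raise ValueError("merged_changes must be positive")
    return {
        "validation_failures_per_change":
            (schema_failures + contract_failures) / merged_changes,
        "spec_revisions_per_change": spec_revisions / merged_changes,
    }
```

Tracking the two ratios separately keeps a sprint with many deliberate spec revisions from being confused with a sprint where implementations silently diverged.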

Regression Rate

Defect density in specced versus unspecced areas of the codebase provides the clearest signal. A long-cited baseline from software-quality literature is roughly 1 defect per 1,000 lines, though real rates vary widely by language, codebase, and measurement method. AI-assisted changes with formal specs should trend below it, though rigorous brownfield before-and-after data has not been published.
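
The comparison is simple arithmetic once defects and line counts are attributed to specced and unspecced areas, which is the hard part in practice. A sketch with hypothetical function names:

```python
def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Defects per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects * 1000 / lines_of_code


def density_ratio(specced: tuple[int, int], unspecced: tuple[int, int]) -> float:
    """Specced density divided by unspecced density; below 1.0 favors specs.

    Each argument is a (defects, lines_of_code) pair.
    """
    baseline = defects_per_kloc(*unspecced)
    if baseline == 0:
        raise ValueError("unspecced area has no recorded defects to compare against")
    return defects_per_kloc(*specced) / baseline
```

With illustrative numbers, 6 defects in 12,000 specced lines against 45 defects in 30,000 unspecced lines gives 0.5 versus 1.5 defects per 1,000 lines, a ratio of about 0.33. A ratio like this is only suggestive, since teams tend to spec the modules they already understand best.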

Time-to-Merge vs. Baseline

DORA Lead Time for Changes is the closest proxy. Establish a baseline before SDD adoption, then track quarterly.

| Performance Cluster | Lead Time for Changes | Change Failure Rate |
| --- | --- | --- |
| Elite | Less than one day | ~5% |
| High | One day to one week | ~20% |
| Medium | One week to one month | ~10% |
| Low | More than one month | Highest of the four clusters |

Source: 2024 DORA report. DORA derives these clusters from each year's survey through cluster analysis, so the thresholds shift annually. In 2024, the Medium cluster reported a lower change failure rate than the High cluster.

Specification Coverage Growth

Measure specifications added per sprint rather than total coverage percentage. Brownfield coverage starts near zero and grows slowly. Useful denominators include critical-path components with formal specs, API endpoints with machine-readable contracts, and active-development modules with change-level specs.

Adopt Change-Level Specs Before Your Next Legacy Refactor

The core tension in brownfield SDD is scope: comprehensive specifications are impractical at enterprise scale, yet unspecified AI-assisted changes introduce compounding drift. Change-level specs resolve it by scoping each spec to the delta of a single modification and growing coverage where it matters most.

The next concrete step is simple: on the next brownfield change, write a change-level spec defining current behavior, target behavior, invariants, and scope boundaries before generating code. Measure whether the resulting change merges faster and introduces fewer regressions than the team's baseline.

Written by

Molisha Shah

Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.

