
How to Write Living Specs for AI Agent Development

Mar 19, 2026 · Last updated: Jun 16, 2026
Molisha Shah

Living specs guide AI agent development more reliably than static specs because implementation changes flow back into the specification, which limits spec drift.

TL;DR

AI-generated code drifts quickly when teams keep regenerating it based on stale requirements. Static specs fail because they only move information one way. Living specs reduce that gap by writing implementation decisions back into the spec, keeping requirements, constraints, and code aligned across repeated development cycles.

From Static Specs to Living Specs

Engineering teams using AI coding agents quickly run into the same failure mode: the markdown says one thing, the repository says another, and the next regeneration cycle amplifies the mismatch. InfoQ research frames the underlying problem: a code issue reflects a gap in the specification, and because AI generation is non-deterministic, that gap keeps resurfacing as code is regenerated. That makes static specs fragile in workflows that expect repeated regeneration, review, or multi-step handoffs.

Living specs change the direction of information flow. Instead of treating the spec as a one-time prompt, teams treat it as an evolving artifact that captures requirements, constraints, and implementation decisions. ThoughtWorks, Anthropic, and Addy Osmani each emphasize a similar operating principle: write enough structure for the agent to act correctly, then update it as reality changes.

For teams evaluating orchestration tools, some vendors now describe living specs as workflow infrastructure rather than just documentation. One example is Augment Cosmos, Augment Code's unified cloud agents platform, which shares context and memory across the team and the software development lifecycle. Cosmos uses repository context from its Context Engine to coordinate spec-driven work across agents, with a human checkpoint to review the spec before agents execute. This guide explains what living specs are, how to structure them, where teams overspecify, and how to review agent-updated specs without creating documentation debt.

Why Static Specs Fail AI Agent Workflows

Static specs fail AI agent workflows because they only move information in one direction, which lets implementation drift compound across regeneration cycles.

The root cause is directional. Static specs flow one way: a developer writes requirements, an agent consumes them, and the spec remains unchanged while the codebase evolves. Living specs add a feedback loop. As the InfoQ presentation explains, writing specs and requirements down aligns both agents and humans in AI-native development workflows.

This feedback loop changes the role of the spec. A Spec Kit discussion captures the idea: teams routinely version the source code generated by agents but neglect version control for the specs that produced it, thereby inverting the dependency relationship that matters most. In living spec-driven workflows, specifications become the source of truth, and implementation becomes the compiled output.

Not every change needs to originate in the spec for every tool, but teams that routinely regenerate code from stale specs should expect drift to recur unless implementation decisions are written back.

Cosmos is built around this exact problem. It keeps the spec and shared context connected to the repository as the codebase changes, rather than treating them as one-time inputs.


The Four Phases of Bidirectional Spec Updates

Bidirectional spec updates require clear separation between initial intent and implementation, followed by review and refinement to keep both aligned. ThoughtWorks guidance documents this split between design and implementation with a human always in the loop.

The four phases below illustrate how multi-agent code-generation workflows are structured when specs are treated as coordination infrastructure rather than as prompts.

| Phase | Direction | What Happens |
|---|---|---|
| 1. Initial Intent | Developer → Spec | Developer writes high-level requirements; AI expands them into structured specs |
| 2. Implementation | Spec → Agent → Code | Agents read the finalized spec and generate code, tests, and documentation |
| 3. Bidirectional Update | Implementation → Spec | Agents or developers update the spec to reflect what was actually built |
| 4. Continuous Refinement | Production → Spec | Metrics, incidents, and operational learnings feed back into the spec |

Phase 3 is the defining characteristic. Without it, specifications are just elaborate prompts; with it, they become coordination infrastructure that more reliably reflects the current system state over time.
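Phase 3 can be pictured as a small write-back step. The sketch below is a minimal illustration under stated assumptions: `LivingSpec`, `Decision`, and `writeBack` are hypothetical names invented here, not types or functions from any tool mentioned in this guide.

```typescript
// Hypothetical shapes for illustration; no tool in this guide defines them.
interface Decision {
  date: string; // ISO date, e.g. "2026-03-15"
  summary: string;
  reason: string;
}

interface LivingSpec {
  requirements: string[];
  decisions: Decision[];
  revision: number;
}

// Phase 3: fold what was actually built back into the spec.
// Returns a new object so the previous revision stays available for diffing.
function writeBack(spec: LivingSpec, built: Decision[]): LivingSpec {
  const known = spec.decisions.map((d) => d.summary);
  const fresh = built.filter((d) => known.indexOf(d.summary) === -1);
  if (fresh.length === 0) {
    return spec; // nothing new was learned during implementation
  }
  return {
    requirements: spec.requirements,
    decisions: spec.decisions.concat(fresh),
    revision: spec.revision + 1,
  };
}

const specV1: LivingSpec = {
  requirements: ["POST /auth/login returns a JWT"],
  decisions: [],
  revision: 1,
};

const specV2 = writeBack(specV1, [
  {
    date: "2026-03-15",
    summary: "Using RS256 for token signing",
    reason: "security audit requirement",
  },
]);
```

Returning a new revision instead of mutating the old one keeps each write-back reviewable as a diff, which is the property that separates a living spec from an elaborate prompt.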

A practical warning from InfoQ: adopting spec-driven workflows without changing how product, architecture, engineering, and QA stakeholders collaborate risks creating layers of outdated documentation that no one maintains.

Anatomy of a Living Spec: Seven Essential Sections

A useful living spec gives agents enough context to implement correctly without turning the document into a brittle script. Across the AGENTS.md standard, the Osmani guide, and the O'Reilly guide, seven sections recur in effective agent-facing specs.

1. Agent role and project overview

The agent role and project overview reduce ambiguity by defining priorities, domain, and stack before implementation begins. That framing lets the agent make tradeoffs that match the repository rather than generic defaults.

```text
## Agent Role
You are an implementor for a Node.js REST API serving financial transaction data.
Priority order: correctness > security > performance > code elegance.

## Project Overview
Mission: Real-time transaction processing API serving 10k+ merchants.
Stack: Node 20, TypeScript 5.3, PostgreSQL 15, Redis 7, Docker.
```

2. Key commands

Key commands improve execution reliability because agents repeatedly need exact build, test, lint, and migration syntax. Full commands reduce guesswork and cut avoidable tool errors.

```text
## Commands
- Run tests: `npm test`
- Run single test: `npm test -- --grep "auth"`
- Build: `npm run build`
- Lint: `npm run lint`
- Database migrations: `npm run migrate:latest`
- Type check: `npx tsc --noEmit`
```
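One way a harness might consume this section is to parse it into a lookup table, so that a task labeled "Lint" always resolves to the exact command string. The parser below is a minimal sketch: `parseCommands` is a hypothetical helper, and it assumes every entry follows the "- Label: `command`" format used in the example.

```typescript
// Hypothetical helper: turns "- Label: `command`" lines into a lookup table.
function parseCommands(section: string): Record<string, string> {
  const commands: Record<string, string> = {};
  const linePattern = /^-\s*(.+?):\s*`([^`]+)`\s*$/;
  section.split("\n").forEach((line) => {
    const match = linePattern.exec(line.trim());
    if (match) {
      // Labels are lowercased so lookups do not depend on capitalization.
      commands[match[1].toLowerCase()] = match[2];
    }
  });
  return commands;
}

const commandsSection = [
  "## Commands",
  "- Run tests: `npm test`",
  "- Run single test: `npm test -- --grep \"auth\"`",
  "- Lint: `npm run lint`",
  "- Type check: `npx tsc --noEmit`",
].join("\n");

const commandTable = parseCommands(commandsSection);
```

Lines that do not match the pattern, such as the heading, are skipped silently. A stricter variant could report them so that malformed entries are caught during spec review.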

3. Architecture and critical files

Architecture and critical files improve navigation because file:line references point agents to the real control points in the codebase. That reduces exploration overhead and lowers the chance of editing the wrong layer.

```text
## Critical Files
| What              | Where                        |
|-------------------|------------------------------|
| App entry point   | `src/index.ts`               |
| Route definitions | `src/routes/index.ts:15`     |
| Auth middleware   | `src/middleware/auth.ts:42`  |
| DB connection     | `src/database/connection.ts` |
```
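The file:line references in this table are easy to turn into structured data. The sketch below assumes the `path:line` convention from the example; `FileRef` and `parseFileRef` are hypothetical names used only for illustration.

```typescript
interface FileRef {
  path: string;
  line?: number; // absent when the spec points at a whole file
}

// Hypothetical helper: splits "path:line" into its parts.
function parseFileRef(ref: string): FileRef {
  const trimmed = ref.trim();
  const match = /^(.+):(\d+)$/.exec(trimmed);
  if (!match) {
    return { path: trimmed };
  }
  return { path: match[1], line: parseInt(match[2], 10) };
}

const authRef = parseFileRef("src/middleware/auth.ts:42");
const entryRef = parseFileRef("src/index.ts");
```

Line anchors go stale as files change, so a table like this is a natural candidate for re-checking whenever the spec is updated after implementation.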

4. Code style via examples

Code-style examples improve consistency because a single working snippet shows patterns, error handling, and logging conventions more clearly than prose. That makes the generated code easier to review and merge.

```text
// TypeScript 5.x
// Behavior: returns { error: "User not found" } and logs structured context on failure.
const result = await fetchUser(id)
if (result.error) {
  logger.error("Failed to fetch user", { id, error: result.error })
  return { error: "User not found" }
}
return { data: result.data }
```

5. Three-tier boundaries

Three-tier boundaries control agent autonomy by separating default-safe actions from changes that require approval or are prohibited outright. That prevents accidental edits to high-risk areas.

```text
## Boundaries
### ✅ Always
- Write tests before implementation
- Use TypeScript strict mode
- Log errors with structured fields (never plain strings)

### ⚠️ Ask First
- Adding new dependencies
- Changing database schema
- Modifying authentication logic

### 🚫 Never
- Commit credentials or API keys
- Modify `.env.production`
- Push directly to `main`
```
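The tiers can also be enforced mechanically before an action runs. The sketch below is a rough illustration, not how any particular tool implements boundaries: `classifyAction`, `Verdict`, and the regular expressions are assumptions made here, and matching on a plain-language description is deliberately crude. A real harness would more likely inspect file paths and tool calls.

```typescript
type Verdict = "allowed" | "ask-first" | "never";

interface BoundaryRule {
  verdict: Verdict;
  pattern: RegExp; // matched against a plain-language description of the action
}

// Rules mirror the example boundaries; the patterns themselves are illustrative.
// Order matters: the most restrictive rule is listed first.
const boundaryRules: BoundaryRule[] = [
  { verdict: "never", pattern: /\.env\.production|credential|api key|push .*\bmain\b/i },
  { verdict: "ask-first", pattern: /dependenc|schema|authentication/i },
];

// Returns the verdict of the first matching rule, or "allowed" if none match.
function classifyAction(description: string): Verdict {
  for (const rule of boundaryRules) {
    if (rule.pattern.test(description)) {
      return rule.verdict;
    }
  }
  return "allowed";
}

const verdictForEnv = classifyAction("Modify .env.production");
const verdictForDependency = classifyAction("Add new dependency lodash");
const verdictForRename = classifyAction("Rename a local variable");
```

Listing the "never" rule first means an action that touches both an ask-first area and a prohibited one is always rejected outright.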

6. Implementation status

Implementation status tracking improves coordination because the spec shows what is complete, in progress, or blocked. That visibility keeps reviewers and parallel agents working from the same state.

```text
├─ ✓ Hero Section (completed)
├─ ✓ Feature Sections (completed)
├─ ◐ Redesign Hero (in progress)
├─ ◐ Mobile View (in progress)
└─ ○ Animations (not started)
```
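Keeping status as data and rendering the tree from it makes the section harder to get out of sync by hand. A minimal sketch, assuming the three states and symbols from the example; `TaskStatus` and `renderStatus` are hypothetical names.

```typescript
type TaskState = "completed" | "in progress" | "not started";

interface TaskStatus {
  title: string;
  state: TaskState;
}

const stateSymbol: Record<TaskState, string> = {
  "completed": "✓",
  "in progress": "◐",
  "not started": "○",
};

// Renders the same tree layout as the example: the last row gets the closing branch.
function renderStatus(tasks: TaskStatus[]): string {
  return tasks
    .map((task, index) => {
      const branch = index === tasks.length - 1 ? "└─" : "├─";
      return `${branch} ${stateSymbol[task.state]} ${task.title} (${task.state})`;
    })
    .join("\n");
}

const statusTree = renderStatus([
  { title: "Hero Section", state: "completed" },
  { title: "Mobile View", state: "in progress" },
  { title: "Animations", state: "not started" },
]);
```

Because `TaskState` is a closed union, a typo such as "in-progress" fails type checking instead of producing a silently inconsistent status line.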

7. Decision log

Decision logs prevent repeated debate because architectural choices and their reasons stay attached to the spec. Future agents and reviewers can then preserve intent rather than rediscover it.

```text
## Decision Log
- 2026-03-15: Using RS256 for token signing (security audit requirement)
- 2026-03-16: Repository pattern for data access (consistency with existing services)
```
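A consistent entry format is what lets a reviewer or a script check that every decision carries its reason. The parser below is a sketch that assumes the "- YYYY-MM-DD: decision (reason)" shape from the example; `DecisionEntry` and `parseDecision` are hypothetical names.

```typescript
interface DecisionEntry {
  date: string;
  decision: string;
  reason: string;
}

// Hypothetical parser for "- YYYY-MM-DD: decision (reason)" lines.
function parseDecision(line: string): DecisionEntry | null {
  const pattern = /^-\s*(\d{4}-\d{2}-\d{2}):\s*(.+?)\s*\((.+)\)\s*$/;
  const match = pattern.exec(line.trim());
  if (!match) {
    return null; // entries without a date or a reason are rejected on purpose
  }
  return { date: match[1], decision: match[2], reason: match[3] };
}

const signingDecision = parseDecision(
  "- 2026-03-15: Using RS256 for token signing (security audit requirement)"
);
const missingReason = parseDecision("- 2026-03-17: Switched cache to Redis");
```

Rejecting entries that lack a reason is a design choice: the value of the log is the recorded rationale, so an entry without one is treated as incomplete.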

Writing Requirements at the Right Granularity

Requirement granularity determines whether a living spec guides implementation or overwhelms it.

The over-specification problem

Over-specification creates unstable agent behavior because detailed instructions can be either ignored or followed too literally, and both outcomes degrade implementation quality. That is why teams need enough structure to constrain risk without dictating every step.

Birgitta Böckeler's hands-on research, published on Martin Fowler's site, examines spec-driven development workflows and concludes that generating exhaustive acceptance criteria for small tasks adds more overhead than accuracy. Kent Beck's critique, also published on Fowler's site, identifies the philosophical flaw: heavy upfront specification assumes nothing will be learned during implementation, which rarely holds in practice.

The practical implication: write enough spec to orient the agent and establish constraints, then iterate. Treat the spec as a hypothesis about what needs to be built rather than a finished blueprint.

Declarative outcomes beat imperative instructions

Declarative requirements produce better agent behavior because they describe the desired outcome and constraints instead of prescribing every implementation step. That gives the agent room to apply existing patterns while preserving reviewable success criteria.

A TDS article illustrates the contrast between declarative and imperative approaches:

| Approach | Example |
|---|---|
| Imperative (over-specified) | "Import numpy. Define a function called cosine_distance. Convert inputs to numpy arrays. Calculate the dot product. Calculate norms. Return 1 minus the quotient." |
| Declarative (outcome-focused) | "Write a short and fast function in Python to compute the cosine distance between two input vectors." |

Osmani emphasizes guiding agents with clear problem definitions and success criteria rather than expecting them to work unattended.
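For reference, both prompts describe the same computation: cosine distance is 1 minus the dot product divided by the product of the two vector norms. The sketch below shows one possible result, written in TypeScript rather than the Python named in the quoted prompts so that the examples in this guide stay in one language. The function name `cosineDistance` and the error handling are choices made here; neither prompt specifies them.

```typescript
// Cosine distance: 1 - (a · b) / (|a| * |b|)
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0; // squared norm of a
  let normB = 0; // squared norm of b
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for a zero vector");
  }
  return 1 - dot / Math.sqrt(normA * normB);
}

const orthogonal = cosineDistance([1, 0], [0, 1]); // perpendicular vectors
const identical = cosineDistance([2, 3], [2, 3]); // same direction
```

The guards for mismatched lengths and zero vectors are exactly what the imperative prompt leaves out: its step list covers the happy path only, while the declarative version leaves room for the implementation to handle them.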

Before and after: a complete requirement rewrite

A good requirement rewrite improves implementation accuracy by separating context, constraints, output, and success criteria into fields that agents can act on and reviewers can verify.

Before (under-specified, mixed concerns):

```text
Create a user dashboard that shows analytics. Use Redux for state management. It should load fast. Use Material-UI. Make it look modern.
```

This mixes functional requirements, technical mandates, unmeasurable performance goals, UI library choices, and subjective design language.

After (a properly structured living spec, shown here for a different feature, JWT authentication):

```text
## User Authentication
Context: New users cannot access protected endpoints. We need JWT-based auth.

Constraints:
- Use only standard libraries; no external auth services
- Tokens expire in 15 minutes
- Refresh tokens valid for 7 days
- Must work with existing user DB schema

Output Specification:
- POST /auth/login endpoint returning JWT
- Middleware function for route protection
- Unit tests covering happy path + 3 error cases
- Return format: JSON with {token, refreshToken, expiresIn}

Success Criteria:
- All tests pass
- No breaking changes to existing user endpoints
- Response time < 100ms for token validation
```

Anthropic's context-engineering guidance articulates the governing principle: strive for the minimal set of information that fully outlines expected behavior.
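Part of what separates the two examples is that "load fast" and "look modern" cannot be verified, while "Response time < 100ms" can. A simple lint pass can flag the difference before a spec reaches an agent. The sketch below is a heuristic under stated assumptions: the `Requirement` shape mirrors the four fields of the structured example, and `findVagueCriteria` and its word list are hypothetical and far from exhaustive.

```typescript
interface Requirement {
  context: string;
  constraints: string[];
  outputs: string[];
  successCriteria: string[];
}

// Words that often signal an unmeasurable goal; illustrative, not exhaustive.
const vaguePattern = /\b(fast|modern|clean|nice|user-friendly)\b/i;

// Returns the criteria that use a vague term and contain no number to measure against.
function findVagueCriteria(req: Requirement): string[] {
  return req.successCriteria.filter(
    (criterion) => vaguePattern.test(criterion) && !/\d/.test(criterion)
  );
}

const draftRequirement: Requirement = {
  context: "New users cannot access protected endpoints.",
  constraints: ["Tokens expire in 15 minutes"],
  outputs: ["POST /auth/login endpoint returning JWT"],
  successCriteria: [
    "All tests pass",
    "Response time < 100ms for token validation",
    "It should load fast",
    "Make it look modern",
  ],
};

const flaggedCriteria = findVagueCriteria(draftRequirement);
```

A check like this only catches obvious wording. It is meant as a prompt for the reviewer, not a substitute for reading the criteria.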

How Multi-Agent Coordinators Use Living Specs

Multi-agent coordinators depend on living specs because parallel agents need a shared record of current intent, task boundaries, and accepted decisions. Once multiple agents work in parallel, the spec becomes operational coordination rather than planning text.

Augment Cosmos is built on this architecture. Cosmos coordinates agents against shared context and memory, with humans steering at a spec-and-intent-review checkpoint before agents execute.

A coordinator in this model typically handles four functions:

  • Context analysis: Inspect relevant repository structure and dependencies before task assignment
  • Specification drafting: Create or refine the working spec from developer intent
  • Task decomposition: Break the spec into executable units with handoff points
  • Delegation management: Assign work while accounting for inter-task dependencies

Dependency-aware decomposition matters because shared type or schema changes can create avoidable race conditions and merge conflicts if tasks are split without dependency order. Coordinating that ordering across parallel agents is part of what Cosmos is designed to handle.
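Dependency ordering of this kind is a topological sort. The sketch below groups tasks into waves that can run in parallel; it illustrates the general technique and is not a description of how Cosmos or any other tool schedules work. `AgentTask`, `planWaves`, and the task ids are hypothetical.

```typescript
interface AgentTask {
  id: string;
  dependsOn: string[];
}

// Groups tasks into waves: every task in a wave can run in parallel because
// all of its dependencies finished in an earlier wave.
function planWaves(tasks: AgentTask[]): string[][] {
  const done: string[] = [];
  let remaining = tasks.slice();
  const waves: string[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((task) =>
      task.dependsOn.every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown dependency in task list");
    }
    const readyIds = ready.map((task) => task.id);
    waves.push(readyIds);
    readyIds.forEach((id) => done.push(id));
    remaining = remaining.filter((task) => readyIds.indexOf(task.id) === -1);
  }
  return waves;
}

const taskWaves = planWaves([
  { id: "update-shared-types", dependsOn: [] },
  { id: "api-handlers", dependsOn: ["update-shared-types"] },
  { id: "ui-forms", dependsOn: ["update-shared-types"] },
  { id: "e2e-tests", dependsOn: ["api-handlers", "ui-forms"] },
]);
// taskWaves: [["update-shared-types"], ["api-handlers", "ui-forms"], ["e2e-tests"]]
```

Here the shared type change lands alone in the first wave, so the two agents that depend on it start from the same definitions instead of racing each other.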

Cosmos runs a default coordination model: a coordinator drafts the spec and delegates, implementor agents execute in parallel, and a verifier checks results against the spec. It also ships with reusable reference experts for common work:

| Reference Expert | Responsibility |
|---|---|
| Deep Code Review | Context-aware review tuned for recall, to catch every bug possible |
| PR Author | Implements changes to a merge-ready state |
| E2E Testing | Validates against real infrastructure |
| Incident Response | Triages and resolves incidents |

Cosmos keeps a human in the loop by design: teams set the policies for where human judgment is required, and specs return for review before agents independently write, test, and review code.

If keeping parallel agents aligned to one spec is the bottleneck, Cosmos's model is built for exactly that.

Reviewing Agent-Updated Living Specs

Reviewing agent-updated specs requires checking both implementation accuracy and whether the spec still captures the team's real decisions. GitHub's spec-driven development guidance emphasizes phases such as Specify, Plan, Tasks, and Implement, with specifications versioned alongside the repository.

Four triggers for spec review

Spec review should happen at predictable transition points because drift usually appears when code, requirements, or data models change faster than documentation. Regular triggers keep the spec close enough to implementation to remain useful.

  • After each agent implementation cycle: Review incremental changes rather than thousand-line code dumps
  • Before transitioning from spec to coding phase: Validate the spec itself before implementation begins
  • When agents surface ambiguities or edge cases: These moments indicate gaps requiring clarification
  • When data models or requirements change: Trigger spec updates immediately

What to focus on during spec review

Spec review should focus on high-risk mismatches because correctness problems usually hide in architecture, security boundaries, and undocumented decisions rather than syntax. Reviewing those areas first keeps agent-written changes maintainable over repeated regeneration cycles.

The Anthropic guide on building effective agents recommends that engineers review agent outputs and findings to confirm accuracy and refine results, with human oversight remaining an important part of the process. Three areas deserve particular scrutiny:

  • Architectural coherence: Consistency across the codebase and alignment with system design
  • Security-critical sections: Bright Security advises teams to be stricter around authentication, authorization, and state changes
  • Decision log entries: Verify that the architectural choices recorded in the spec reflect team intent

Why version-controlled specs reduce drift

Version-controlled specs reduce drift because the same review, diff, and history tools used for code also expose requirement changes over time. That creates institutional memory for both humans and agents.

Osmani's workflow commits the spec file to the repo so the agent can use `git diff` or `git blame` to understand changes across sessions.

When specs are stored in version control, agents can recover context across sessions instead of starting from scratch. This is where Augment Code's Context Engine matters: it analyzes codebases of 400,000+ files using semantic dependency graph analysis. That keeps specs anchored to the real structure of the codebase rather than to a single model's context window.

Eight Antipatterns That Derail Agent Workflows

Agent workflows break when the specification either omits critical constraints or tries to control every implementation detail. Osmani's research frames the core principle: teams plan, verify, and refine a spec for an AI agent rather than treat it as a one-and-done artifact.

| Antipattern | What Goes Wrong | Fix |
|---|---|---|
| Under-specification | Agents fill gaps with assumptions; no opportunity to ask for clarification in automated workflows | Use structured acceptance criteria with testable requirements |
| Over-specification | Agents may ignore detailed specs or follow them too literally, creating duplicates or unrequested features | Specify outcomes and constraints, not implementation steps |
| Mixed functional/technical concerns | Agents cannot distinguish must-have constraints from suggestions without explicit prioritization | Use separate sections: functional requirements, technical stack, performance constraints, boundaries |
| Missing context continuity | Agents repeat previously corrected mistakes when conventions are not preserved | Maintain an AGENTS.md file and a project notes file for recurring errors |
| Vague success criteria | Agents have no clear stopping rule, so iteration becomes arbitrary | Use quantifiable, testable criteria such as response time or test coverage requirements |
| Jumping to solutions | Agents implement the described solution rather than the actual problem | Follow Specify → Plan → Tasks → Implement |
| Environmental context blindness | Code works locally but ignores runtime, deployment, or secrets boundaries | Include deployment context, secrets boundaries, and infrastructure constraints |
| Token-insensitive specs | Long, unfocused context can degrade performance and review quality as task complexity grows | Provide targeted context relevant to the specific task |

Böckeler's research, published on Fowler's site, documented agent failure modes in supervised coding sessions, including misdiagnosis of problems, brute-force fixes, and misunderstood requirements. In practice, intent plus constraints produces more stable outcomes than procedures plus exhaustive detail.

Protecting critical decisions without over-specifying

Protected-decision markers preserve architectural constraints by separating non-negotiable choices from implementation details that agents can adapt. That keeps critical security or compliance decisions from being rewritten accidentally.

```text
<!-- BEGIN USER-SPECIFIED -->
Authentication Design Decision:
We use JWT tokens with 15-minute expiration and refresh token rotation.
DO NOT change this to session-based auth or increase token duration.
Rationale: Security audit requirement from 2026-01-15.
<!-- END USER-SPECIFIED -->
```
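Markers like these are only useful if something checks them. The sketch below shows one possible guard that compares an agent-updated spec against the previous version and fails if a protected block changed. It assumes the exact marker strings from the example; `extractProtected` and `protectedBlocksPreserved` are hypothetical helpers, not part of any tool named in this guide.

```typescript
const BEGIN_MARKER = "<!-- BEGIN USER-SPECIFIED -->";
const END_MARKER = "<!-- END USER-SPECIFIED -->";

// Collects the trimmed text between each BEGIN/END marker pair.
function extractProtected(spec: string): string[] {
  const blocks: string[] = [];
  let cursor = 0;
  while (true) {
    const start = spec.indexOf(BEGIN_MARKER, cursor);
    if (start === -1) break;
    const end = spec.indexOf(END_MARKER, start);
    if (end === -1) break; // unterminated block: ignore rather than guess
    blocks.push(spec.slice(start + BEGIN_MARKER.length, end).trim());
    cursor = end + END_MARKER.length;
  }
  return blocks;
}

// True when every protected block in the old spec survives verbatim in the new one.
function protectedBlocksPreserved(oldSpec: string, newSpec: string): boolean {
  const updated = extractProtected(newSpec);
  return extractProtected(oldSpec).every((block) => updated.indexOf(block) !== -1);
}

const originalSpec = [
  "## Auth",
  BEGIN_MARKER,
  "We use JWT tokens with 15-minute expiration and refresh token rotation.",
  END_MARKER,
  "Implementation notes: middleware lives in src/middleware/auth.ts",
].join("\n");

const safeUpdate = originalSpec + "\nImplementation notes: added rate limiting";
const unsafeUpdate = originalSpec.replace("15-minute", "60-minute");
```

Run as a pre-merge check, a guard like this lets agents rewrite implementation notes freely while any edit inside the markers is surfaced for human review.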

Living Specs Across the Tool Landscape

The spec-driven development landscape includes a range of tools and workflow styles. The table below shows how agentic and spec-driven coding approaches differ across tools, so teams can choose based on workflow shape rather than marketing labels.

| Tool | Spec Type | Best Fit |
|---|---|---|
| Augment Cosmos | Unified cloud agents platform; coordinated agents with shared context and a spec/intent-review checkpoint | Enterprise codebases, parallel agent execution at scale |
| AWS Kiro | Specs using EARS notation with human review gates | Formal, compliance-heavy greenfield AWS projects |
| GitHub Spec Kit | Cross-agent spec-driven development toolkit | Teams using multiple AI tools that need tool-agnostic specs |
| Cursor + .cursorrules | Static rules-based configuration | Individual developer productivity, iterative work |
| Claude Code + CLAUDE.md | Static instruction files | Well-defined tasks with active human review |

Cosmos combines shared context and memory with explicit multi-agent coordination. That matters most for teams that need orchestration across large, existing codebases rather than single-agent assistance.

ThoughtWorks describes spec-driven development as an emerging approach to AI-assisted coding workflows, and multiple sources describe AGENTS.md as an emerging open standard for cross-tool interoperability.

Version the Spec Like You Version the Code

Pick one task this week that is large enough to drift but still reviewable in a single pull request: a JWT auth change, a billing workflow fix, or a cross-service refactor. Write the spec in the repo, define 3-5 measurable success criteria, and require implementation updates to the decision log before merge. That process change usually reveals whether the team has a spec problem, a review problem, or a coordination problem.

If the work spans multiple services or parallel agents, a workflow that keeps specs aligned with implementation matters more than any individual tool choice. Cosmos is built around this coordination problem.




Written by

Molisha Shah


Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.

