Living specs are the most reliable way to guide AI agent development because implementation changes flow back into the specification and prevent spec drift.
TL;DR
AI-generated code drifts quickly when teams keep regenerating it based on stale requirements. Static specs fail because they only move information one way. Living specs reduce that gap by writing implementation decisions back into the spec, keeping requirements, constraints, and code aligned across repeated development cycles.
From Static Specs to Living Specs
Engineering teams using AI coding agents quickly run into the same failure mode: the markdown says one thing, the repository says another, and the next regeneration cycle amplifies the mismatch. InfoQ research frames the underlying problem: a code issue reflects a gap in the specification, and because AI generation is non-deterministic, that gap keeps resurfacing as code is regenerated. That makes static specs fragile in workflows that expect repeated regeneration, review, or multi-step handoffs.
Living specs change the direction of information flow. Instead of treating the spec as a one-time prompt, teams treat it as an evolving artifact that captures requirements, constraints, and implementation decisions. ThoughtWorks, Anthropic, and Addy Osmani each emphasize a similar operating principle: write enough structure for the agent to act correctly, then update it as reality changes.
For teams evaluating orchestration tools, some vendors now describe living specs as workflow infrastructure rather than just documentation. One example is Augment Cosmos, Augment Code's unified cloud agents platform, whose shared context and memory compound across the team and the software development lifecycle. Cosmos uses repository context from its Context Engine to coordinate spec-driven work across agents, with a human checkpoint to review the spec before agents execute. This guide explains what living specs are, how to structure them, where teams overspecify, and how to review agent-updated specs without creating documentation debt.
Why Static Specs Fail AI Agent Workflows
Static specs fail AI agent workflows because they only move information in one direction, which lets implementation drift compound across regeneration cycles.
The root cause is directional. Static specs flow one way: a developer writes requirements, an agent consumes them, and the spec remains unchanged while the codebase evolves. Living specs add a feedback loop. As the InfoQ presentation explains, writing specs and requirements down aligns both agents and humans in AI-native development workflows.
This feedback loop changes the role of the spec. A Spec Kit discussion captures the idea: teams routinely version the source code generated by agents but neglect version control for the specs that produced it, thereby inverting the dependency relationship that matters most. In living spec-driven workflows, specifications become the source of truth, and implementation becomes the compiled output.
Not every change needs to originate in the spec for every tool, but teams that routinely regenerate code from stale specs should expect drift to recur unless implementation decisions are written back.
Cosmos is built around this exact problem. It keeps the spec and shared context connected to the repository as the codebase changes, rather than treating them as one-time inputs.
See how Cosmos keeps specs and shared context aligned across large repositories.
Free tier available · VS Code extension · Takes 2 minutes
The Four Phases of Bidirectional Spec Updates
Bidirectional spec updates require clear separation between initial intent and implementation, followed by review and refinement to keep both aligned. ThoughtWorks guidance documents this split between design and implementation with a human always in the loop.
The four phases below illustrate how multi-agent code-generation workflows are structured when specs are treated as coordination infrastructure rather than as prompts.
| Phase | Direction | What Happens |
|---|---|---|
| 1. Initial Intent | Developer → Spec | Developer writes high-level requirements; AI expands them into structured specs |
| 2. Implementation | Spec → Agent → Code | Agents read the finalized spec and generate code, tests, and documentation |
| 3. Bidirectional Update | Implementation → Spec | Agents or developers update the spec to reflect what was actually built |
| 4. Continuous Refinement | Production → Spec | Metrics, incidents, and operational learnings feed back into the spec |
Phase 3 is the defining characteristic. Without it, specifications are just elaborate prompts; with it, they become coordination infrastructure that more reliably reflects the current system state over time.
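The Phase 3 write-back can be sketched in a few lines. This is a minimal illustration, assuming the spec is held in memory as a Python object; the names `LivingSpec`, `Decision`, and `record_decision`, along with the sample requirement and date, are hypothetical and not taken from any tool mentioned in this guide.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Decision:
    """One implementation decision written back into the spec (Phase 3)."""
    summary: str
    reason: str
    recorded_on: date


@dataclass
class LivingSpec:
    """Hypothetical in-memory spec: requirements plus a decision log."""
    requirements: list[str]
    decisions: list[Decision] = field(default_factory=list)
    revision: int = 1

    def record_decision(self, summary: str, reason: str, on: date) -> None:
        # Phase 3: implementation -> spec. Each write-back bumps the revision
        # so later regeneration cycles can tell which spec version they read.
        self.decisions.append(Decision(summary, reason, on))
        self.revision += 1

    def prompt_context(self) -> str:
        # Phase 2 input: agents receive requirements AND prior decisions,
        # so a regeneration has the earlier choices in front of it.
        lines = [f"Spec revision {self.revision}", "Requirements:"]
        lines += [f"- {r}" for r in self.requirements]
        lines.append("Decisions already made:")
        lines += [f"- {d.summary} (why: {d.reason})" for d in self.decisions]
        return "\n".join(lines)


# Illustrative data only.
spec = LivingSpec(requirements=["Users can reset a password by email"])
spec.record_decision(
    summary="Reset tokens expire after 15 minutes",
    reason="Security review asked for short-lived tokens",
    on=date(2025, 1, 15),
)
print(spec.prompt_context())
```

The design point is that the decision log travels with the requirements into the next generation cycle, which is what separates a living spec from a one-time prompt.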
A practical warning from InfoQ: adopting spec-driven workflows without changing how product, architecture, engineering, and QA stakeholders collaborate risks creating layers of outdated documentation that no one maintains.
Anatomy of a Living Spec: Seven Essential Sections
A useful living spec gives agents enough context to implement correctly without turning the document into a brittle script. Based on the AGENTS.md standard, the Osmani guide, and the O'Reilly guide, seven sections recur in effective agent-facing specs.
1. Agent role and project overview
The agent role and project overview reduce ambiguity by defining priorities, domain, and stack before implementation begins. That framing lets the agent make tradeoffs that match the repository rather than generic defaults.
2. Key commands
Key commands improve execution reliability because agents repeatedly need exact build, test, lint, and migration syntax. Full commands reduce guesswork and cut avoidable tool errors.
3. Architecture and critical files
Architecture and critical files improve navigation because file:line references point agents to the real control points in the codebase. That reduces exploration overhead and lowers the chance of editing the wrong layer.
4. Code style via examples
Code-style examples improve consistency because a single working snippet shows patterns, error handling, and logging conventions more clearly than prose. That makes the generated code easier to review and merge.
5. Three-tier boundaries
Three-tier boundaries control agent autonomy by separating default-safe actions from changes that require approval or are prohibited outright. That prevents accidental edits to high-risk areas.
6. Implementation status
Implementation status tracking improves coordination because the spec shows what is complete, in progress, or blocked. That visibility keeps reviewers and parallel agents working from the same state.
7. Decision log
Decision logs prevent repeated debate because architectural choices and their reasons stay attached to the spec. Future agents and reviewers can then preserve intent rather than rediscover it.
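One way to keep the seven sections honest is to model them as a structure and check which ones are still empty. The sketch below is an assumption-laden illustration: the class names, field names, and the tier labels `always`, `ask_first`, and `never` are hypothetical, and a real spec would normally live in a markdown file rather than a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class Boundaries:
    """Three-tier boundaries (tier names are assumed for illustration)."""
    always: list[str] = field(default_factory=list)     # default-safe actions
    ask_first: list[str] = field(default_factory=list)  # require approval
    never: list[str] = field(default_factory=list)      # prohibited outright


@dataclass
class SpecSkeleton:
    """The seven sections, in the order listed above."""
    overview: str
    commands: dict[str, str]
    critical_files: list[str]
    style_example: str
    boundaries: Boundaries
    status: dict[str, str]
    decision_log: list[str]

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        missing = []
        for name, value in vars(self).items():
            if isinstance(value, Boundaries):
                empty = not (value.always or value.ask_first or value.never)
            else:
                empty = not value
            if empty:
                missing.append(name)
        return missing


# Illustrative draft with three sections left blank.
draft = SpecSkeleton(
    overview="Payments service; correctness takes priority over speed",
    commands={"test": "pytest -q"},
    critical_files=[],
    style_example="",
    boundaries=Boundaries(never=["edit files under migrations/"]),
    status={"refund endpoint": "in progress"},
    decision_log=[],
)
print(draft.missing_sections())
```

A check like this could run before a spec is handed to an agent, so gaps are caught at review time rather than discovered in generated code.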
Writing Requirements at the Right Granularity
Requirement granularity determines whether a living spec guides implementation or overwhelms it.
The over-specification problem
Over-specification creates unstable agent behavior because detailed instructions can be either ignored or followed too literally, and both outcomes degrade implementation quality. That is why teams need enough structure to constrain risk without dictating every step.
Birgitta Böckeler's hands-on research, published on Martin Fowler's site, examines spec-driven development workflows and concludes that generating exhaustive acceptance criteria for small tasks adds more overhead than accuracy. Kent Beck's critique, also on Fowler's site, identifies the philosophical flaw: heavy upfront specification assumes nothing will be learned during implementation, which rarely holds in practice.
The practical implication: write enough spec to orient the agent and establish constraints, then iterate. Treat the spec as a hypothesis about what needs to be built rather than a finished blueprint.
Declarative outcomes beat imperative instructions
Declarative requirements produce better agent behavior because they describe the desired outcome and constraints instead of prescribing every implementation step. That gives the agent room to apply existing patterns while preserving reviewable success criteria.
A Towards Data Science (TDS) article illustrates the contrast between declarative and imperative approaches:
| Approach | Example |
|---|---|
| Imperative (over-specified) | "Import numpy. Define a function called cosine_distance. Convert inputs to numpy arrays. Calculate the dot product. Calculate norms. Return 1 minus the quotient." |
| Declarative (outcome-focused) | "Write a short and fast function in Python to compute the cosine distance between two input vectors." |
Osmani emphasizes guiding agents with clear problem definitions and success criteria rather than expecting them to work unattended.
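For reference, here is one plausible result of the declarative prompt in the table above. It is a sketch, not the output of any particular agent. It uses only the standard library instead of numpy so it stays dependency-free, and the decision to raise on zero vectors or mismatched lengths is an assumption the prompt itself leaves open.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed behavior: the prompt does not say what to do here.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Orthogonal vectors are at distance 1; parallel vectors are at distance ~0.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))
```

The edge cases are where the two prompt styles differ in practice: the imperative version never mentions them, while the declarative version leaves room for the agent, or a success criterion in the spec, to settle them.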
Before and after: a complete requirement rewrite
A good requirement rewrite improves implementation accuracy by separating context, constraints, output, and success criteria into fields that agents can act on and reviewers can verify.
Before (under-specified, mixed concerns):
This mixes functional requirements, technical mandates, unmeasurable performance goals, UI library choices, and subjective design language.
After (properly structured living spec):
Anthropic context docs articulate the governing principle: strive for the minimal set of information that fully outlines expected behavior.
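To make the before/after contrast concrete, the sketch below pairs a mixed-concern requirement with a structured one that uses the four fields named earlier: context, constraints, output, and success criteria. Every string in it is invented for illustration, and the `Requirement` class is a hypothetical shape, not a format prescribed by any of the sources cited here.

```python
from dataclasses import dataclass

# Hypothetical "before": one sentence mixing features, stack mandates,
# an unmeasurable performance goal, and subjective design language.
BEFORE = (
    "Build a fast, modern-looking dashboard in React with Material UI "
    "that shows user activity and loads quickly."
)


@dataclass(frozen=True)
class Requirement:
    """Hypothetical "after": one field per concern."""
    context: str
    constraints: tuple[str, ...]
    output: str
    success_criteria: tuple[str, ...]

    def to_spec_text(self) -> str:
        """Render the requirement as plain text for a spec file."""
        parts = [f"Context: {self.context}", "Constraints:"]
        parts += [f"  - {c}" for c in self.constraints]
        parts.append(f"Output: {self.output}")
        parts.append("Success criteria:")
        parts += [f"  - {s}" for s in self.success_criteria]
        return "\n".join(parts)


AFTER = Requirement(
    context="Support staff need to see recent activity for one user.",
    constraints=(
        "Reuse the existing component library",
        "No new runtime dependencies without approval",
    ),
    output="A dashboard page listing the 50 most recent events per user",
    success_criteria=(
        "Initial render completes in under 2 seconds on the staging dataset",
        "Unit tests cover empty, single-event, and 50-event cases",
    ),
)
print(AFTER.to_spec_text())
```

Each field gives a reviewer something specific to check: constraints can be audited against the diff, and success criteria can be turned into tests.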
How Multi-Agent Coordinators Use Living Specs
Multi-agent coordinators depend on living specs because parallel agents need a shared record of current intent, task boundaries, and accepted decisions. Once multiple agents work in parallel, the spec becomes operational coordination rather than planning text. Understanding how autonomous agents transform development workflows clarifies why this coordination layer matters.
Augment Cosmos is built on this architecture. Cosmos coordinates agents against shared context and memory, with humans steering at a spec-and-intent-review checkpoint before agents execute.
A coordinator in this model typically handles four functions:
- Context analysis: Inspect relevant repository structure and dependencies before task assignment
- Specification drafting: Create or refine the working spec from developer intent
- Task decomposition: Break the spec into executable units with handoff points
- Delegation management: Assign work while accounting for inter-task dependencies
Dependency-aware decomposition matters because shared type or schema changes can create avoidable race conditions and merge conflicts if tasks are split without dependency order. Coordinating that ordering across parallel agents is part of what Cosmos is designed to handle.
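Dependency ordering of this kind can be sketched with a topological sort. The example below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to group tasks into batches that are safe to run in parallel. The task names are hypothetical, and this is a generic illustration of the technique, not a description of how Cosmos schedules work internally.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if tasks depend on each other
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


# Hypothetical tasks: each key maps to the set of tasks it depends on.
tasks = {
    "update-shared-types": set(),
    "api-handler": {"update-shared-types"},
    "ui-form": {"update-shared-types"},
    "e2e-tests": {"api-handler", "ui-form"},
}
print(parallel_batches(tasks))
```

Here the shared type change lands first, the two consumers run side by side, and end-to-end tests wait for both, which is the ordering that avoids the race conditions and merge conflicts described above.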
Cosmos runs a default coordination model: a coordinator drafts the spec and delegates, implementor agents execute in parallel, and a verifier checks results against the spec. It also ships with reusable reference experts for common work:
| Reference Expert | Responsibility |
|---|---|
| Deep Code Review | Context-aware review tuned for recall, to catch every bug possible |
| PR Author | Implements changes to a merge-ready state |
| E2E Testing | Validates against real infrastructure |
| Incident Response | Triages and resolves incidents |
Cosmos keeps a human in the loop by design: teams set the policies for where human judgment is required, and specs return for review before agents independently write, test, and review code.
If keeping parallel agents aligned to one spec is the bottleneck, Cosmos's model is built for exactly that.
Reviewing Agent-Updated Living Specs
Reviewing agent-updated specs requires checking both implementation accuracy and whether the spec still captures the team's real decisions. GitHub's spec-driven development guidance emphasizes phases such as Specify, Plan, Tasks, and Implement, with specifications versioned alongside the repository.
Four triggers for spec review
Spec review should happen at predictable transition points because drift usually appears when code, requirements, or data models change faster than documentation. Regular triggers keep the spec close enough to implementation to remain useful.
- After each agent implementation cycle: Review incremental changes rather than thousand-line code dumps
- Before transitioning from spec to coding phase: Validate the spec itself before implementation begins
- When agents surface ambiguities or edge cases: These moments indicate gaps requiring clarification
- When data models or requirements change: Trigger spec updates immediately
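The last trigger is the easiest to automate. As a rough sketch, a pre-merge check could look at the paths touched by a change set and flag data-model edits that arrive without a spec update. The directory names `specs`, `models`, and `migrations` are assumptions about repository layout, and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def spec_review_needed(
    changed_paths: list[str],
    spec_dir: str = "specs",                       # assumed spec location
    model_dirs: tuple[str, ...] = ("models", "migrations"),  # assumed layout
):
    """Return a reason string if the change set should trigger spec review."""
    parts = [PurePosixPath(p).parts for p in changed_paths]
    touches_spec = any(p and p[0] == spec_dir for p in parts)
    # p[:-1] drops the file name so only directory names are compared.
    touches_models = any(set(p[:-1]) & set(model_dirs) for p in parts)
    if touches_models and not touches_spec:
        return "data model changed without a spec update"
    if touches_spec:
        return "spec changed; review before the next implementation cycle"
    return None


print(spec_review_needed(["src/models/user.py"]))
print(spec_review_needed(["specs/auth.md", "src/models/user.py"]))
print(spec_review_needed(["README.md"]))
```

A path-based heuristic like this will miss model changes that live elsewhere, so it works best as a prompt for human review rather than a gate.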
What to focus on during spec review
Spec review should focus on high-risk mismatches because correctness problems usually hide in architecture, security boundaries, and undocumented decisions rather than syntax. Reviewing those areas first keeps agent-written changes maintainable over repeated regeneration cycles.
The Anthropic guide on building effective agents recommends that engineers review agent outputs and findings to confirm accuracy and refine results, with human oversight remaining an important part of the process. Three areas deserve particular scrutiny:
- Architectural coherence: Consistency across the codebase and alignment with system design
- Security-critical sections: Bright Security advises teams to be stricter around authentication, authorization, and state changes
- Decision log entries: Verify that the architectural choices recorded in the spec reflect team intent
Why version-controlled specs reduce drift
Version-controlled specs reduce drift because the same review, diff, and history tools used for code also expose requirement changes over time. That creates institutional memory for both humans and agents.
In Osmani's workflow, the spec file is committed to the repo so the agent can use git diff or git blame to understand what changed between sessions.
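To show the kind of signal a spec diff provides, the sketch below compares two versions of a spec with `difflib.unified_diff` from the standard library. In a real repository `git diff` would do this job; `difflib` is used here only so the example runs without a git checkout. The spec text and the `spec_changes` helper are hypothetical.

```python
import difflib


def spec_changes(old_spec: str, new_spec: str) -> list[str]:
    """Return added (+) and removed (-) lines between two spec versions."""
    diff = list(
        difflib.unified_diff(
            old_spec.splitlines(),
            new_spec.splitlines(),
            fromfile="spec@previous",
            tofile="spec@current",
            lineterm="",
            n=0,  # no context lines; keep only the changes
        )
    )
    # The first two lines are the ---/+++ file headers; @@ lines mark hunks.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Illustrative spec versions.
previous = "## Auth\n- Sessions use cookies\n- Tokens expire after 60 minutes"
current = (
    "## Auth\n- Sessions use cookies\n- Tokens expire after 15 minutes\n"
    "- Decision: refresh tokens rotate on use"
)
for line in spec_changes(previous, current):
    print(line)
```

An agent starting a new session can read a change list like this to learn that the token lifetime moved and a rotation decision was added, without replaying the earlier conversation.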
When specs are stored in version control, agents can reconstruct context across sessions from the spec's history instead of relying on a single session's memory. This is where Augment Code's Context Engine matters: it analyzes codebases across 400,000+ files through semantic dependency graph analysis, which keeps specs anchored to the real structure of the codebase rather than to a single model's context window.
Eight Antipatterns That Derail Agent Workflows
Agent workflows break when the specification either omits critical constraints or tries to control every implementation detail. Osmani's research frames the core principle: teams plan, verify, and refine a spec for an AI agent rather than treat it as a one-and-done artifact.
| Antipattern | What Goes Wrong | Fix |
|---|---|---|
| Under-specification | Agents fill gaps with assumptions; no opportunity to ask for clarification in automated workflows | Use structured acceptance criteria with testable requirements |
| Over-specification | Agents may ignore detailed specs or follow them too literally, creating duplicates or unrequested features | Specify outcomes and constraints, not implementation steps |
| Mixed functional/technical concerns | Agents cannot distinguish must-have constraints from suggestions without explicit prioritization | Use separate sections: functional requirements, technical stack, performance constraints, boundaries |
| Missing context continuity | Agents repeat previously corrected mistakes when conventions are not preserved | Maintain an AGENTS.md file and a project notes file for recurring errors |
| Vague success criteria | Agents have no clear stopping rule, so iteration becomes arbitrary | Use quantifiable, testable criteria such as response time or test coverage requirements |
| Jumping to solutions | Agents implement the described solution rather than the actual problem | Follow Specify → Plan → Tasks → Implement |
| Environmental context blindness | Code works locally but ignores runtime, deployment, or secrets boundaries | Include deployment context, secrets boundaries, and infrastructure constraints |
| Token-insensitive specs | Long, unfocused context can degrade performance and review quality as task complexity grows | Provide targeted context relevant to the specific task |
Böckeler's research, published on Fowler's site, documented agent failure modes in supervised coding sessions, including misdiagnosis of problems, brute-force fixes, and misunderstood requirements. In practice, intent plus constraints produces more stable outcomes than procedures plus exhaustive detail.
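The "vague success criteria" row is one of the few antipatterns that a simple lint can catch. The sketch below flags criteria that contain no number or that lean on subjective words. Both rules are assumed heuristics, the word list is hypothetical, and a check like this will produce false positives, so it is a review aid rather than a gate.

```python
import re

# Assumed heuristic: a criterion is treated as measurable if it has a number.
_HAS_NUMBER = re.compile(r"\d")
# Hypothetical list of words that tend to signal subjective criteria.
_VAGUE_WORDS = ("fast", "clean", "modern", "robust", "user-friendly", "scalable")


def vague_criteria(criteria: list[str]) -> list[str]:
    """Return criteria that have no number or that use a subjective word."""
    flagged = []
    for text in criteria:
        lowered = text.lower()
        has_vague_word = any(
            re.search(rf"\b{re.escape(word)}\b", lowered) for word in _VAGUE_WORDS
        )
        if has_vague_word or not _HAS_NUMBER.search(text):
            flagged.append(text)
    return flagged


# Illustrative criteria.
sample = [
    "The API should be fast",
    "p95 latency under 200 ms for /search",
    "Code is clean and well tested",
]
print(vague_criteria(sample))
```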
Protecting critical decisions without over-specifying
Protected-decision markers preserve architectural constraints by separating non-negotiable choices from implementation details that agents can adapt. That keeps critical security or compliance decisions from being rewritten accidentally.
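As an illustration, protected decisions can be wrapped in markers and checked mechanically whenever an agent rewrites the spec. The marker syntax below (`<!-- PROTECTED:start id=... -->`) is a hypothetical convention chosen for this sketch, as are the function names and the sample spec text.

```python
import re

# Hypothetical marker syntax; no standard prescribes this format.
_BLOCK = re.compile(
    r"<!-- PROTECTED:start id=(?P<id>[\w-]+) -->\n"
    r"(?P<body>.*?)\n"
    r"<!-- PROTECTED:end -->",
    re.DOTALL,
)


def protected_blocks(spec_text: str) -> dict[str, str]:
    """Map each protected block id to its exact body text."""
    return {m["id"]: m["body"] for m in _BLOCK.finditer(spec_text)}


def violated_blocks(before: str, after: str) -> list[str]:
    """Return ids of protected blocks that an update changed or removed."""
    old, new = protected_blocks(before), protected_blocks(after)
    return sorted(
        block_id for block_id, body in old.items() if new.get(block_id) != body
    )


# Illustrative spec text.
original = """## Auth
<!-- PROTECTED:start id=token-storage -->
Tokens are stored in httpOnly cookies, never in localStorage.
<!-- PROTECTED:end -->
Session length: 30 minutes.
"""
safe_update = original.replace("30 minutes", "15 minutes")
unsafe_update = original.replace("never in localStorage", "or in localStorage")

print(violated_blocks(original, safe_update))
print(violated_blocks(original, unsafe_update))
```

The agent stays free to adjust everything outside the markers, such as the session length here, while any edit inside a protected block is surfaced for human approval.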
Living Specs Across the Tool Landscape
The spec-driven development landscape includes a range of tools and workflow styles. The table below shows how agentic and spec-driven coding approaches differ across tools, so teams can choose based on workflow shape rather than marketing labels.
| Tool | Spec Type | Best Fit |
|---|---|---|
| Augment Cosmos | Unified cloud agents platform; coordinated agents with shared context and a spec/intent-review checkpoint | Enterprise codebases, parallel agent execution at scale |
| AWS Kiro | Specs using EARS notation with human review gates | Formal, compliance-heavy greenfield AWS projects |
| GitHub Spec Kit | Cross-agent spec-driven development toolkit | Teams using multiple AI tools that need tool-agnostic specs |
| Cursor + .cursorrules | Static rules-based configuration | Individual developer productivity, iterative work |
| Claude Code + CLAUDE.md | Static instruction files | Well-defined tasks with active human review |
Cosmos combines shared context and memory with explicit multi-agent coordination. That matters most for teams that need orchestration across large, existing codebases rather than single-agent assistance.
ThoughtWorks describes spec-driven development as an emerging approach to AI-assisted coding workflows, and multiple sources describe AGENTS.md as an emerging open standard for cross-tool interoperability.
Version the Spec Like You Version the Code
Pick one task this week that is large enough to drift but still reviewable in a single pull request: a JWT auth change, a billing workflow fix, or a cross-service refactor. Write the spec in the repo, define 3-5 measurable success criteria, and require implementation updates to the decision log before merge. That process change usually reveals whether the team has a spec problem, a review problem, or a coordination problem.
If the work spans multiple services or parallel agents, a workflow that keeps specs aligned with implementation matters more than any individual tool choice. Cosmos is built around this coordination problem.
Written by

Molisha Shah
Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.