Spec-Driven Development with AI: Write the Spec First, Then Prompt the Implementation

Eleftheria Drosopoulou | May 8th, 2026 | Last Updated: May 2nd, 2026


Vibe coding gets a feature built. Spec-driven development gets a system built correctly. The difference is whether you hand an AI agent a wish or a contract. In 2026, the engineers building things that last are writing the formal spec first — OpenAPI, ADRs, structured markdown — and using AI to implement against it.

1. The Problem With Vibe Coding at Scale

When GitHub Copilot first arrived, the productivity gain was immediate and obvious. You typed a comment describing what you wanted, and code appeared that roughly matched. It was impressive. It was also a workflow designed for getting something working, not for building something maintainable.

The pattern that emerged from the first wave of AI-assisted development — describe loosely, iterate, accept, repeat — has a name now. Vibe coding captures the experience accurately: you follow your instincts, describe what you want in natural language, and accept what comes back. For an individual developer building something new, it works well. For a team building a production system, it creates a specific failure mode: AI-generated code that is locally coherent but architecturally inconsistent across files, services, and developers.

The problem is not the AI. Language models are excellent at pattern completion. The problem is that vague instructions produce vague results regardless of the model, and when five engineers are separately vibe-coding against the same system with five different implicit assumptions about API contracts and error handling, the integration phase becomes expensive. As one practitioner on the Eventuallymaking engineering blog observed in early 2026: “If you don’t know how to explain what you want to do and how you plan to do it, AI won’t save you; it will just produce technical debt at industrial speed.”

Vibe coding prompt

“Add an endpoint that lets users upload a profile photo with a 5MB limit and JPEG/PNG support.”

The agent makes seventeen implicit decisions, among them HTTP method, URL structure, field names, error codes, thumbnail sizes, rate limiting, and storage strategy. Some will match the rest of your API. Most will not. None are documented.

Spec-driven prompt

“Implement the POST /users/{userId}/avatar endpoint defined in openapi.yaml. Follow the error envelope schema in docs/adr/001-error-handling.md. Write tests covering the 413, 415, and 422 cases.”

Every decision has already been made and reviewed. The agent’s job is implementation, not design.

Spec-driven development resolves this by separating the design phase from the implementation phase. The spec is where you make decisions. The AI implements against the decisions you have already made. That shift sounds simple, but it changes everything about how AI-assisted work actually feels.

2. What Spec-Driven Development Actually Means

The term means different things to different engineers in 2026, and it is worth being precise. The Thoughtworks analysis from December 2025 identified two camps with meaningfully different philosophies about the role of the spec.

The first camp — the more radical view — argues that specs should become the sole source of truth, and that generated code is a disposable byproduct. If the spec changes, you regenerate. You never edit the code directly. This is sometimes called “spec-as-source” development, and it is already standard practice in narrow domains: generating server stubs from OpenAPI, producing certified embedded code from Simulink models. Emerging platforms like Tessl extend this vision to general software development.

The second camp — the more pragmatic view, and the one that translates most directly to teams working in 2026 — treats specs as high-quality context documents that drive code generation without replacing code as the artifact you maintain. Executable code remains the source of truth. The spec makes the AI’s job easier and the human reviewer’s job easier. This is the approach sometimes called “spec-anchored” development, and it is where most production teams are operating.

Spec-driven development, for the purposes of this article, means: writing formal specifications (OpenAPI contracts, ADRs, structured feature markdown) before implementation begins, then using those specs as the primary context when prompting AI agents to generate code. The spec is what the human reviews and owns. The implementation is what the agent produces against it.

Peer-reviewed research presented at ICSE 2026 made this case quantitatively: incorporating architectural documentation substantially improves LLM-assisted code generation, with measurable gains in functional correctness, architectural conformance, and code modularity. The research finding is intuitive in hindsight — context is what makes language models accurate, and a well-written spec is excellent context.

3. The Four-Phase Workflow

The workflow that has solidified across teams and tools in 2025 and 2026 follows four phases. The names vary but the structure is consistent enough that it is worth describing as a standard.

1. Specify — write what you are building and why

User stories, acceptance criteria, edge cases, error conditions, and the business context behind each decision. An AI agent can help generate a first draft from a rough description, but a human must review and own it. The spec should explicitly define external behavior: input/output mappings, preconditions and postconditions, invariants, interface types, integration contracts, and sequential logic. This is more than a PRD. The “why” behind each decision is as important as the “what” — an agent that understands intent makes better choices when the spec has gaps.

2. Plan — define technical constraints and architecture

Stack choices, patterns to follow, architectural decisions that matter for this feature. This phase produces the implementation plan: what the agent will build and in what order. Senior engineers or architects typically review this phase, because it is where the ADRs get written and the patterns get established. The plan gets broken into concrete, testable work items with clear inputs, expected outputs, and validation criteria — exactly the structure an AI agent can execute against predictably.

3. Task — decompose into atomic, independently verifiable units

This is the phase that most distinguishes spec-driven development from iterative prompting. Each task has a clear definition of done that can be verified mechanically: a test that passes, a schema that validates, an endpoint that matches the contract. Tasks should be sized so a single agent invocation can complete one without losing context. Too large and the agent makes scope decisions you wanted to make. Too small and the overhead of task management exceeds the benefit.
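To make the idea of a mechanically verifiable definition of done concrete, here is a minimal Java sketch. The `WorkItem` record and `TaskBoard` class are hypothetical names invented for illustration, and the lambda checks stand in for what would really be a test run or a schema validation.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: a work item whose "definition of done" is a
// mechanical check rather than a judgment call.
record WorkItem(String id, String description, BooleanSupplier doneCheck) {
    boolean isDone() {
        return doneCheck.getAsBoolean();
    }
}

class TaskBoard {
    // Returns the ids of items whose check does not pass yet.
    static List<String> pending(List<WorkItem> items) {
        return items.stream()
                .filter(item -> !item.isDone())
                .map(WorkItem::id)
                .toList();
    }

    public static void main(String[] args) {
        // Placeholder checks; real ones would run a test or validate a schema.
        List<WorkItem> items = List.of(
                new WorkItem("T1", "413 returned for files over 5MB", () -> true),
                new WorkItem("T2", "415 returned for non-JPEG/PNG uploads", () -> false));
        System.out.println("Pending: " + pending(items)); // prints Pending: [T2]
    }
}
```

The point of the shape is that "done" becomes a boolean the agent cannot argue with, which is also what lets a task be handed to a single agent invocation without a human judging completion.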

4. Implement — the agent executes the tasks against the spec

With a well-written spec, plan, and task breakdown, the agent’s implementation phase should feel like watching a capable contractor follow a detailed brief. The spec is loaded into context on every task. The agent does not invent API shapes or error codes — it reads them from the contract. Deviations from the spec become visible immediately, either through test failures or through CI validation against the OpenAPI document.

[Figure: Where Developer Time Is Spent: Vibe Coding vs. Spec-Driven Development. Approximate time allocation across a typical feature delivery cycle, based on practitioner surveys and team retrospective data, 2025–2026.]

4. What Goes in the Spec: Three Document Types

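The spec in spec-driven development is not a single document type. Different aspects of a system require different formal descriptions. In practice, three types of document do most of the work: API contracts, Architecture Decision Records, and feature specs. Two related formats, AsyncAPI and JSON Schema, play supporting roles for event-driven systems and data validation.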

Spec Type	What It Describes	Format	Agent Use
API Contract	Endpoints, request/response schemas, error codes, authentication, rate limits	OpenAPI 3.1 YAML or JSON	Generate stubs, validate implementations, generate test cases, produce client SDKs
Architecture Decision Record (ADR)	Architectural choices: why, what was rejected, consequences	Structured markdown (MADR format or custom)	Constrain agent choices; prevent re-litigating settled decisions; document intent
Feature Spec	User stories, acceptance criteria, edge cases, success metrics	Structured markdown with defined sections	Primary context for implementation tasks; defines done criteria agents can verify against
AsyncAPI	Event-driven system contracts: topics, message schemas, producers and consumers	AsyncAPI 3.0 YAML	Same role as OpenAPI but for event-driven systems; increasingly common in 2026
JSON Schema	Data shape validation: fields, types, constraints, required properties	JSON Schema draft 2020-12	Force multiplier: validates both agent output and production data automatically

A key insight from teams practicing spec-driven development (SDD) at scale is to treat these documents as living artifacts, not one-time deliverables. The most mature teams in 2026, as documented by Eventuallymaking’s February analysis, evolve the spec in lockstep with the code. When implementation reveals a gap in the spec, the spec is updated first, then the implementation follows. This inverts the direction of drift that kills most documentation: instead of code drifting away from docs, docs stay current because they are the mechanism of control.

5. OpenAPI First: The API Contract as Spec

For backend engineers coming from a Java or Spring Boot background, the most immediately practical entry point into spec-driven development is API-first design using OpenAPI. This is not a new idea (the Swagger specification that OpenAPI grew out of dates back to the early 2010s, long before the current wave of AI tooling), but it has gained a specific new relevance as the primary mechanism for constraining AI agent behavior during implementation.

The financial services case study documented in the January 2026 arXiv paper on spec-driven development is instructive. A company struggling with what it called “integration hell” (microservices that failed when deployed together because teams had made incompatible assumptions) mandated OpenAPI-first development. Teams wrote the specification before writing any service code, and consumer teams reviewed and signed off before implementation began. The result was a 75% reduction in integration cycle time, because the conversations that previously happened during expensive integration testing now happened during cheap spec review.

openapi.yaml — Feature Spec as Contract

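# This spec is the source of truth for the agent's implementation.
# Every decision below was made during design review; the agent
# implements against these decisions, not around them.
openapi: 3.1.0
info:
  title: User Service API  # illustrative title
  version: 1.0.0

paths:
  /users/{userId}/avatar:
    post:
      summary: Upload user profile photo
      parameters:
        - name: userId
          in: path
          required: true
          schema: { type: string }
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              required: [file]
              properties:
                file:
                  type: string
                  format: binary
                  # Agent must enforce: max 5MB, JPEG/PNG only
            encoding:
              file:
                contentType: image/jpeg, image/png
      responses:
        '200':
          description: Avatar updated. Returns URLs for all thumbnail sizes.
        '413':
          description: File exceeds 5MB limit
          content:
            application/json:
              schema: { $ref: '#/components/schemas/ErrorEnvelope' }
        '415':
          description: Unsupported media type
          content:
            application/json:
              schema: { $ref: '#/components/schemas/ErrorEnvelope' }
        '422':
          description: Validation failure (corrupt file, etc.)
          content:
            application/json:
              schema: { $ref: '#/components/schemas/ErrorEnvelope' }

components:
  schemas:
    ErrorEnvelope:  # shape defined by the error-handling ADR
      type: object
      required: [code, message, traceId]
      properties:
        code: { $ref: '#/components/schemas/ErrorCode' }
        message: { type: string }
        traceId: { type: string }
    ErrorCode:
      type: string
      # Illustrative values only; the real list is settled in design review
      enum: [FILE_TOO_LARGE, UNSUPPORTED_MEDIA_TYPE, VALIDATION_FAILED]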

When this specification is the context for an agent’s implementation task, the agent is not designing an API. It is implementing one that has already been designed, reviewed, and approved. That is a fundamentally different relationship between the engineer and the tool: the engineer is the architect, the agent is the contractor, and the spec is the blueprint they both reference.
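The validation rules the spec delegates to the agent (the comment on the `file` property) are small enough to show directly. What follows is a minimal Java sketch of what an implementation of those rules might look like, not code from a real project. It assumes "5MB" means 5 MiB, identifies JPEG and PNG by their magic bytes rather than trusting the client's Content-Type, and treats any file that has a valid signature but fails to decode with the JDK's `javax.imageio.ImageIO` as corrupt (422). `AvatarValidator` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.imageio.ImageIO;

// Hypothetical sketch of the checks the spec assigns to POST /users/{userId}/avatar.
// Assumes "5MB" means 5 MiB and that file type is decided by magic bytes,
// not by the client-supplied Content-Type header.
final class AvatarValidator {
    static final long MAX_BYTES = 5L * 1024 * 1024;
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    private AvatarValidator() {}

    // Returns the HTTP status the spec prescribes: 200, 413, 415, or 422.
    static int validate(byte[] file) {
        if (file.length > MAX_BYTES) {
            return 413; // File exceeds 5MB limit
        }
        if (!isJpeg(file) && !isPng(file)) {
            return 415; // Unsupported media type
        }
        try {
            // A correct signature with undecodable content counts as corrupt.
            if (ImageIO.read(new ByteArrayInputStream(file)) == null) {
                return 422;
            }
        } catch (IOException | RuntimeException e) {
            // Some decoders throw unchecked exceptions on malformed data.
            return 422; // Validation failure (corrupt file)
        }
        return 200;
    }

    private static boolean isJpeg(byte[] f) {
        // JPEG files start with FF D8 FF.
        return f.length >= 3
                && (f[0] & 0xFF) == 0xFF && (f[1] & 0xFF) == 0xD8 && (f[2] & 0xFF) == 0xFF;
    }

    private static boolean isPng(byte[] f) {
        // PNG files start with the fixed 8-byte PNG signature.
        return f.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(f, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }
}
```

Ordering the checks as size, then type, then decoding means the cheapest rejections happen first, and an oversized upload is never handed to an image decoder.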

6. ADRs as Architectural Guardrails for Agents

Architecture Decision Records serve a different but equally important function in spec-driven development. Where OpenAPI documents define what is built, ADRs document why — the decisions that were made, what was considered and rejected, and what the consequences are expected to be. In a traditional team, ADRs prevent engineers from repeatedly re-litigating settled decisions. In an AI-assisted team, they serve an additional function: they constrain the agent’s design choices in the domains where you have already made deliberate architectural decisions.

A concrete example: if your team has adopted an ADR stating that all inter-service communication uses asynchronous events rather than synchronous HTTP calls, an agent that does not have that ADR in context will most likely implement synchronous HTTP calls by default, because that is the dominant pattern in its training data. An agent that receives the ADR as context is far more likely to publish and consume events as the ADR requires. The ADR is not just documentation. It is a runtime constraint on agent behavior.

docs/adr/001-error-handling.md

# ADR-001: Standardized Error Envelope for All API Endpoints

Status: Accepted
Date: 2026-02-14

## Context
All API endpoints currently return errors in different shapes.
This causes inconsistent client error handling and complicates
shared middleware. We need a single error contract.

## Decision
All 4xx and 5xx responses return this envelope:

  code:    string  # machine-readable, e.g. "VALIDATION_FAILED"
  message: string  # human-readable, safe to display
  traceId: string  # correlates with log aggregation

## Rejected Alternatives
- RFC 9457 Problem Details (successor to RFC 7807): rejected, too verbose for our use case.
- Per-endpoint error schemas: rejected, inconsistent client DX.

## Consequences
Agent MUST use ErrorEnvelope for all error responses.
Agent MUST NOT invent new error code strings — use the
enum in components/schemas/ErrorCode in openapi.yaml.

ADRs do double duty in spec-driven development. For human engineers, they prevent re-litigation of settled decisions. For AI agents, they are injected into the system prompt or context window to prevent the agent from defaulting to patterns that conflict with deliberate architectural choices. A team that treats ADRs as both documentation and agent context gets both benefits simultaneously.
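The error-envelope ADR maps naturally onto a Java record. The sketch below is one possible shape, assuming a Java 17+ codebase; the `ErrorCode` constants are hypothetical placeholders for whatever the enum in openapi.yaml actually lists, the random trace id stands in for a real tracing system, and the hand-rolled `toJson` exists only to keep the example free of dependencies.

```java
import java.util.UUID;

// Hypothetical enum mirroring components/schemas/ErrorCode in openapi.yaml;
// the constant names are illustrative, not taken from a real spec.
enum ErrorCode { FILE_TOO_LARGE, UNSUPPORTED_MEDIA_TYPE, VALIDATION_FAILED }

// The three fields the ADR mandates for every 4xx and 5xx response.
record ErrorEnvelope(ErrorCode code, String message, String traceId) {

    static ErrorEnvelope of(ErrorCode code, String message) {
        // Assumption: a random UUID stands in for the trace id that would
        // normally come from the tracing / log-aggregation system.
        return new ErrorEnvelope(code, message, UUID.randomUUID().toString());
    }

    // Minimal hand-rolled JSON so the sketch needs no libraries;
    // a real service would use its JSON mapper instead.
    String toJson() {
        return "{\"code\":\"" + code.name() + "\",\"message\":\"" + escape(message)
                + "\",\"traceId\":\"" + escape(traceId) + "\"}";
    }

    // Escapes quotes, backslashes, and control characters for a JSON string.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }
}
```

Because the enum is closed, an agent cannot invent a new error string without also editing a file that review will notice, which is the "MUST NOT invent new error code strings" rule expressed in the type system.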

7. Prompting Against the Spec: What Changes

Once you have a spec, the prompt changes fundamentally in structure. Rather than describing what you want and hoping the agent infers the constraints, you reference the spec and describe the task to be executed against it.

The core shift is from intent-prompting to task-prompting. Intent-prompting says “build me X.” Task-prompting says “implement X as defined in these documents, following these constraints, producing these outputs that I can verify mechanically.” The agent’s latitude narrows dramatically, and the quality of the output rises in proportion to the quality of the spec.

Prompt structure for spec-driven implementation

# Context injection — what the agent needs to know
Given the following documents (attached to this conversation):
  - openapi.yaml: the API contract for this service
  - docs/adr/001-error-handling.md: error envelope standard
  - docs/adr/003-auth-patterns.md: JWT validation approach

# Specific task — scoped to one atomic unit
Implement the POST /users/{userId}/avatar endpoint.

# Explicit constraints — leaving nothing to default behavior
Requirements:
  - Follow the request schema in openapi.yaml exactly
  - Use ErrorEnvelope for all 4xx responses (ADR-001)
  - Validate JWT in Authorization header (ADR-003)
  - File validation: 5MB max, JPEG/PNG only, reject corrupt files

# Verification criteria — the agent knows what done looks like
Deliverables:
  - AvatarController.java implementing the endpoint
  - AvatarService.java with upload and thumbnail generation
  - AvatarControllerTest.java with tests for 200, 413, 415, 422
  - All tests pass, with responses validated against the openapi.yaml schemas

Notice what is absent from this prompt: the engineer is not describing business logic, not defining error codes, not deciding authentication strategy. Those decisions were all made during spec authoring. The prompt is an execution instruction, not a design instruction. The agent has far less room to take the project in the wrong architectural direction because the specification has already defined the right one.
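Because the prompt structure is regular, it can be generated rather than retyped for every task. A minimal sketch, assuming the spec documents are already loaded as text: the `SpecPrompt` record below is hypothetical and simply renders the four sections (context, task, constraints, deliverables) in order, inlining each document's full text instead of relying on attachments.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper that renders the four-part prompt structure:
// context documents, one atomic task, explicit constraints, deliverables.
record SpecPrompt(Map<String, String> documents, String task,
                  List<String> constraints, List<String> deliverables) {

    String render() {
        StringBuilder sb = new StringBuilder("Given the following documents:\n");
        // Inline each document's full text so nothing depends on the agent
        // "remembering" a file from an earlier turn.
        documents.forEach((path, body) ->
                sb.append("--- ").append(path).append(" ---\n").append(body).append('\n'));
        sb.append("\nTask: ").append(task).append('\n');
        sb.append("\nRequirements:\n");
        constraints.forEach(c -> sb.append("  - ").append(c).append('\n'));
        sb.append("\nDeliverables:\n");
        deliverables.forEach(d -> sb.append("  - ").append(d).append('\n'));
        return sb.toString();
    }
}
```

Passing a `LinkedHashMap` for `documents` keeps the document order stable, so the same task always produces the same prompt.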

[Figure: Code Quality Metrics: Vibe Coding vs. Spec-Anchored AI Development. Scored 1–10 across quality dimensions, based on ICSE 2026 findings and practitioner benchmarks.]

8. The Tooling Landscape in 2026

The tooling ecosystem for spec-driven development has matured rapidly since mid-2025. The tools broadly divide into two categories: living-spec platforms that maintain bidirectional synchronization between the spec and the code, and static-spec tools that structure requirements upfront but require human reconciliation when implementation diverges.

The pattern that emerges from evaluating tools in both categories is that the core value, spec context injected into every agent interaction, can be achieved with entirely low-tech tooling: well-structured markdown files in a docs/ folder, loaded into context explicitly on each task. The sophisticated platforms add automation, verification, and synchronization. But a team that starts by simply maintaining an openapi.yaml and a few ADRs and referencing them in every implementation prompt will capture roughly 70% of the benefit before installing anything new.
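The low-tech version really is small. Below is a sketch of a context loader, assuming the layout used in this article (an openapi.yaml at the repository root and markdown files under docs/); `SpecContextLoader` is a hypothetical name, and it uses only `java.nio.file` from the JDK (Java 11+ for `Files.readString`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical low-tech context loader: collects openapi.yaml plus every
// markdown file under docs/ so they can be pasted into each task prompt.
final class SpecContextLoader {
    private SpecContextLoader() {}

    static Map<String, String> load(Path repoRoot) throws IOException {
        Map<String, String> docs = new LinkedHashMap<>();
        Path openApi = repoRoot.resolve("openapi.yaml");
        if (Files.isRegularFile(openApi)) {
            docs.put("openapi.yaml", Files.readString(openApi));
        }
        Path docsDir = repoRoot.resolve("docs");
        if (Files.isDirectory(docsDir)) {
            try (Stream<Path> paths = Files.walk(docsDir)) {
                paths.filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".md"))
                     .sorted() // stable order keeps prompts reproducible
                     .forEach(p -> {
                         try {
                             // Forward slashes so keys match across operating systems.
                             String key = repoRoot.relativize(p).toString().replace('\\', '/');
                             docs.put(key, Files.readString(p));
                         } catch (IOException e) {
                             // Lambdas cannot throw checked exceptions; wrap and rethrow.
                             throw new UncheckedIOException(e);
                         }
                     });
            }
        }
        return docs;
    }
}
```

The resulting map can be fed into whatever prompt template the team uses; sorting the paths keeps the injected context identical from one task to the next.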

9. The Honest Objections

Spec-driven development has genuine critics, and their objections deserve honest engagement rather than dismissal. The most substantive one is the waterfall concern.

“In spec-driven development with AI agents, the feedback loop is minutes. You write a spec. The agent generates the implementation in five to fifteen minutes. You review it. If the spec was wrong or incomplete, you update the spec and regenerate.” — Alex Cloudstar, “Spec-Driven Development 2026: AI or Waterfall?”, March 2026

The waterfall concern says: writing extensive specs before coding is what the industry abandoned in favor of agile development. Is SDD just waterfall with AI-generated code? The response is compelling: waterfall failed not because specifications are bad, but because the cost of discovering your specification was wrong was catastrophically high, with months of development wasted on the wrong foundation. When regenerating code from an updated spec takes fifteen minutes rather than months, the economics change completely. The spec becomes an iterative document you can update and re-execute against, not a contract set in stone before the first line of code.

The second honest objection is maintenance. Who maintains the spec? If the implementation diverges from the spec over time, you have the classic problem of outdated documentation — but worse, because engineers are now using the spec as agent context, and a stale spec produces wrong agent output. This is a real risk, and the answer is institutional: the spec must be treated with the same discipline as the code. Breaking the spec should fail the build, exactly as breaking tests does.

The teams that try spec-driven development and abandon it usually do so because the spec becomes a formal document that nobody maintains after the first sprint. The spec drifts from the code, agents start generating output that conflicts with actual behavior, and the team reverts to vibe coding because at least that is honest about its lack of constraints. Discipline alone does not keep a spec current, so the enforcement has to be mechanical: CI validation that fails when implementation diverges from the OpenAPI spec, enforced on every pull request.
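A minimal sketch of such a gate follows, under loud assumptions: the documented status codes per operation are assumed to have been extracted from openapi.yaml by other tooling (YAML parsing is out of scope here), and the observed responses are assumed to be recorded while the test suite runs. `ContractDriftCheck` and its `Observation` record are hypothetical names, not an existing library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical CI gate: compares the status codes each operation is
// documented to return (assumed already extracted from openapi.yaml)
// with the status codes observed while running the test suite.
final class ContractDriftCheck {
    private ContractDriftCheck() {}

    record Observation(String operation, int status) {}

    // Returns one message per observed response the spec does not document.
    static List<String> findDrift(Map<String, Set<Integer>> documented,
                                  List<Observation> observed) {
        List<String> problems = new ArrayList<>();
        for (Observation o : observed) {
            Set<Integer> allowed = documented.get(o.operation());
            if (allowed == null) {
                problems.add("Undocumented operation: " + o.operation());
            } else if (!allowed.contains(o.status())) {
                problems.add(o.operation() + " returned undocumented status " + o.status());
            }
        }
        return problems;
    }

    // Throws if any drift was found; an uncaught exception fails the build step.
    static void enforce(Map<String, Set<Integer>> documented, List<Observation> observed) {
        List<String> drift = findDrift(documented, observed);
        if (!drift.isEmpty()) {
            throw new IllegalStateException("Contract drift:\n" + String.join("\n", drift));
        }
    }
}
```

An uncaught exception from `enforce` terminates the JVM with a non-zero exit status, which is enough for most CI systems to mark the step as failed. Dedicated OpenAPI validation tools go further, checking request and response bodies as well as status codes.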

10. What We Have Learned

Spec-driven development is not a return to waterfall and it is not a replacement for agile iteration. It is a recognition that AI agents are exceptionally good at implementing against explicit contracts and exceptionally bad at inferring implicit ones. The quality of what an agent produces is proportional to the quality of the context you give it, and a well-written spec — OpenAPI contract, ADR, structured feature markdown — is the best context you can provide.

The workflow that has solidified in 2026 is four phases: specify, plan, task, implement. The spec is what humans review and own. The implementation is what agents produce against it. Deviations from the spec are caught mechanically, in CI, before they become production problems. The result, documented quantitatively by the ICSE 2026 research and the financial services case study, is measurable improvement in functional correctness, architectural conformance, and integration cycle time, alongside the developer productivity gains of AI-assisted development.

The practical starting point is lower than most engineers expect. You do not need a living-spec platform, a new tool, or an organizational change. You need an OpenAPI file for your service’s contracts, two or three ADRs capturing your most important architectural decisions, and the discipline to reference them in every implementation prompt. Start with the next feature. Write the spec before you open a code editor. Inject it into the agent’s context when you do. The difference in output quality will tell you whether to invest further. It almost certainly will.
