---
title: "Morph — Fast Models That Improve Coding Agents"
url: "https://www.morphllm.com/"
canonical_url: "https://www.morphllm.com/"
docs_url: "https://docs.morphllm.com"
description: "The model API layer for coding agents: fast general models for agent loops (Qwen 3.5 397B, MiniMax M2.7, DeepSeek V4 Flash), plus specialized models for search (WarpGrep), edits (Fast Apply), context (Compact), and semantic trace signals (Reflexes). One OpenAI-compatible API, MCP server, Vercel AI SDK provider."
---

# Morph

Morph is the model API layer for coding agents: fast general models for the primary agent loop, and specialized models for the sub-tasks that general models handle more slowly and at higher cost: search, edits, context, and semantic trace signals. Everything runs through one OpenAI-compatible API, served on Morph's custom GPU kernels. The throughput numbers on this site (10,500 tok/s for Fast Apply, 33,000 tok/s for Compact) come from that stack, not from tuning a general-purpose serving layer.

Beyond the general models, Morph ships a specialized model for each place coding agents spend the most compute: **applying edits, searching code, compacting context, and verifying UI changes**.
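
For a rough sense of scale, the quoted throughput figures can be turned into per-request generation time. The sketch below assumes the tok/s numbers describe sustained throughput and ignores network latency and queueing; the function name and the example token counts are illustrative, not measurements.

```ts
// Back-of-envelope generation time from a throughput figure.
// Assumption: tokensPerSecond is sustained throughput; network and queueing time are ignored.
function estimateSeconds(tokens: number, tokensPerSecond: number): number {
  if (tokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError("tokens must be >= 0 and tokensPerSecond must be > 0");
  }
  return tokens / tokensPerSecond;
}

// Throughput figures quoted on this page.
const FAST_APPLY_TOK_PER_SEC = 10500;
const COMPACT_TOK_PER_SEC = 33000;

// Hypothetical workloads: a 2,100-token merged file and a 66,000-token context.
const applySeconds = estimateSeconds(2100, FAST_APPLY_TOK_PER_SEC);
const compactSeconds = estimateSeconds(66000, COMPACT_TOK_PER_SEC);
```

Under these assumptions the merge takes about 0.2 s and the compaction about 2 s of generation time.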

- Canonical: https://www.morphllm.com
- Docs: https://docs.morphllm.com
- Agent context (full): https://docs.morphllm.com/llms-full.txt
- Quickstart: https://docs.morphllm.com/quickstart
- MCP setup: https://docs.morphllm.com/mcpquickstart

## Problem

General-purpose coding agents burn compute on sub-tasks that specialized models do faster and cheaper. Anthropic's multi-agent research system reported a ~90% improvement over a single agent. Cognition measured coding agents spending ~60% of their time on search. Long-horizon agents hit a quality cliff at 95% context capacity. Morph's subagents and models slot into any coding agent to address each of these bottlenecks.

## How we build it

- **Custom inference engines** — not vLLM/TGI wrappers. Purpose-built servers for the apply, search, and compact workloads, with batching, speculative decoding, and memory layouts tuned per task.
- **Custom GPU kernels** — hand-written CUDA / Triton kernels for the hot paths specific to code editing (long-context attention with code-shaped sparsity, tokenizer ops for code).
- **Small specialized models, trained end-to-end for a single task** — Fast Apply only applies, WarpGrep only searches, Compact only compacts, Reflexes only classify agent behavior. No general capability to amortize; all parameters serve the workload.
- **RL on real agent traces** — models are trained against the actual harnesses they run in (coding agents), not held-out benchmarks alone.
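
To make the speculative decoding mentioned above concrete, here is a toy greedy version. It is a generic illustration of the technique, not Morph's inference engine: the `NextToken` type, both function names, and the idea of modeling a model as a deterministic next-token function are assumptions made for the sketch.

```ts
// A "model" here is a deterministic next-token function over token ids (a toy stand-in).
type NextToken = (prefix: readonly number[]) => number;

// Baseline: decode one token at a time with the target model.
function greedyDecode(model: NextToken, prompt: readonly number[], maxNewTokens: number): number[] {
  const out = prompt.slice();
  for (let i = 0; i < maxNewTokens; i++) out.push(model(out));
  return out;
}

// Greedy speculative decoding: a cheap draft model proposes up to k tokens,
// the target model checks them, and the result matches greedyDecode(target, ...).
function speculativeDecode(
  target: NextToken,
  draft: NextToken,
  prompt: readonly number[],
  maxNewTokens: number,
  k: number,
): number[] {
  if (k < 1) throw new RangeError("k must be >= 1");
  const out = prompt.slice();
  const limit = prompt.length + maxNewTokens;
  while (out.length < limit) {
    // 1. The draft proposes up to k tokens autoregressively.
    const proposal: number[] = [];
    while (proposal.length < k && out.length + proposal.length < limit) {
      proposal.push(draft(out.concat(proposal)));
    }
    // 2. The target verifies each position (a single batched pass in a real system).
    for (const token of proposal) {
      const expected = target(out);
      out.push(expected); // always emit the target's own token
      if (expected !== token) break; // first mismatch: drop the rest of the proposal
    }
  }
  return out;
}
```

The speedup in a real system comes from step 2 being one batched forward pass instead of one pass per token; the toy version only demonstrates that the output is unchanged.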

## Products

| Product | One line | Model | Docs |
|---------|----------|-------|------|
| [Fast Apply](https://www.morphllm.com/products/fastapply.md) | Merge LLM code edits at 10,500 tok/s, 98% accuracy | `morph-v3-fast`, `morph-v3-large` | [docs](https://docs.morphllm.com/sdk/components/fast-apply) |
| [WarpGrep](https://www.morphllm.com/products/warpgrep.md) | Code search subagent; 0.73 F1 in 3.8 steps; #1 on SWE-Bench Pro | `morph-warp-grep-v2.1` | [docs](https://docs.morphllm.com/sdk/components/warp-grep/index) |
| [Compact](https://www.morphllm.com/products/compact.md) | Context compaction at 33,000 tok/s; byte-identical, not summarization | `morph-compactor` | [docs](https://docs.morphllm.com/sdk/components/compact) |
| [Reflexes](https://www.morphllm.com/products/reflex.md) | Semantic classifiers for traces, evals, and online learning signals | `morph-reflex-*` | [docs](https://docs.morphllm.com/sdk/components/reflexes) |
| [Glance](https://www.morphllm.com/products/glance.md) | AI browser/mobile testing on PRs; 10x cheaper than general-purpose | `morph-computer-use-v0` | [docs](https://docs.morphllm.com/sdk/components/glance) |

Supporting models: [Router](https://docs.morphllm.com/sdk/components/router) (automatic model selection), [Subagents](https://docs.morphllm.com/sdk/components/subagents) (autonomous codebase exploration), [Embeddings](https://docs.morphllm.com/models/embedding) and [Rerank](https://docs.morphllm.com/models/rerank) (legacy; prefer WarpGrep), [GenKit](https://docs.morphllm.com/sdk/genkit/index) (generative UI components).
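
For readers unfamiliar with the F1 figure in the WarpGrep row, the sketch below shows the standard F1 definition (the harmonic mean of precision and recall) applied to a set of retrieved items. How Morph's benchmark defines "retrieved" and "relevant" is not stated on this page, so the file-path granularity and the function name here are assumptions.

```ts
// Remove duplicates while preserving order.
function unique(items: readonly string[]): string[] {
  return items.filter((item, index) => items.indexOf(item) === index);
}

// Standard F1 over sets: harmonic mean of precision and recall.
// Assumption: items are file paths; the benchmark's actual unit may differ.
function f1Score(retrieved: readonly string[], relevant: readonly string[]): number {
  const got = unique(retrieved);
  const want = unique(relevant);
  const hits = got.filter((item) => want.indexOf(item) !== -1).length;
  if (hits === 0) return 0; // also covers empty inputs
  const precision = hits / got.length; // share of retrieved items that are relevant
  const recall = hits / want.length; // share of relevant items that were retrieved
  return (2 * precision * recall) / (precision + recall);
}

// Hypothetical search result: 2 of 3 retrieved files are relevant, 2 of 4 relevant files found.
const exampleF1 = f1Score(["a.ts", "b.ts", "c.ts"], ["a.ts", "b.ts", "d.ts", "e.ts"]);
```

In the hypothetical example, precision is 2/3 and recall is 1/2, giving an F1 of 4/7 (about 0.57).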

## Quickstart

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.morphllm.com/v1",
  apiKey: process.env.MORPH_API_KEY,
});

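// Placeholder inputs: the original file and the edit to merge into it.
const originalFile = 'function greet(name) {\n  return "Hello " + name;\n}\n';
const lazyEdit = 'function greet(name) {\n  return `Hello, ${name}!`;\n}\n';

// Fast Apply: merge a lazy edit into the original file
const response = await client.chat.completions.create({
  model: "morph-v3-fast",
  messages: [{
    role: "user",
    content: `<code>${originalFile}</code>\n<update>${lazyEdit}</update>`,
  }],
});

// The merged file comes back as the assistant message content.
console.log(response.choices[0]?.message.content);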
```

Full quickstart: https://docs.morphllm.com/quickstart
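
If an agent issues many apply calls, the message shape from the Quickstart can be wrapped in a small request builder. This is a minimal sketch: `buildApplyRequest`, `ApplyRequest`, and `FastApplyModel` are hypothetical names, not part of any Morph SDK, and the builder only reproduces the `<code>`/`<update>` format shown above.

```ts
// Model IDs taken from the Products table on this page.
type FastApplyModel = "morph-v3-fast" | "morph-v3-large";

interface ApplyRequest {
  model: FastApplyModel;
  messages: { role: "user"; content: string }[];
}

// Hypothetical helper: builds the chat-completions parameters for one Fast Apply call.
function buildApplyRequest(
  originalFile: string,
  lazyEdit: string,
  model: FastApplyModel = "morph-v3-fast",
): ApplyRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: `<code>${originalFile}</code>\n<update>${lazyEdit}</update>`,
      },
    ],
  };
}
```

The result is meant to be passed as the argument to `client.chat.completions.create(...)`, which keeps the prompt format in one place if it ever changes.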

## Integrations

- **OpenAI-compatible API** — `https://api.morphllm.com/v1`
- **MCP server** — [setup guide](https://docs.morphllm.com/mcpquickstart) for Cursor, Claude Code, Windsurf, Cline, VS Code, Claude Desktop
- **Vercel AI SDK** — [`morph:morph-v3-fast`](https://docs.morphllm.com/guides/ai-sdk)
- **OpenRouter** — `morph/morph-v2`
- **TypeScript SDK** — `@morphllm/morphsdk` ([quickstart](https://docs.morphllm.com/sdk/quickstart))
- **Python** — `morphllm` via OpenAI-compatible client
- **GitHub App** — [one-click install](https://www.morphllm.com/dashboard/integrations/github) for Glance
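
Because the API is OpenAI-compatible, it can also be called without an SDK. The sketch below builds a plain HTTP request for the chat completions route under the base URL listed above, using the `Authorization: Bearer` header that OpenAI-compatible clients send. The `buildChatRequest` helper and its return shape are illustrative assumptions.

```ts
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Base URL from the Integrations list; "/chat/completions" is the route the
// OpenAI SDK appends for client.chat.completions.create().
const MORPH_BASE_URL = "https://api.morphllm.com/v1";

// Hypothetical helper: describes the HTTP request without sending it.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]): HttpRequest {
  if (!apiKey) throw new Error("apiKey is required");
  return {
    url: `${MORPH_BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}
```

The returned object maps directly onto `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` or any other HTTP client.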

## Pricing (per 1M tokens, usage-based, no per-seat fees)

| Product | Model | Input ($) | Output ($) |
|---------|-------|------:|-------:|
| Fast Apply (7B) | morph-v3-fast | 0.80 | 1.20 |
| Fast Apply (14B) | morph-v3-large | 0.90 | 1.90 |
| WarpGrep | morph-warp-grep-v2.1 | 0.80 | 0.80 |
| Compact | morph-compactor | 0.20 | 0.50 |
| Embeddings | morph-embedding-v4 | 0.18 | — |
| Rerank | morph-rerank-v4 | 0.10 | — |

Free tier: 200 requests/month, plus $10/month in free compute for WarpGrep and Glance. Full pricing: https://www.morphllm.com/pricing.md
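
The per-1M-token prices in the table translate into request cost with simple arithmetic. The sketch below hard-codes the table values and assumes they are in US dollars; `estimateCost` and `PRICES` are illustrative names, and the free tier is not modeled.

```ts
// Prices per 1M tokens, copied from the table above (assumed to be USD).
// output is null where the table shows no output price.
interface Price {
  input: number;
  output: number | null;
}

const PRICES: Record<string, Price> = {
  "morph-v3-fast": { input: 0.8, output: 1.2 },
  "morph-v3-large": { input: 0.9, output: 1.9 },
  "morph-warp-grep-v2.1": { input: 0.8, output: 0.8 },
  "morph-compactor": { input: 0.2, output: 0.5 },
  "morph-embedding-v4": { input: 0.18, output: null },
  "morph-rerank-v4": { input: 0.1, output: null },
};

// Hypothetical helper: cost of one workload for a given model.
function estimateCost(model: string, inputTokens: number, outputTokens: number = 0): number {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  if (price.output === null && outputTokens > 0) {
    throw new Error(`${model} has no output price in the table`);
  }
  const inputCost = (inputTokens / 1_000_000) * price.input;
  const outputCost = price.output === null ? 0 : (outputTokens / 1_000_000) * price.output;
  return inputCost + outputCost;
}

// Hypothetical month: 5M input and 1M output tokens on morph-v3-fast.
const monthlyCost = estimateCost("morph-v3-fast", 5_000_000, 1_000_000);
```

For the hypothetical month above, that is 5 × 0.80 + 1 × 1.20 = 5.20.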

## Enterprise

- [Self-hosting](https://docs.morphllm.com/api-reference/self-hosting) — on-prem / air-gapped, SOC2-compliant
- [Enterprise Apply](https://docs.morphllm.com/api-reference/endpoint/enterprise) — custom model configurations
- [Enterprise overview](https://docs.morphllm.com/enterprise) — security, compliance, support

## See also

- [Agent context (LLM quickstart)](https://docs.morphllm.com/llm-quickstart) — ~9k tokens of full Morph context for a coding agent to ingest
- [Glossary](https://docs.morphllm.com/glossary) — key terms across Morph docs
- [Blog](https://www.morphllm.com/blog) — research and engineering posts
- [Benchmarks](https://www.morphllm.com/benchmarks) — SWE-Bench Pro, F1, accuracy
- [Contact](https://www.morphllm.com/contact)
- llms-full.txt (this site): https://www.morphllm.com/llms-full.txt