Fast mode (research preview)

Higher output speed for supported Claude Opus models, delivering significantly faster token generation for latency-sensitive and agentic workflows.

Fast mode provides significantly faster output token generation for Claude Opus 4.8, Claude Opus 4.7, and Claude Opus 4.6 at premium pricing. Set speed: "fast" in your API request to opt in. Fast mode delivers up to 2.5x higher output tokens per second from the same model.



Fast mode is in research preview. Contact your account manager to request access. If you do not have an account manager, join the waitlist for fast mode.



This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.

Supported models

Fast mode is supported on the following models:

  • Claude Opus 4.8 (claude-opus-4-8)
  • Claude Opus 4.7 (claude-opus-4-7)
  • Claude Opus 4.6 (claude-opus-4-6)


Fast mode for Claude Opus 4.8 launches as a research preview only on the Claude API, including Claude Managed Agents. It is not available on third-party platforms, including Vertex AI, Amazon Bedrock, and Microsoft Foundry.



Fast mode for Claude Opus 4.6 is deprecated as of the Claude Opus 4.8 launch and will be removed approximately 30 days later. After removal, requests to claude-opus-4-6 with speed: "fast" will fall back to standard speed at standard pricing rather than return an error. Migrate to fast mode for Claude Opus 4.8 or Claude Opus 4.7 to keep the speedup.
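Sending speed: "fast" with an unsupported model returns an error, so a client that serves several models might gate the parameter on the model ID. The following is a minimal sketch that assumes you maintain the list of supported IDs by hand. The helper name fast_mode_kwargs is hypothetical, and the fast-mode-2026-02-01 beta string is copied from this page's request examples.

```python
import warnings

# Hand-maintained copy of the supported model IDs listed above (an assumption
# of this sketch; the set is not fetched from the API).
FAST_MODE_MODELS = {"claude-opus-4-8", "claude-opus-4-7", "claude-opus-4-6"}
# Fast mode for Opus 4.6 is deprecated and will later fall back to standard speed.
DEPRECATED_FAST_MODE_MODELS = {"claude-opus-4-6"}
FAST_MODE_BETA = "fast-mode-2026-02-01"


def fast_mode_kwargs(model: str) -> dict:
    """Return extra request kwargs that opt into fast mode, or {} if unsupported."""
    if model not in FAST_MODE_MODELS:
        # speed="fast" on an unsupported model is an error: use standard speed.
        return {}
    if model in DEPRECATED_FAST_MODE_MODELS:
        warnings.warn(
            f"Fast mode for {model} is deprecated; migrate to claude-opus-4-8 "
            "or claude-opus-4-7 to keep the speedup.",
            DeprecationWarning,
            stacklevel=2,
        )
    return {"speed": "fast", "betas": [FAST_MODE_BETA]}
```

A call site would then spread the result into the request, for example client.beta.messages.create(model=model, max_tokens=1024, messages=messages, **fast_mode_kwargs(model)).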

How fast mode works

Fast mode runs the same model with a faster inference configuration. There is no change to intelligence or capabilities.

  • Up to 2.5x higher output tokens per second compared to standard speed
  • Speed benefits are focused on output tokens per second (OTPS), not time to first token (TTFT)
  • Same model weights and behavior (not a different model)
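Because the speedup applies to output tokens per second and not to time to first token, the end-to-end gain depends on how long the response is. The following back-of-the-envelope sketch models response time as TTFT plus output tokens divided by OTPS. The inputs (1 s TTFT, 2,000 output tokens, 50 tokens per second at standard speed) are made up purely for the arithmetic. The sketch also assumes that fast mode leaves TTFT unchanged and reaches the full 2.5x upper bound.

```python
def estimated_response_seconds(ttft_s: float, output_tokens: int, otps: float) -> float:
    """Rough end-to-end time: time to first token plus output at a steady rate."""
    return ttft_s + output_tokens / otps


# Illustrative inputs only, not measured figures.
STANDARD_OTPS = 50.0
FAST_OTPS = STANDARD_OTPS * 2.5  # "up to 2.5x": the best case

standard = estimated_response_seconds(1.0, 2000, STANDARD_OTPS)  # 1 + 40 = 41.0
fast = estimated_response_seconds(1.0, 2000, FAST_OTPS)  # 1 + 16 = 17.0
print(f"standard: {standard:.1f}s, fast: {fast:.1f}s")
```

With these inputs, the total drops from 41 s to 17 s. That is about 2.4x rather than the full 2.5x, because the TTFT term does not shrink. The shorter the output, the smaller the end-to-end gain.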

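Basic usage

To opt in, add speed: "fast" to a Messages API request for a supported model and include the fast-mode-2026-02-01 beta. The Python examples at the end of this page show complete requests.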

Pricing

Fast mode is priced at a per-model multiplier on standard rates across the full context window, including requests over 200k input tokens. The following table shows fast mode pricing for each supported model:

| Model | Fast mode input | Fast mode output |
|---|---|---|
| Claude Opus 4.6 / Claude Opus 4.7 | $30 / MTok | $150 / MTok |
| Claude Opus 4.8 | $10 / MTok | $50 / MTok |

Fast mode pricing stacks with other pricing modifiers:

  • Prompt caching multipliers apply on top of fast mode pricing
  • Data residency multipliers apply on top of fast mode pricing
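To estimate the cost of a single fast mode request from the rates in the pricing table above, multiply each token count by its per-million-token rate. The following is a minimal sketch. The single multiplier argument is a placeholder for stacked modifiers such as a data residency uplift, and this page does not give its value. Prompt caching multipliers apply per cached-token category, which this sketch does not model.

```python
# Fast mode rates in USD per million tokens (MTok), from the pricing table.
FAST_MODE_RATES = {
    "claude-opus-4-8": {"input": 10.0, "output": 50.0},
    "claude-opus-4-7": {"input": 30.0, "output": 150.0},
    "claude-opus-4-6": {"input": 30.0, "output": 150.0},
}


def fast_mode_cost_usd(
    model: str, input_tokens: int, output_tokens: int, multiplier: float = 1.0
) -> float:
    """Estimated fast mode cost in USD for one request.

    multiplier is a hypothetical stand-in for stacked pricing modifiers
    (for example data residency); 1.0 means no modifier.
    """
    rates = FAST_MODE_RATES[model]
    base = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return base * multiplier
```

For example, 100,000 input tokens and 10,000 output tokens come to $1.50 on Claude Opus 4.8 in fast mode, compared with $4.50 on Claude Opus 4.7.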

For complete pricing details, see the pricing page.

Rate limits

Fast mode has a dedicated rate limit that is separate from standard Opus rate limits. When your fast mode rate limit is exceeded, the API returns a 429 error with a retry-after header indicating when capacity will be available.

The response includes headers that indicate your fast mode rate limit status:

| Header | Description |
|---|---|
| anthropic-fast-input-tokens-limit | Maximum fast mode input tokens per minute |
| anthropic-fast-input-tokens-remaining | Remaining fast mode input tokens |
| anthropic-fast-input-tokens-reset | Time when the fast mode input token limit resets |
| anthropic-fast-output-tokens-limit | Maximum fast mode output tokens per minute |
| anthropic-fast-output-tokens-remaining | Remaining fast mode output tokens |
| anthropic-fast-output-tokens-reset | Time when the fast mode output token limit resets |

For tier-specific rate limits, see the rate limits page.
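To throttle before hitting a 429, a client can read these headers after each response. The following minimal sketch assumes you already have the response headers as a mapping, for example from the Python SDK's with_raw_response interface. The function name fast_rate_limit_status is hypothetical, and the sketch leaves the reset value as the raw header string instead of assuming its format.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def fast_rate_limit_status(headers: Mapping[str, str]) -> dict:
    """Collect fast mode rate limit headers into {"input": {...}, "output": {...}}."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    status = {}
    for kind in ("input", "output"):
        prefix = f"anthropic-fast-{kind}-tokens-"
        status[kind] = {
            "limit": _to_int(lowered.get(prefix + "limit")),
            "remaining": _to_int(lowered.get(prefix + "remaining")),
            # Raw header string; this sketch does not parse the reset time.
            "reset": lowered.get(prefix + "reset"),
        }
    return status
```

Missing headers come back as None. This lets the caller distinguish "no data" from a remaining budget of zero.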

Checking which speed was used

The response usage object includes a speed field that indicates which speed was used, either "fast" or "standard":

```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  // ...
  "usage": {
    "input_tokens": 8,
    "output_tokens": 12,
    "speed": "fast"
  }
}
```

To track fast mode usage and costs across your organization, see the Usage and Cost API.
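A request that asks for fast mode can still be served at standard speed. For example, requests to claude-opus-4-6 with speed: "fast" fall back to standard speed once fast mode for that model is removed. Per-request accounting should therefore key off the reported usage speed, not the requested one. The following minimal sketch assumes usage objects have been converted to plain dicts. It treats a missing speed field as standard, which is an assumption.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def output_tokens_by_speed(usages: Iterable[Mapping]) -> dict:
    """Sum output tokens per reported speed ("fast" or "standard")."""
    totals = defaultdict(int)
    for usage in usages:
        # Assumption of this sketch: no speed field means standard speed.
        totals[usage.get("speed", "standard")] += usage["output_tokens"]
    return dict(totals)
```

Splitting totals this way makes it easy to spot an unexpected share of standard-speed responses, such as a model whose fast mode has been removed.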

Retries and fallback

Automatic retries

When fast mode rate limits are exceeded, the API returns a 429 error with a retry-after header. The Anthropic SDKs automatically retry these requests up to 2 times by default (configurable via max_retries), waiting for the server-specified delay before each retry. Since fast mode uses continuous token replenishment, the retry-after delay is typically short and requests succeed once capacity is available.
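The SDKs already implement this behavior. The following sketch only illustrates the semantics described above (wait for the server-specified delay, then give up after max_retries retries) and is not the SDKs' actual code. The send callable, the (status, headers) tuple, and the 1-second default when retry-after is absent are all hypothetical simplifications. The sketch also assumes retry-after carries a number of seconds.

```python
import time
from typing import Callable, Mapping, Tuple

# Simplified stand-in for an HTTP response: (status code, lowercase-keyed headers).
Result = Tuple[int, Mapping[str, str]]

DEFAULT_DELAY_S = 1.0  # assumption: delay used when retry-after is missing


def send_with_retry_after(
    send: Callable[[], Result],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Result:
    """Call send(); on HTTP 429, wait retry-after seconds and retry, up to max_retries times."""
    status, headers = send()
    retries = 0
    while status == 429 and retries < max_retries:
        # Assumption: retry-after holds seconds (HTTP also permits an HTTP date).
        sleep(float(headers.get("retry-after", DEFAULT_DELAY_S)))
        retries += 1
        status, headers = send()
    return status, headers
```

With the Python SDK, you do not write this loop yourself. You can change the retry count for all requests with anthropic.Anthropic(max_retries=...) or for individual requests with client.with_options(max_retries=...).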

Falling back to standard speed

If you'd prefer to fall back to standard speed rather than wait for fast mode capacity, catch the rate limit error and retry without speed: "fast". Set max_retries to 0 on the initial fast request to skip automatic retries and fail immediately on rate limit errors.



Falling back from fast to standard speed will result in a prompt cache miss. Requests at different speeds do not share cached prefixes.

Since setting max_retries to 0 also disables retries for other transient errors (overloaded, internal server errors), the examples below re-issue the original request with default retries for those cases.

Considerations

  • Prompt caching: Switching between fast and standard speed invalidates the prompt cache. Requests at different speeds do not share cached prefixes.
  • Supported models: Fast mode is supported on Claude Opus 4.8, Claude Opus 4.7, and Claude Opus 4.6. Sending speed: "fast" with an unsupported model returns an error.
  • TTFT: Fast mode's benefits are focused on output tokens per second (OTPS), not time to first token (TTFT).
  • Batch API: Fast mode is not available with the Batch API.
  • Priority Tier: Fast mode is not available with Priority Tier.
  • Claude Platform on AWS: Fast mode is not currently available on Claude Platform on AWS.
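Fast mode is not available with the Batch API or Priority Tier. Code that reuses one set of request parameters for those paths therefore needs to strip the fast mode settings first. The following minimal sketch uses a hypothetical helper, without_fast_mode. Removing the fast-mode-2026-02-01 beta flag together with speed is an assumption of the sketch. As noted above, the stripped request will not share cached prefixes with fast mode requests.

```python
FAST_MODE_BETA = "fast-mode-2026-02-01"


def without_fast_mode(params: dict) -> dict:
    """Return a copy of request params with fast mode settings removed."""
    cleaned = {key: value for key, value in params.items() if key != "speed"}
    if "betas" in cleaned:
        remaining = [beta for beta in cleaned["betas"] if beta != FAST_MODE_BETA]
        if remaining:
            cleaned["betas"] = remaining
        else:
            # Drop an empty betas list entirely rather than sending [].
            del cleaned["betas"]
    return cleaned
```

The helper returns a new dict, so the original fast mode parameters stay intact for requests that can still use fast mode.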

Next steps

  • Pricing: View detailed fast mode pricing information.
  • Rate limits: Check rate limit tiers for fast mode.
  • Effort parameter: Control token usage with the effort parameter.

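Basic usage example:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    speed="fast",
    betas=["fast-mode-2026-02-01"],
    messages=[
        {"role": "user", "content": "Refactor this module to use dependency injection"}
    ],
)

print(response.content[0].text)
```

Checking which speed was used:

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    speed="fast",
    betas=["fast-mode-2026-02-01"],
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.usage.speed)  # "fast" or "standard"
```

Falling back to standard speed on a fast mode rate limit:

```python
import anthropic

client = anthropic.Anthropic()


def create_message_with_fast_fallback(max_retries=0, max_attempts=3, **params):
    """Request fast mode; on a rate limit, fall back to standard speed.

    max_retries=0 makes a fast mode 429 fail immediately instead of waiting.
    Because that also disables SDK retries for 5xx and connection errors,
    max_attempts=3 re-issues the request manually (one attempt plus two
    retries, mirroring the SDK default).
    """
    try:
        return client.with_options(max_retries=max_retries).beta.messages.create(
            **params
        )
    except anthropic.RateLimitError:
        if params.get("speed") == "fast":
            # Retry without speed="fast", using the SDK's default retries.
            # Expect a prompt cache miss: speeds do not share cached prefixes.
            params.pop("speed")
            return client.beta.messages.create(**params)
        raise
    except (anthropic.APIStatusError, anthropic.APIConnectionError) as error:
        # Client errors (4xx) are not transient: surface them unchanged.
        if isinstance(error, anthropic.APIStatusError) and error.status_code < 500:
            raise
        # Overloaded, internal server, or connection errors: re-issue the
        # original request (still in fast mode if it was requested).
        if max_attempts > 1:
            return create_message_with_fast_fallback(
                max_retries=max_retries, max_attempts=max_attempts - 1, **params
            )
        raise


message = create_message_with_fast_fallback(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    betas=["fast-mode-2026-02-01"],
    speed="fast",
    max_retries=0,
)
```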