Anthropic’s Claude Sonnet 4 Model Gets a 1M Token Context Window

Anthropic today updated its Claude Sonnet 4 model to support a context window of up to 1 million tokens.

Aug 12th, 2025 9:00am by Frederic Lardinois


Anthropic today announced that its Claude Sonnet 4 model, the company’s mainstream model that sits below its flagship Claude Opus 4 model, will now support a 1 million token context window. This long context support is now in public beta and available through the Anthropic API and on Amazon Bedrock, with support on Google’s Vertex AI coming soon.

A million tokens is roughly equivalent to 750,000 words, which lets the model reason over a large amount of data without developers having to resort to more complex techniques like retrieval-augmented generation (RAG).
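As a rough illustration of that ratio, the sketch below estimates whether a body of text would fit in a given context window. It assumes the article's rule of thumb of about 0.75 words per token; real token counts depend on the tokenizer and the content (code and non-English text often tokenize less efficiently), so treat the result as a ballpark only. The function names and the `reserve_for_output` parameter are hypothetical, chosen for this example.

```python
# Rule of thumb from the article: 1,000,000 tokens ~ 750,000 words,
# i.e. roughly 4 tokens for every 3 words. This is an approximation,
# not a tokenizer.

def estimated_tokens(word_count: int) -> int:
    """Estimate tokens from a word count, rounding up (4 tokens per 3 words)."""
    return (word_count * 4 + 2) // 3


def fits_in_window(word_count: int,
                   window_tokens: int = 1_000_000,
                   reserve_for_output: int = 0) -> bool:
    """Return True if the estimated prompt size, plus any tokens held back
    for the model's reply, stays within the context window."""
    return estimated_tokens(word_count) + reserve_for_output <= window_tokens


if __name__ == "__main__":
    for words in (150_000, 750_000, 900_000):
        print(words, estimated_tokens(words), fits_in_window(words))
```

For an exact figure, a provider's own token-counting tooling would be the better source than a word-based estimate.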

When Anthropic launched its latest generation of models in May, both Sonnet 4 and Opus 4 were restricted to a context window of 200,000 tokens. That’s enough context for many use cases, but as far back as early 2024, Google, for example, offered a 1 million token context window for its Gemini models, with the promise to make a 2 million token context window widely available soon. OpenAI followed suit earlier this year with the launch of GPT-4.1, which also supported a 1 million token context window (though GPT-5 then brought that back down to 400,000 tokens).

There’s been no word on when (or if) Opus 4 will get the same upgrade.

As Anthropic notes in today’s announcement, long context will allow the model to evaluate more of a given code base (coding is where Claude has long excelled), synthesize larger document sets and power AI agents that can maintain context even after hundreds of tool calls.

All of this comes at a price, though, with prompts that exceed the old 200,000 token limit costing twice as much per 1 million input tokens ($6 vs. $3) and 50% more per 1 million output tokens. Anthropic notes that prompt caching can help reduce cost (and latency) and stresses that its batch processing mode can also help bring the cost down by 50%.
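To make the pricing arithmetic concrete, here is a minimal cost sketch. The $3 and $6 input rates, the 50% output premium, the 200,000-token threshold and the 50% batch discount come from the figures above. The $15 base output rate is an assumption that is not stated in this article, and so is the choice to bill the whole request at the higher rates once the prompt crosses the threshold. Prompt caching is not modeled. The function and parameter names are hypothetical.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens; the old limit cited in the article


def estimate_cost(input_tokens: int,
                  output_tokens: int,
                  base_input_per_mtok: float = 3.00,    # from the article
                  base_output_per_mtok: float = 15.00,  # ASSUMPTION: not in the article
                  batch: bool = False) -> float:
    """Estimate the dollar cost of a single request.

    ASSUMPTION: once the prompt exceeds the threshold, the whole request is
    billed at the long-context rates (2x input, 1.5x output).
    """
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    input_rate = base_input_per_mtok * (2.0 if long_context else 1.0)
    output_rate = base_output_per_mtok * (1.5 if long_context else 1.0)

    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    if batch:
        cost *= 0.5  # batch processing halves the cost, per the article
    return cost


if __name__ == "__main__":
    print(f"{estimate_cost(100_000, 1_000):.4f}")              # short prompt
    print(f"{estimate_cost(500_000, 1_000):.4f}")              # long prompt
    print(f"{estimate_cost(500_000, 1_000, batch=True):.4f}")  # long prompt, batched
```

Under these assumptions, a 500,000-token prompt costs roughly ten times as much as a 100,000-token one, because the input rate doubles on top of the fivefold increase in tokens.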

It’s worth noting that there has been some discussion around how well large language models work with these extremely large context windows. Often, the benchmark for this is the needle-in-a-haystack test, which asks the model to find a specific piece of data in the context window. There, most models now perform quite well.

As some researchers have pointed out, though, that’s not necessarily how developers use these context windows in practice. Models often struggle to stay coherent as a session grows longer and the context size grows with it.

Because of this, context engineering likely won’t be going away anytime soon, even as context windows increase in size.

Before joining The New Stack as its senior editor for AI, Frederic was the enterprise editor at TechCrunch, where he covered everything from the rise of the cloud and the earliest days of Kubernetes to the advent of quantum computing....