Anthropic released Claude Sonnet 4.5 today, positioning it as the best model available for real-world coding, agentic tasks and computer use. The company claims the model can operate autonomously for more than 30 hours while maintaining consistent performance. That is a significant jump from the roughly seven hours of autonomous work reported for Claude Opus 4.
The release comes roughly four months after Claude Sonnet 4, and Anthropic says the improvements go beyond benchmark scores. The focus here is practical utility: Better instruction following, stronger code refactoring and more production-ready output.
What’s Different About Sonnet 4.5
Claude Sonnet 4.5 leads on SWE-bench Verified, a benchmark that tests how well models handle real-world software engineering tasks. But the model’s strengths extend beyond coding into specialized domains.
In cybersecurity, it helps teams detect and remediate vulnerabilities faster. In financial services, it outperforms Opus 4.1 on research, modeling and forecasting tasks. And in legal workflows, Thomson Reuters reports that the model can analyze full litigation records and draft judicial opinions based on complex briefing cycles.
The extended autonomous operation window matters for teams building applications. Where earlier models would lose focus or require frequent intervention, Sonnet 4.5 can carry a full-stack build through a session lasting more than a day without breaking stride. That consistency changes what’s possible when delegating complex architectural work to AI.
“Claude Sonnet 4.5 exemplifies how quickly we are shifting from AI as a coding assistant to AI as a reliable teammate that can own and complete entire streams of engineering and operations work,” according to Mitch Ashley, VP and practice lead, software lifecycle engineering at The Futurum Group. “Claude Sonnet 4.5 can now plan and execute multi-step tasks, including refactoring large codebases, navigating tools without APIs, or running complex security and data-analysis workflows without constant human intervention.”
Developer Tools and Infrastructure Updates
Alongside the model, Anthropic is shipping several updates aimed at developers building their own agents and workflows.
Claude Code gets a refreshed VS Code extension and terminal interface, plus a new checkpoints feature that saves progress so developers can roll back to an earlier state. Together, these let developers work with Claude directly from their development environment while keeping better control over the state of their code.
The Claude Agent SDK gives teams access to the same building blocks that power Claude Code: tools, context management and permissions frameworks. The idea is to let companies build custom agents without starting from scratch.
The Claude Developer Platform now supports longer agent runs through automatic context clearing and a new memory tool. This addresses a common pain point: Context window exhaustion during extended tasks.
The Claude app can now execute code, create files and analyze data. The feature is available across all paid plans and gets faster with Sonnet 4.5.
Early Customer Response
Companies using the model report measurable improvements over previous versions.
Replit saw its code editing error rate drop from 9% with Sonnet 4 to 0% with Sonnet 4.5 on internal benchmarks. Michele Catasta, President at Replit, noted that the model “balances creativity and control perfectly, thoroughly completing tasks without over-engineering.”
HackerOne reduced average vulnerability intake time by 44% while improving accuracy by 25% when using Sonnet 4.5 for security agents.
Netflix’s Eric Wendelin described the model as “excellent at software development tasks, learning our codebase patterns to deliver precise implementations.”
For Cursor, the improvements were most noticeable in longer-horizon tasks — those that require maintaining context and coherence across multiple files and systems.
Computer Use Capabilities
Anthropic introduced computer use capabilities about a year ago. With Sonnet 4.5, the company says Claude now leads the market in this area, citing a score of 61.4% on OSWorld, a benchmark of real-world computer tasks.
Computer use means the model can interact with applications and interfaces in the same way a human would: By clicking, typing and navigating menus. For DevOps teams, this opens up automation possibilities that go beyond traditional scripting. The model can work with tools that lack APIs or that require complex visual navigation.
The practical application here is reducing manual overhead. Tasks that require jumping between systems or tools with different interfaces can be delegated to Claude, freeing up time for work that requires human judgment and expertise.
Industry-Specific Performance
The model shows particular strength in domains with complex requirements.
NBIM, which manages Norway’s sovereign wealth fund, highlighted the combination of improved intelligence and direct integration with its tools and systems. Stian Kirkeberg, Head of AI and Machine Learning, called this “particularly powerful for our investment operations.”
Shopify’s Paulo Arruda described Sonnet 4.5 as “a powerful thinking partner: It builds detailed plans, grasps intent beyond words, and incorporates feedback thoughtfully.”
These use cases suggest the model handles context-heavy work better than previous versions: It understands not just what to do but why, and it adapts as requirements evolve.
What This Means for DevOps Teams
The 30-hour autonomous operation window changes the calculus for what you can delegate. Complex migrations, large-scale refactoring and multiservice updates become more feasible when you can rely on consistent performance over extended periods.
The combination of better coding ability, computer use skills and extended focus means fewer interruptions and less babysitting. That matters when you’re trying to ship faster without sacrificing quality.
And the new developer tools lower the barrier to building custom agents. If you need AI that understands your specific stack, workflows and constraints, you can now build that without extensive ML expertise.
Claude Sonnet 4.5 is available now through Anthropic’s API, Claude Code and the Claude app.

