AI coding agents have been able to write code for a while now. What they couldn’t do was use the software they wrote. They couldn’t open a browser, click a button, and watch what happens. That kept them in a narrow lane: generate a diff, hand it off, hope it works.
Cursor just removed that constraint. On February 24, the $29.3 billion AI coding startup launched cloud agents with computer use — autonomous agents that run in their own isolated virtual machines with full development environments. Each agent can build software, test it by navigating the UI in a browser, record video proof of its work, and produce a merge-ready pull request with artifacts attached.
The number that matters: 35% of Cursor’s internal merged pull requests are now created by these agents operating autonomously in cloud sandboxes. That’s not a demo metric. That’s production code shipping to millions of users.
What Changed
Local AI coding agents share your machine. They compete with each other and with you for resources. Run more than a couple in parallel, and things break.
Cloud agents eliminate that. Each agent gets its own VM, environment, and sandbox. “Instead of having one to three things that you’re doing at once, you can have 10 or 20 of these things running,” said Alexi Robbins, co-head of engineering for asynchronous agents at Cursor.
But the real breakthrough is computer use. Cloud agents can open browsers, navigate to localhost, click through UI elements, and verify that the code they wrote actually works. When they find a problem, they fix it and test again. When they finish, they record the session — video, screenshots, logs — and attach it all to the PR.
That changes the review workflow. Instead of reading a diff and mentally simulating whether it works, reviewers watch a 30-second video of the agent demonstrating the feature.
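The test, fix, and retest cycle described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Cursor's implementation: `verify_loop`, `CheckResult`, `RunReport`, `run_ui_check`, `apply_fix`, and `max_attempts` are all hypothetical names, and the UI check is a plain callable standing in for an agent driving a real browser.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckResult:
    passed: bool
    notes: str  # what the (simulated) UI check observed


@dataclass
class RunReport:
    attempts: int
    verified: bool
    # Stand-ins for the video, screenshots, and logs attached to a PR.
    artifacts: List[str] = field(default_factory=list)


def verify_loop(
    run_ui_check: Callable[[], CheckResult],
    apply_fix: Callable[[CheckResult], None],
    max_attempts: int = 3,
) -> RunReport:
    """Test, fix on failure, and retest until the check passes or attempts run out."""
    report = RunReport(attempts=0, verified=False)
    for attempt in range(1, max_attempts + 1):
        report.attempts = attempt
        result = run_ui_check()
        report.artifacts.append(f"attempt-{attempt}: {result.notes}")
        if result.passed:
            report.verified = True
            break
        # Only fix when another attempt remains to retest the fix.
        if attempt < max_attempts:
            apply_fix(result)
    return report


# Simulated run: the first check fails, the fix is applied, the second passes.
state = {"bug_fixed": False}


def fake_check() -> CheckResult:
    if state["bug_fixed"]:
        return CheckResult(True, "button click opened the dialog")
    return CheckResult(False, "button click did nothing")


def fake_fix(_: CheckResult) -> None:
    state["bug_fixed"] = True


report = verify_loop(fake_check, fake_fix)
print(report.verified, report.attempts)  # True 2
```

The cap on attempts is a design choice in this sketch: an unverified run still returns its artifacts, so a reviewer can see what failed.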
How Cursor Uses It Internally
Cursor has been dogfooding cloud agents for a month. Its use cases show what these agents handle well.
For feature development, they used a cloud agent to build source code links for the Cursor Marketplace. The agent implemented the feature, navigated to the imported Prisma plugin, clicked each component to verify the GitHub links worked, then rebased onto main, resolved merge conflicts, and squashed to a single commit.
For security, they kicked off a cloud agent from Slack to triage a clipboard exfiltration vulnerability. The agent built an exploit page, started a backend server, loaded it in Cursor’s browser, and recorded the complete attack flow. The summary appeared in the Slack thread.
For UI testing, they assigned an agent to validate cursor.com/docs. It spent 45 minutes doing a full walkthrough — sidebar, search, copy button, feedback dialog, table of contents, theme switching — and delivered a summary of everything it tested.
“Cursor’s cloud agents with computer use are a move towards developers no longer primarily writing code. They are engineering how AI agents plan, build, test, and deploy software. When 35% of production PRs come from autonomous agents operating in parallel cloud sandboxes, the execution obligation moves from authoring to directing and governing agent output,” per Mitch Ashley, VP and practice lead for software lifecycle engineering at The Futurum Group.
“The operational implications are important to recognize. CI/CD pipelines, review workflows, and governance frameworks must treat agents as first-class delivery actors. Platforms enabling developers to orchestrate agent-driven development at scale will define a new competitive divide in software engineering,” Ashley added.
The Competitive Context
The AI coding market is crowded. Anthropic’s Claude Code has crossed a $2.5 billion revenue run rate. OpenAI’s Codex has more than 1.5 million weekly active users. GitHub Copilot has passed 26 million users.
What sets Cursor apart is the self-verification loop. Claude Code and Codex generate code and run tests, but they don’t interact with the running application through a GUI. Cursor’s agents see what users see — catching visual regressions and UI bugs that pass every unit test.
Cursor’s internal data supports this: sandboxed cloud agents stop 40% less often than unsandboxed agents. When agents can test their own work before handing off, output quality improves measurably.
What This Means for DevOps Teams
Three implications stand out.
First, code review is becoming the bottleneck. When one developer can direct 10 to 20 parallel agents, the volume of PRs hitting the review queue multiplies. Video artifacts help — watching a 30-second demo is faster than reading 500 lines of diff — but teams will need to rethink their review processes.
Second, CI/CD pipelines need to account for agent-generated PRs. The agents produce merge-ready code, but “merge-ready” means the agent validated it in its sandbox. Your pipeline still needs to run its own tests and enforce your standards. Teams with strong CI benefit immediately. Teams without it will find that autonomous agents amplify existing gaps.
Third, the developer role is shifting. Cursor describes its vision as “self-driving codebases”: agents that merge PRs, manage rollouts, and monitor production. That’s aspirational, but the direction is clear. Developers are moving from writing code to directing agents and reviewing output.
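The second point above, that a team's own pipeline still has the final say over agent-generated PRs, can be sketched as a simple merge gate. The policy here is an assumption made for illustration, not something Cursor or any CI system prescribes, and `PullRequest`, `merge_blockers`, and every field name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequest:
    author_is_agent: bool
    ci_passed: bool        # result of the team's own pipeline, not the agent's sandbox run
    artifacts: List[str]   # e.g. ["video", "screenshots", "logs"]
    human_approvals: int


def merge_blockers(pr: PullRequest) -> List[str]:
    """Return the reasons a PR may not merge; an empty list means it may."""
    blockers = []
    if not pr.ci_passed:
        blockers.append("team CI has not passed")
    if pr.human_approvals < 1:
        blockers.append("no human approval")
    # Assumed rule: agent-authored PRs must carry a video of the tested feature.
    if pr.author_is_agent and "video" not in pr.artifacts:
        blockers.append("agent PR is missing a video artifact")
    return blockers


# An agent PR with a video and an approval is still blocked until team CI passes.
pr = PullRequest(
    author_is_agent=True,
    ci_passed=False,
    artifacts=["video", "logs"],
    human_approvals=1,
)
print(merge_blockers(pr))  # ['team CI has not passed']
```

Returning the list of reasons rather than a single boolean lets the gate report every unmet requirement at once, which matters more as the number of agent PRs in the queue grows.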
Cloud agents are now available across Cursor’s web, desktop, and mobile interfaces, as well as Slack and GitHub. The roadmap focuses on coordinating multiple agents and building models that learn from past runs.
The shift from “developer uses agent to create diffs” to “agent ships tested features end-to-end” just got its clearest proof point. And 35% is just where Cursor is today.

