Why Google’s Remy leaks have enterprise architects rethinking the AI stack

Google's reported Remy agent could reshape enterprise AI infrastructure, with experts warning of new workflow, runtime, and security challenges ahead.

May 18th, 2026 3:22pm by Adrian Bridgwater

Google’s reported development of Remy, a new OpenClaw-style agent that can perform actions on a user’s behalf, has lit up discussion boards and newswires since Business Insider first reported on its existence this month.

Unconfirmed reports cited by the publication suggest that Google is testing Remy inside a staff-only version of Gemini. According to an internal document seen by Business Insider, Remy can integrate with a variety of Google’s other services, but Google itself has reportedly declined to comment on the existence of Remy.

“Remy is your 24/7 personal agent for work, school, and daily life, powered by Gemini. It elevates the Gemini app into a true assistant that can take actions on your behalf — not just answer questions or generate content,” reports the publication.

Google already offers Gemini Agent, a consumer-level service that it lauds as the “next step in building towards a universal AI assistant,” with features including live web browsing, deep research, and integration with some of Google’s apps to execute actions once the agent receives user confirmation.

If Remy is real, it could represent a significant step forward toward more orchestrated AI services that integrate into human workflows, rather than occasional chat-based adjuncts.

Google DeepMind CEO Hassabis: “Two big breakthroughs needed for AGI”

Google DeepMind CEO Demis Hassabis has spoken widely on his long-term ambitions for artificial general intelligence, and on subjects as diverse as renewable energy and global water access. He has also alluded to the need for AI models to make wider contextual connections to herald the kind of work Remy may deliver.

In an interview in January during the 2026 World Economic Forum in Davos, Hassabis said he is “definitely a subscriber to the idea that maybe we need one or two more big breakthroughs before we’ll get to AGI. I think they’re along the lines of things like continual learning, better memory, longer context windows (or perhaps more efficient context windows would be the right way to say it)… so don’t store everything, just store the important things.”

AI enters the workflow

Yaron Schneider, CTO and co-founder of the agentic development company Diagrid, tells The New Stack that Google’s Remy project reinforces the idea that the future of AI is long-running autonomous agent workflows, not single prompts.

But he advises that once agents begin coordinating tools and actions over time, we quickly reach a point where reliability, recovery, and governance become workflow problems, which means durable execution becomes foundational to the agent stack.

“For developers, this changes how AI systems are built: autonomous agents increasingly need workflow runtimes underneath them to coordinate state, retries, recovery, identity, and policy enforcement across long-running execution,” Schneider says. “The next evolution of the AI stack will be extending agent frameworks with durable workflow and orchestration primitives rather than treating agents as isolated prompt-response systems.”
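The coordination of state, retries, and recovery that Schneider describes can be illustrated with a toy example. The following is a minimal sketch, not how Remy or any Diagrid product works: the names `run_durable` and `_save_checkpoint`, the JSON checkpoint file, and the step format are all assumptions made for illustration. It shows the core idea of durable execution, which is that progress is persisted after every step so a restarted process resumes where it left off instead of repeating completed work.

```python
import json
import os
from typing import Callable, List, Tuple

# Hypothetical step format: a name plus a function that takes and returns a dict.
Step = Tuple[str, Callable[[dict], dict]]


def _save_checkpoint(path: str, state: dict) -> None:
    """Write the checkpoint atomically so a crash never leaves a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_durable(steps: List[Step], checkpoint_path: str, max_attempts: int = 3) -> dict:
    """Run steps in order, persisting progress after each completed step."""
    state = {"next_step": 0, "data": {}}
    if os.path.exists(checkpoint_path):
        # Recovery: pick up from the last step that finished successfully.
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["next_step"] < len(steps):
        _name, step = steps[state["next_step"]]
        for attempt in range(1, max_attempts + 1):
            try:
                # Pass a copy so a failed attempt cannot corrupt saved data.
                state["data"] = step(dict(state["data"]))
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; the checkpoint still marks where to resume
        state["next_step"] += 1
        _save_checkpoint(checkpoint_path, state)
    return state["data"]
```

Production workflow runtimes add far more on top of this, such as timers, identity, and policy enforcement, but the checkpoint-then-resume loop is the part that makes execution durable.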

Infrastructure first, always

Devin Cheevers, director of product at Grafana Labs, tells The New Stack that Remy (or any real manifestation of a product of this kind) isn’t a chatbot; it’s a “long-running personal agent,” and that has implications.

He says that shipping a technology of this type at Google’s scale forces the search and cloud giant to build agent runtime infrastructure beneath it and grapple with new challenges that come with building agentic systems.

“Once you move from a synchronous request-response execution pattern to continuously running delegated execution, you stop building an AI app and start building a distributed system,” Cheevers says. “The interesting signal in the Remy leaks isn’t the model, it’s the language around persistence and proactivity, i.e., terms like ‘monitor for things that matter to you’ and ‘handle tasks over time’ imply durable execution graphs, long-lived state, asynchronous orchestration, and delegated permissions across Android, Chrome, Workspace, Search, and identity systems.”

Distributed systems, hard problems

An observability platform specialist at heart, Cheevers highlights what he calls the “hard problems” that distributed systems have always had: retries (automated mechanisms that allow an application to reattempt a failed operation), partial failure, scheduling, state consistency, auth propagation, replayability, isolation, policy enforcement, and observability itself.
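To make the first item on that list concrete, here is a minimal sketch of a retry helper with exponential backoff and jitter, a common pattern in distributed systems. The function name `retry` and its parameters are hypothetical and chosen for illustration; nothing here is drawn from Grafana or Google code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The base wait doubles on each failure: 0.5s, 1s, 2s, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

A helper like this is only safe when the wrapped operation is idempotent. An agent that retries a payment or an email send without that guarantee turns a partial failure into a duplicate action, which is one reason the other items on the list, such as state consistency and replayability, cannot be treated separately.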

Seth Rogers, associate director for customer technology advisory at Kyndryl, agrees that Remy isn’t a product story; it’s a runtime story. Rogers tells The New Stack that the mooted emergence of Google’s Remy agent signals a “structural shift” in enterprise AI architecture that boards and risk committees need to understand.

“Model-level safety controls, the alignment training and content filters that vendors have relied on to date, are statistical in nature and cannot meet the deterministic assurance bar that regulated industries require,” Rogers says. “As agents like Remy move toward continuous, autonomous operation across sensitive systems, the residual error rate inherent to probabilistic controls translates directly into material incident exposure.”

A duality of technologies & pressure points

He further explains that the market is responding with two complementary technologies now moving from research into early production: deterministic policy engines that check every action an agent takes against declarative, auditable rules, and hardened runtime containment that isolates the agent at the operating-system level.
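A toy version of the first of those technologies might look like the following. This is a sketch under stated assumptions, not a description of any vendor's engine: the `Rule` and `Decision` types, the `evaluate` function, the glob-style patterns, and the example tool names are all hypothetical. What it illustrates is the deterministic property Rogers contrasts with statistical controls, since the same rules and the same proposed action always yield the same decision, along with a reason that can be logged for audit.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class Rule:
    effect: str    # "allow" or "deny"
    tool: str      # glob pattern matched against the tool name
    resource: str  # glob pattern matched against the target resource


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str    # recorded so every decision can be audited


def evaluate(rules: List[Rule], tool: str, resource: str) -> Decision:
    """Decide whether an agent action is permitted.

    Explicit deny beats allow, and anything unmatched is denied by default.
    """
    matched = [
        r for r in rules
        if fnmatchcase(tool, r.tool) and fnmatchcase(resource, r.resource)
    ]
    for rule in matched:
        if rule.effect == "deny":
            return Decision(False, f"denied by {rule}")
    for rule in matched:
        if rule.effect == "allow":
            return Decision(True, f"allowed by {rule}")
    return Decision(False, "no matching rule (default deny)")


# Hypothetical declarative rule set for illustration only.
RULES = [
    Rule("allow", "calendar.*", "user/*"),
    Rule("deny", "payments.*", "*"),
    Rule("allow", "*", "public/*"),
]
```

With these rules, `evaluate(RULES, "calendar.read", "user/42")` is allowed, `evaluate(RULES, "payments.send", "public/x")` is denied because the explicit deny takes precedence over the broader allow, and `evaluate(RULES, "email.send", "user/42")` is denied because no rule matches.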

“Two pressures are accelerating adoption. First, the expanding autonomy of agents – Remy being the visible consumer case – removes the human-in-the-loop checkpoints that today’s permission models rely on. Second, AI-assisted vulnerability discovery, exemplified by Anthropic’s Mythos capability, is compressing patch cycles from months to days and rendering traditional incident response cadences inadequate,” clarifies Rogers.

Banking, healthcare, and other regulated sectors are responding by adopting what Rogers defines as “military-grade containment” for the critical percentage of their estate, where a single non-compliant action by a more fully autonomous agent that affects secure workflow execution constitutes a regulatory event.

For Kyndryl’s Rogers, the strategic question for any organization deploying agents at scale is no longer whether to invest in this control layer, but how quickly the core estate can be migrated onto it.

Remy, out of the cellar

If Remy emerges from the cellar, joining technologies like Nvidia NemoClaw (an open source stack that adds privacy and security controls to OpenClaw) and the corresponding efforts from Google, Anthropic, and OpenAI that will inevitably surface, we may get an early indicator of a category that will define enterprise AI infrastructure spending in the immediate future.

Adrian Bridgwater is a technology journalist with three decades of press experience. He has an extensive background in communications, starting in print media, newspapers and also television. Primarily working as an analysis writer dedicated to a software application development ‘beat’,...