Rabbit: Building the New Human to Machine Interface

Jan 9, 2024

Amidst all the AI hype, we think it’s important to consider from first principles what is changing and where disruption will be felt most. Computer systems to date have been deterministic, relying on rules-based logic and relational databases combined with human oversight to drive automation. For the first time, LLMs are able to take open-ended (non-deterministic) input via natural language, infer a user’s intention, and then generate traditional outputs. We believe that LLMs and this non-deterministic approach to computing will unlock cognitive skills across countless verticals, bringing unprecedented scalability to human reasoning, primarily in the form of agents. These agents won’t be able to replace human cognitive skills right away, but over time we see them taking over a huge part of the service economy, which today is over 70% of GDP in countries like the United States.1 The opportunity set for disruptive new businesses is massive, and Synergis wants to back AI-native companies that simply couldn’t exist in the rigid computing world of the past. Rabbit Tech is doing exactly that by offering a chance to reimagine what application interfaces look like.

Rabbit is creating an entirely new operating system, Rabbit OS, that uses natural language processing and an innovative foundational Large Action Model to fundamentally change how we use applications. AI will become the new primary interface, with Rabbit as the ultimate consumer agent, replacing the need for users to interact with the front end of any application. Trained agents, called ‘Rabbits’, will seamlessly replicate human interaction with popular applications and execute complicated multi-step actions. From booking a hotel to ordering dinner, users will no longer need to open individual apps; a Rabbit will simply fulfill the request as desired. Rabbit is led by Jesse Lyu, a serial founder who has been building in this space for a decade, and is backed by Khosla Ventures (the first institutional investor in OpenAI). The team is taking a fully proprietary stack approach, creating a new hardware device that will feature its innovative operating system. At Synergis, we truly believe that the best next-generation software experience has to run on hardware tailor-made for it.

“To do any task on a computer, you have to tell your device which apps to use. You can use Microsoft Word and Google Docs to draft a business proposal, but they can’t help you send an email, share a selfie, analyze data, schedule a party, or buy movie tickets. In the next five years, this will change completely. You won’t have to use different apps for different tasks. You’ll simply tell your device, in everyday language, what you want to do.” - Bill Gates 2

Why we need a new operating system

Today’s most popular operating systems, from Windows to iOS and Android, share one common theme: they were built to fit within the limitations of desktops, phones, and tablets. This means rigid command structures and processes, with a heavy emphasis on touch-focused UI/UX to facilitate interaction between system and user. While UI/UX has come a long way, we believe a text-based experience is simply not as natural or as easy to operate as one based on voice. Voice-based assistants like Siri already exist, but they still operate under a strict rules-based system, which limits their usefulness and their ability to handle complex requests. Simple requests like “How is the weather outside?” or “Who won the sports game?” don’t typically offer enough value to move users away from text-based usage. As Rabbit’s Large Action Model (LAM) unlocks the ability to execute more complex, multi-step orders, we believe the efficiency gains of switching to voice will become more obvious and will quickly shift user behavior toward natural language operating systems.

The popular app store experience featured in today’s operating systems also needs to be reimagined in an AI-first world. In the United States, the average mobile phone has 80 separate applications; users visit 30 of them on a monthly basis and only 9 daily. For younger generations this trend towards application use is only growing, with Gen Z now spending 112 hours a month on applications, 10 hours more than prior age groups.3 Given the siloed nature of this mass of applications, user information and behavior patterns are not carried seamlessly between them and must be re-entered again and again, to the detriment of efficiency. Rabbit will solve this chaos by making it easier for apps to share data and talk to each other via agents, which improves the ability to coordinate actions across multiple venues. Perhaps most important is the challenge to the existing aggregator model itself. Search engines like Google have pre-existing ad relationships that thrive on open competition for user attention. This creates complexity: a simple query like “Book a Hotel” shows results from Expedia, Booking.com, etc., whereas the user simply wants the end result executed.

Less noticeable to the average user, but critically important for developers, is the current reliance on API calls in almost all of today’s applications. Developers are forced to manually connect APIs on top of APIs to give users the best overall experience. From a business point of view, the API route can become a huge operational burden in maintaining and developing an application. API-dependent applications also limit the end user experience, as it will typically be inferior to the native application experience. As LLMs continue to proliferate, manual integration with rigid APIs will be one of the largest constraints on training AI agents and deploying them rapidly into the real world. Rabbit’s ability to mimic human intention, effectively removing the need for API calls, will allow it to scale rapidly and avoid many of the bottlenecks its competitors may face.

How Rabbit builds an app-less future

LAM

Large Action Model (LAM) is a new foundation model that understands human intentions on computers with neuro-symbolic techniques.

“Large Action Model,” or LAM, models human intentions expressed through actions on computers and, by extension, in the physical world. A key observation is that the inherent structures of human-computer interactions differ from natural language or vision. The applications are expressed in a form that is more structured than a rasterized image and more verbose and noisy than a sentence or a paragraph. The characteristics one desires from a LAM are also different from a foundation model that understands language or vision alone: while we may want an intelligent chatbot to be creative, LAM-learned actions on applications should be highly regular, minimalistic (per Occam’s razor), stable, and explainable.

With these fresh perspectives Rabbit’s team has developed unique formulations and models that are surprisingly effective on the benchmarks we care about. The stack is designed from the ground up, from the data collection platform to a new network architecture that utilizes both transformer-style attention and graph-based message passing, combined with symbolic algorithms.

LAM’s modeling approach is rooted in imitation, or learning by demonstration: it observes a human using the interface and aims to reliably replicate the process, even if the interface is presented differently or slightly changed. Instead of having a black-box model uncontrollably outputting actions and adapting to the application during inference, LAM’s “recipe” is more observable. This means that once the demonstration is provided, the synthesized routine runs directly on the target application without the need for a busy loop of “observation” or “thoughts,” and any technically trained human should be able to inspect the “recipe” and reason about its inner workings. Both symbolic and neural components contribute to this process: neural networks are used to understand language, vision, and perform zero-shot reasoning; symbolic algorithms are employed to extract salient substructures and propose action sequences on formalized representation of target applications. As LAM accumulates knowledge from demonstrations over time, it gains a deep understanding of every aspect of an interface exposed by an application and creates a “conceptual blueprint” of the underlying service provided by the application. LAM can be seen as a bridge, connecting users to these services through the application’s interface.
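To make the idea of an inspectable “recipe” more concrete, here is a minimal Python sketch. It is not Rabbit’s implementation: the `Step`, `Recipe`, and `ToyApp` names, the lookup of interface elements by semantic role, and the hotel-search example are all assumptions made purely for illustration. It shows a routine captured once from a demonstration and then replayed step by step against two differently laid-out front ends, with no observe-and-think loop at run time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One symbolic action in a recipe (hypothetical format)."""
    action: str          # "type" or "click"
    role: str            # semantic role of the target element, not its position
    value: str = ""      # text to enter for "type" actions


@dataclass
class Recipe:
    """An ordered, human-readable list of steps learned from one demonstration."""
    name: str
    steps: list[Step]

    def describe(self) -> list[str]:
        # A technically trained reader can inspect exactly what will run.
        return [f"{s.action} {s.role} {s.value}".strip() for s in self.steps]


@dataclass
class ToyApp:
    """Stand-in for a target application: maps element ids to semantic roles."""
    layout: dict[str, str]                          # element id -> role
    log: list[str] = field(default_factory=list)    # actions performed so far

    def find(self, role: str) -> str:
        # Resolve by role so the recipe survives a changed layout.
        for element_id, element_role in self.layout.items():
            if element_role == role:
                return element_id
        raise LookupError(f"no element with role {role!r}")

    def perform(self, step: Step) -> None:
        element_id = self.find(step.role)
        self.log.append(f"{step.action}:{element_id}:{step.value}")


def run(recipe: Recipe, app: ToyApp) -> list[str]:
    """Replay the recipe directly; no per-step model inference is involved."""
    for step in recipe.steps:
        app.perform(step)
    return app.log


recipe = Recipe("search_hotels", [
    Step("type", "destination_field", "Las Vegas"),
    Step("click", "search_button"),
])

# Same service, two different front ends (the element ids differ).
old_ui = ToyApp({"input-1": "destination_field", "btn-7": "search_button"})
new_ui = ToyApp({"q": "destination_field", "go": "search_button"})

old_log = run(recipe, old_ui)
new_log = run(recipe, new_ui)
```

In this toy version the “understanding” of the interface is reduced to a hand-written role map; in the approach described above, that mapping is where the neural and symbolic components would do their work.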

In the long run, LAM will exhibit its own version of “scaling laws,” where the actions it learns can generalize to applications of all kinds, even generative ones. As Rabbit invests in more computational power, LAM could become increasingly helpful in solving complex problems spanning multiple apps that require professional skills to operate.

By utilizing neuro-symbolic techniques in the loop, LAM sits on the very frontier of inter-disciplinary scientific research in language modeling (LM), programming languages (PL), and formal methods (FM). Traditionally, the PL/FM community has focused on symbolic techniques — solver technologies that rely on logical principles of induction, deduction, and heuristic search. While these symbolic techniques can be highly explainable and come with strong guarantees, they suffer from a scalability limit. By contrast, recent innovations in the LM community are grounded in machine learning and neural techniques: while highly scalable, they suffer from a lack of explainability and come with no guarantees of the output produced. Inspired by the success of machine learning and neural techniques, the PL/FM community has recently made waves of progress on neuro-symbolic methods: by putting together neural techniques (such as LLM) and symbolic ones, one ends up combining the best parts of both worlds, making the task of creating scalable and explainable learning agents a feasible one. Yet to date, no one has put cutting-edge neuro-symbolic techniques into production — LAM seeks to pioneer this direction.

(A snapshot of LAM being trained on popular apps)

Rabbit OS

Rabbit OS is a first-of-its-kind operating system built using natural language processing on top of LAM. LAM will facilitate the training of individual agents, or ‘Rabbits’, that are able to complete a variety of tasks for the user via voice or text command. Each ‘Rabbit’ is able to mimic human interactions, without relying on API calls, and enter various applications on behalf of the user to accomplish the required tasks. If a user instructs Rabbit to book a trip to Las Vegas for CES, the agent will be able to simultaneously check for the best hotel deals across multiple applications like Expedia and Hotels.com, book a flight, and, knowing the flight time and hotel, arrange an Uber for airport pickup. This removes the need to use multiple apps and coordinate across all of them, leading to massive time savings and efficiency gains. As Rabbit learns more about one’s personal preferences, say hotels with a view or a pool, these learnings will be incorporated into future interactions automatically. Rabbit will be constantly learning, with each experience becoming more and more natural to users over time. In this world Rabbit becomes the ultimate interface, with applications simply becoming backend services whose unique front ends are no longer needed. The implications of this app-less future are massive, offering the first real disruption to the dominant App Store model of the last decade and forcing apps to explore different paths to monetization.
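The Las Vegas example can be sketched as a small piece of coordination logic. The Python below is only an illustration under stated assumptions: the `HotelOffer` and `Flight` types, the `plan_trip` function, the 20-minute pickup buffer, and all offer data are made up, and the real applications are replaced by plain in-memory lists. It shows the two ideas in the paragraph: comparing offers across sources while honoring a learned preference, and letting a later step (the ride) depend on an earlier one (the flight).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class HotelOffer:
    source: str      # which app the offer came from (hypothetical)
    name: str
    price: float
    has_pool: bool


@dataclass(frozen=True)
class Flight:
    number: str
    arrives: datetime


def pick_hotel(offers, prefer_pool):
    """Cheapest offer across all sources, honoring a learned preference if possible."""
    preferred = [o for o in offers if o.has_pool] if prefer_pool else []
    candidates = preferred or offers   # fall back to all offers if none match
    return min(candidates, key=lambda o: o.price)


def schedule_pickup(flight, buffer_minutes=20):
    """The ride depends on the flight chosen earlier in the same request."""
    return flight.arrives + timedelta(minutes=buffer_minutes)


def plan_trip(offers, flight, prefer_pool):
    hotel = pick_hotel(offers, prefer_pool)
    return {
        "hotel": f"{hotel.name} via {hotel.source}",
        "flight": flight.number,
        "pickup_at": schedule_pickup(flight),
        "dropoff": hotel.name,
    }


# Made-up offers that two separate travel apps might return.
offers = [
    HotelOffer("app_a", "Strip Inn", 120.0, has_pool=False),
    HotelOffer("app_a", "Oasis Resort", 180.0, has_pool=True),
    HotelOffer("app_b", "Oasis Resort", 165.0, has_pool=True),
]
flight = Flight("XY123", datetime(2024, 1, 8, 15, 30))

plan = plan_trip(offers, flight, prefer_pool=True)
```

With the pool preference on, the plan picks the cheaper of the two Oasis Resort listings; with it off, the cheapest offer overall wins. The user states the goal once, and the cross-app bookkeeping happens in one place.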

A new hardware device

To this point we’ve talked a lot about the unique operating system that Rabbit OS will deliver on the software side, but equally important is the means through which it is brought into the world. The first model is called r1, which was just launched on January 9th. Rabbit’s flagship device is built primarily around PTT (push to talk). This mechanism was chosen for both privacy and practical reasons. Having a device that is always listening creates clear privacy concerns and potential backlash from regulators. Current battery technology does not support always-on listening for more than a day, and the computational power needed to run the algorithm continuously is also not practical. The device will feature a motorized camera, allowing versatility for front-, back-, and down-facing use cases. The screen will offer touch-based typing for situations when speaking is not appropriate. The overall design is meant to minimize interactions with machine UI and keep the flow more natural and efficient. We believe this new experience will be vastly improved compared to today’s multi-touch world, as the screen is there mainly to display information, not to handle complex interactions.
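The privacy argument for push to talk comes down to one rule: audio is kept only while the button is held. The Python sketch below models that rule as a tiny state machine. The `PushToTalk` class and its method names are hypothetical and say nothing about how the r1 firmware actually works.

```python
from dataclasses import dataclass, field


@dataclass
class PushToTalk:
    """Buffers audio frames only while the button is held (hypothetical model)."""
    pressed: bool = False
    buffer: list[bytes] = field(default_factory=list)

    def press(self) -> None:
        self.pressed = True
        self.buffer.clear()          # start a fresh utterance

    def on_audio_frame(self, frame: bytes) -> None:
        if self.pressed:             # frames outside a press are dropped
            self.buffer.append(frame)

    def release(self) -> bytes:
        self.pressed = False
        utterance = b"".join(self.buffer)
        self.buffer.clear()          # nothing lingers after the request
        return utterance


ptt = PushToTalk()
ptt.on_audio_frame(b"ignored")       # button not held: nothing is kept
ptt.press()
ptt.on_audio_frame(b"book ")
ptt.on_audio_frame(b"a hotel")
utterance = ptt.release()
ptt.on_audio_frame(b"also ignored")  # dropped again after release
```

Only the frames captured between `press()` and `release()` reach the rest of the system, which is also why this design avoids the battery and compute cost of continuous listening.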

Team

Rabbit is an extremely ambitious project, seeking to disrupt one of the most popular current human behaviors: using application-based operating systems. Rabbit will face unique challenges on the software side as well as in hardware design and distribution. Given all of this, the founding team is key, and we believe that CEO Jesse Lyu is the perfect candidate to deliver for Rabbit.

Jesse Lyu, Founder & CEO: Jesse was born in Xi’an, China in 1990 and graduated from the University of Liverpool in the UK. A serial entrepreneur, Jesse has had a lifelong passion for natural language processing that long predates the current generative AI mania. Jesse’s first AI company, Raventech, launched out of YC and focused on NLP for home devices; it was acquired by Baidu in 2017 for $100m. For the next few years Jesse worked at Baidu, leading the build-out and launch of its version of Amazon Alexa for the Chinese market. This experience gave him unique insight into the design space and manufacturing process for hardware devices and into how these devices intersect with AI models at mass scale (over 300m users). After leaving Baidu, Jesse moved back to Los Angeles and in 2021 started Cyber Manufacturing, which is now Rabbit.

Closing

The shift from rigid deterministic computer systems to natural language based systems will represent one of the most important changes in how humans interact with machines. Rabbit aims to deliver the ultimate consumer agent, a powerful tool that will leverage voice commands to complete complex tasks; we believe this represents a 100x efficiency gain. The team is poised to forever change how we interact with our devices and to bring AI agents to the masses. Synergis has backed the Rabbit team for over two years now and has been a steadfast supporter, adding to our conviction at each round of financing. We believe Rabbit will become a generational product that changes how humans interact with technology on a daily basis.

To learn more, visit https://www.rabbit.tech/

Website: https://synergiscap.xyz/

Twitter: https://twitter.com/Synergiscap

Written by Synergis Capital