Semantic Router and Its Role in Designing Agentic Workflows

A semantic router is a pattern that enables AI agents to choose the right LLM for the right task, while also reducing their LLM dependency.

Sep 25th, 2024 10:17am by Janakiram MSV

The emerging pattern of agentic workflows relies heavily on LLMs for reasoning and decision-making. Each agent calls an LLM multiple times during task execution. In a workflow with multiple agents, the number of calls multiplies quickly, driving up both cost and latency.

There are many language models with different features and abilities, such as small language models, multimodal models, and purpose-built, task-specific models. Agents can use whichever of these models best fits each step of a workflow, which can reduce cost and latency and improve overall accuracy.

A semantic router is a pattern that enables agents to choose the right language model for the right task while also reducing their dependency on the models through local decision-making. Behind the scenes, the semantic router uses embeddings stored in a vector database to match the prompt with a set of existing phrases (also known as utterances) to map them to a specific route. The route can be an LLM that’s best suited for the task. Because a semantic search determines the target, we call it a semantic router.

The semantic router uses the same technique as the retriever in a RAG pipeline to perform a semantic search to find the right match. But instead of chunks of text, it returns a single, pre-defined route based on the input.

It is technically possible to implement a semantic router as a custom layer between the agents and the LLMs, but the open source Semantic Router project offers a ready-made implementation that is gaining popularity.

Overview of the Semantic Router Project

Aurelio AI developed Semantic Router, an innovative open source tool that transforms decision-making in AI-based agents. This layer enhances what LLMs and agents can do by utilizing semantic vector space to route requests more efficiently. Unlike traditional methods that rely on slow LLM generations for tool-use decisions, Semantic Router utilizes the power of semantic meaning to make rapid and accurate choices.

The project offers seamless integration with various embedding models, including popular options like Cohere and OpenAI, as well as support for open source models via HuggingFace Encoders. The project utilizes an internal in-memory vector database, but mainstream vector database engines like Pinecone and Qdrant can easily replace it. The Semantic Router’s ability to make decisions based on user queries significantly reduces processing time, typically from 5000 ms to just 100 ms.

Semantic Router is released under the MIT license, so developers can incorporate it into their projects freely and extend it as needed. The tool addresses critical challenges in AI development, including safety, scalability, and speed, making it a valuable asset for creating more efficient and responsive agentic workflows.

Key Components of Semantic Router

Routes and Utterances

Routes form the backbone of the Semantic Router’s decision-making process. Each route represents a potential decision or action and is defined by a set of utterances, which are sample inputs that map to that route. The system encodes these utterances to build a semantic profile for each route, then compares each new input against them to find the closest match.
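To make the relationship between routes and utterances concrete, here is a minimal sketch in plain Python. It is not the Semantic Router API: the `Route` dataclass, the `encode` function (a toy bag-of-words stand-in for a real embedding model), and `closest_route` are all hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Route:
    """A named decision plus the sample utterances that define it."""
    name: str
    utterances: list[str]


def encode(text: str) -> Counter:
    # Assumption: a toy encoder that counts lowercase words.
    # A real router would call an embedding model here instead.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def closest_route(query: str, routes: list[Route]) -> tuple[str, float]:
    """Return the route whose best-matching utterance is most similar to the query."""
    q = encode(query)
    scored = [
        (max(cosine(q, encode(u)) for u in route.utterances), route.name)
        for route in routes
    ]
    score, name = max(scored)
    return name, score


routes = [
    Route("weather", ["what is the weather today", "will it rain tomorrow"]),
    Route("calendar", ["schedule a meeting", "book a call with the team"]),
]

print(closest_route("schedule a meeting with Sam", routes))
```

The query shares most of its words with the "schedule a meeting" utterance, so it lands on the calendar route without any LLM call. With a real encoder, the match would depend on meaning rather than on shared words.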

In practice, this allows the system to categorize and respond to inputs based on their semantic meaning rather than relying on LLM generation, which can be slow or prone to error. Developers can customize routes to fit specific applications — whether that’s filtering sensitive topics, managing APIs, or orchestrating tools in a complex workflow.

Encoders and Vector Spaces

To compare inputs with predefined utterances, the Semantic Router uses encoders that transform text into high-dimensional vectors. These vectors reside in a semantic space, where the distance between vectors reflects the semantic similarity of the corresponding texts. The shorter the distance, the more semantically related the inputs are.
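A small worked example may help here. The sketch below computes cosine distance, one common way to measure closeness in an embedding space, between hand-written three-dimensional vectors. The vectors and their labels are invented for illustration; real encoders produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def cosine_distance(u: list[float], v: list[float]) -> float:
    # 0.0 means the vectors point the same way; larger means less related.
    return 1.0 - cosine_similarity(u, v)


# Hypothetical embeddings, made up for this example.
rain = [0.9, 0.1, 0.0]
forecast = [0.8, 0.3, 0.1]
invoice = [0.0, 0.2, 0.9]

print("rain vs forecast:", round(cosine_distance(rain, forecast), 3))
print("rain vs invoice: ", round(cosine_distance(rain, invoice), 3))
```

The first distance comes out much smaller than the second, which is the property the router relies on: related texts sit close together, unrelated texts sit far apart.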

Semantic Router supports multiple encoding methods, including Cohere and OpenAI encoders for high-performance API-driven workflows and Hugging Face models for those seeking open source, locally executable alternatives. The flexibility to choose different encoders allows developers to tailor the system to their specific infrastructure — balancing performance, cost, and privacy concerns.

Decision Layers

Once the inputs are encoded and compared to the predefined routes, the Semantic Router makes decisions using a RouteLayer. This layer aggregates routes and embeddings and manages the decision-making process. It also supports hybrid routing, where the system can combine local and cloud-based models to optimize performance.
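The idea of a decision layer can be sketched as a small class that embeds every utterance once, scores an incoming query against all of them, and declines to route when nothing is similar enough. `ToyRouteLayer`, `toy_encoder`, the fixed vocabulary, and the 0.5 threshold are assumptions made for this illustration and do not reflect the project's own classes or defaults.

```python
import math
from typing import Callable, Optional

Vector = list[float]

# Assumption: a tiny fixed vocabulary so the toy encoder stays readable.
VOCAB = ["weather", "rain", "meeting", "schedule", "refund", "invoice"]


def toy_encoder(text: str) -> Vector:
    # Hypothetical stand-in for an embedding model: counts of known words.
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(u: Vector, v: Vector) -> float:
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


class ToyRouteLayer:
    def __init__(
        self,
        encoder: Callable[[str], Vector],
        routes: dict[str, list[str]],
        threshold: float = 0.5,
    ) -> None:
        self.encoder = encoder
        self.threshold = threshold
        # Embed every utterance once, up front, and remember its route name.
        self.index = [
            (name, encoder(utterance))
            for name, utterances in routes.items()
            for utterance in utterances
        ]

    def __call__(self, query: str) -> Optional[str]:
        q = self.encoder(query)
        name, score = max(
            ((n, cosine(q, vec)) for n, vec in self.index),
            key=lambda pair: pair[1],
        )
        # Below the threshold, return None so the caller can fall back.
        return name if score >= self.threshold else None


layer = ToyRouteLayer(
    toy_encoder,
    {
        "weather": ["what is the weather", "will it rain tomorrow"],
        "calendar": ["schedule a meeting"],
    },
)

print(layer("will it rain today"))
print(layer("tell me a joke"))
```

Returning `None` for a weak match is a deliberate design choice in this sketch: it lets the agent hand unrecognized prompts to an LLM instead of forcing them into the nearest route.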

Local LLM Integration

For developers who want to maintain full control over their LLMs or reduce dependency on external APIs, Semantic Router offers support for local models via LlamaCPP and Hugging Face models. Both routing decisions and LLM-driven responses can run entirely on consumer hardware, such as a MacBook using Apple’s Metal hardware acceleration or a Microsoft Copilot+ PC. This local execution model not only reduces latency and costs but also improves privacy and security.

Scalability

Scalability becomes a concern when adding more tools and agents to a workflow. LLMs have limited context windows, meaning they struggle to handle large amounts of data or context. The semantic router addresses this by decoupling decision-making from LLMs, allowing it to handle thousands of tools simultaneously without overloading the system. This separation of concerns allows agents to scale without sacrificing performance or accuracy.

Use Cases and Scenarios

Agentic AI use cases that require simultaneous management of multiple tools, APIs, or datasets are particularly well suited to Semantic Router. In a typical workflow, the router can rapidly determine which tool or API to use based on the input, bypassing the need for full LLM queries. This is especially useful in virtual assistant systems, content generation workflows, and large-scale data processing pipelines.

For instance, in a virtual assistant, Semantic Router can efficiently route prompts like “schedule a meeting” or “check the weather” to the appropriate API or tool, without involving the LLM for every decision. Similarly, the request can be routed to a fine-tuned LLM meant to respond to medical or legal terminology. This not only reduces latency but also ensures a consistent, reliable experience for users.

The Semantic Router can be used to assess whether the prompt should be sent directly to a small language model operating locally, or whether it has to be mapped to a function and its parameters by invoking a capable LLM running in the cloud. This is particularly relevant in the implementation of federated language models that take advantage of both cloud-based and local language models.
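Once a router has produced a route name, the local-versus-cloud choice described above reduces to a lookup. The sketch below assumes the route has already been decided elsewhere. The route names, the two placeholder model functions, and the choice to send unmatched prompts to the cloud model are all hypothetical.

```python
from typing import Callable, Optional


def local_slm(prompt: str) -> str:
    # Placeholder for a call to a small language model running locally.
    return f"[local SLM] {prompt}"


def cloud_llm_function_call(prompt: str) -> str:
    # Placeholder for a cloud LLM call that maps the prompt to a function
    # and its parameters.
    return f"[cloud LLM] {prompt}"


# Hypothetical mapping from route name to the model that should handle it.
TARGETS: dict[str, Callable[[str], str]] = {
    "chitchat": local_slm,
    "summarize": local_slm,
    "book_flight": cloud_llm_function_call,
}


def dispatch(route: Optional[str], prompt: str) -> str:
    # Assumption: prompts with no matching route go to the more capable
    # cloud model rather than being dropped.
    handler = TARGETS.get(route, cloud_llm_function_call)
    return handler(prompt)


print(dispatch("chitchat", "how is your day going"))
print(dispatch("book_flight", "fly me to Bengaluru on Friday"))
print(dispatch(None, "something the router did not recognize"))
```

Keeping the mapping in a plain dictionary means a new route can be pointed at a different model without touching the agents that call `dispatch`.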

In the era of agentic workflows, the need for efficient, scalable, and deterministic decision-making systems is more pressing than ever. Semantic Router provides a robust solution by leveraging the power of semantic vector spaces to make fast, reliable decisions, while still allowing integration with LLMs when needed. Its flexibility, speed, and deterministic nature make it an indispensable tool for developers looking to build next-generation AI systems.

As LLMs evolve and diversify, tools like Semantic Router will be crucial for making sure that agentic systems can perform, scale, and give consistent results. This will help developers find new ways to use AI in their workflows.

In the next part of this series, I will walk you through the steps involved in implementing a RAG agent based on the Semantic Router. Stay tuned.

Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...