RAG vs Graph RAG
A detailed look at the architecture of both RAG and Graph RAG.
In this article, we will discuss the following topics:
- The Architecture of RAG and Graph RAG
- Which is Better
- Cost
The Architecture of RAG and Graph RAG
The RAG architecture can be split into two phases: an indexing phase and a query phase.
In the indexing phase, the unstructured text is preprocessed (chunked and embedded) and stored in a vector store.
In the query phase, an embedding model converts the user query into an embedding, the most similar content is retrieved from the vector store, and the LLM frames an answer to the question from that content.
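The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store, and the final LLM call is left out (the sketch stops at building the prompt). All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text):
    """Split a document into sentence-sized chunks (assumed chunking strategy)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_index(documents):
    """Indexing phase: chunk every document and store (chunk, embedding) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index, query, k=2):
    """Query phase, step 1: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Query phase, step 2: assemble the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


docs = [
    "Paris is the capital of France. The Eiffel Tower is located in Paris.",
    "Tokyo is the capital of Japan. Mount Fuji is a famous landmark near Tokyo.",
]
index = build_index(docs)
question = "What is the capital of Japan?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the retrieved chunks and the question would be passed to the LLM, which frames the final answer.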
RAG
Retrieval-augmented generation (RAG) is a framework for improving the quality of LLM-generated responses by grounding the model in external sources of knowledge that supplement the LLM’s internal representation of information.
Implementing RAG in an LLM-based question-answering system has two main benefits: it ensures that the model has access to the most current, reliable facts, and it gives users access to the model’s sources, so that its claims can be checked for accuracy and ultimately trusted.
Graph RAG
In the context of Retrieval-Augmented Generation (RAG), Graph RAG introduces a significant enhancement: transforming sourced document chunks into entities and relationships using a Large Language Model (LLM), preferably GPT-4. This preprocessing step is crucial, as accurate extraction of entities and their relationships is essential for the subsequent knowledge graph construction, which varies depending on the domain.
Looking closely at the architecture, we can see that it involves ingesting documents, splitting them into manageable chunks, and transforming these chunks into entities and relationships. These entities and relationships form the foundation of a knowledge graph. Leveraging an LLM, we identify the closest community for each node, thereby creating a hierarchical structure. This hierarchy allows the model to generate community-level summaries, which are then stored in a vector database.
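The indexing flow described above can be sketched as follows. Several pieces are assumptions made for illustration: the `triples` list is a hypothetical result of the LLM extraction step, connected components stand in for a real (hierarchical) community assignment, and `summarize` simply joins the facts where an LLM would write a summary. The sketch builds a single community level and skips the vector database.

```python
from collections import defaultdict

# Hypothetical output of the LLM extraction step: (entity, relation, entity) triples.
triples = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "based_in", "Berlin"),
    ("Carol", "studies_at", "MIT"),
    ("MIT", "located_in", "Cambridge"),
]


def build_graph(triples):
    """Build an undirected adjacency map from the extracted triples."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def find_communities(graph):
    """Group nodes by connected component (a stand-in for real community detection)."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities


def summarize(community, triples):
    """Placeholder for the LLM summary: list the facts that fall inside the community."""
    facts = [
        f"{head} {relation.replace('_', ' ')} {tail}"
        for head, relation, tail in triples
        if head in community and tail in community
    ]
    return "; ".join(facts)


graph = build_graph(triples)
communities = find_communities(graph)
summaries = [summarize(c, triples) for c in communities]
for summary in summaries:
    print(summary)
```

Each summary would then be embedded and stored, so that the query phase can search over communities rather than raw chunks.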
When a user submits a query, it is processed to identify the most relevant community level. The system retrieves the summary from the highest-ranking community and refines the response using the LLM.
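A rough sketch of this query step, under stated assumptions: the `community_summaries` dictionary is hypothetical sample data, ranking uses simple word overlap where a real system would compare embeddings, and the final refinement by the LLM is represented only by the prompt that would be sent.

```python
import re

# Hypothetical community summaries produced during indexing.
community_summaries = {
    "c0": "Alice and Bob work at Acme, a company based in Berlin.",
    "c1": "Carol studies at MIT, which is located in Cambridge.",
}


def tokens(text):
    """Lowercased word set used for the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_communities(query, summaries):
    """Rank community ids by word overlap with the query (stand-in for vector similarity)."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), cid) for cid, text in summaries.items()]
    return [cid for _score, cid in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


def build_prompt(query, summary):
    """Prompt that would be passed to the LLM to refine the final response."""
    return f"Community summary:\n{summary}\n\nAnswer the question: {query}"


query = "Where is Acme based?"
best = rank_communities(query, community_summaries)[0]
prompt = build_prompt(query, community_summaries[best])
print(prompt)
```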
Which is better?
Both RAG (Retrieval-Augmented Generation) and Graph RAG have their own pros and cons. From several test cases I’ve reviewed, there is a notable difference in the responses produced by each approach.
The primary advantage of Graph RAG over traditional RAG is its ability to retrieve comprehensive details about the entities mentioned in the query. Graph RAG not only fetches detailed information about the queried entity but also identifies and relates it to other connected entities. In contrast, standard RAG retrieves information limited to the specific document chunk, missing out on broader relationships and connections.
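To make the difference concrete, here is a small sketch of how a graph lets retrieval follow connections outward from the queried entity. The `graph` data and the `related_facts` helper are hypothetical, and a breadth-first expansion is only one possible way to collect connected entities; it is not necessarily how any particular Graph RAG implementation does it.

```python
from collections import deque

# Hypothetical knowledge graph: entity -> list of (relation, neighbor).
graph = {
    "Acme": [("employs", "Alice"), ("based_in", "Berlin")],
    "Alice": [("manages", "Bob")],
    "Berlin": [("capital_of", "Germany")],
    "Bob": [],
    "Germany": [],
}


def related_facts(graph, entity, max_hops=2):
    """Collect (head, relation, tail) facts reachable within max_hops of the entity."""
    facts, seen = [], {entity}
    queue = deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts


one_hop = related_facts(graph, "Acme", max_hops=1)
two_hop = related_facts(graph, "Acme", max_hops=2)
print(one_hop)
print(two_hop)
```

With one hop, only facts that mention Acme directly come back, which is similar to what a single chunk might contain. With two hops, facts about Alice and Berlin are pulled in as well, even if they were stated in a different part of the document.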
Cost
The enhanced capability of Graph RAG comes with its own challenges. In my experiment, I ingested a file of approximately 83,000 tokens that needed to be chunked and embedded. Using the standard RAG approach, creating the embeddings consumed roughly the same number of tokens. When I ingested the same file using Graph RAG, the process involved extensive prompting and processing and consumed around 1,000,000 tokens, roughly 12 times the original token count for a single file.
Here are some questions for my readers:
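The arithmetic behind that comparison is easy to check. The token counts below are the approximate figures from the experiment; `PRICE_PER_MILLION_TOKENS` is a made-up placeholder rather than a real price, included only to show how the gap carries over to cost.

```python
# Approximate token counts from the experiment described above.
source_tokens = 83_000
graph_rag_tokens = 1_000_000

# How many times more tokens Graph RAG indexing consumed.
ratio = graph_rag_tokens / source_tokens
print(f"Graph RAG used about {ratio:.1f}x the tokens of the source file")

# Hypothetical flat price, for illustration only (not a real rate).
PRICE_PER_MILLION_TOKENS = 10.0


def estimated_cost(tokens, price_per_million=PRICE_PER_MILLION_TOKENS):
    """Cost estimate assuming a single flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million


print(f"Standard RAG: {estimated_cost(source_tokens):.2f}")
print(f"Graph RAG:    {estimated_cost(graph_rag_tokens):.2f}")
```

Whatever the actual price, the cost of indexing scales with the same factor of about 12, and that factor applies to every file ingested.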
- Is Graph RAG worth the hype?
- Will it be able to perform well with smaller models like GPT-3.5?
- Can we achieve comparable performance through better prompt engineering, chunking, and embedding strategies?