Vector Embeddings Explained: A Beginner’s Guide to Powerful AI

Product recommenders, smart chatbots and GenAI applications are powered by vector embeddings. Learn what they are and how to use them.

Sep 26th, 2024 10:30am by Denis Kuria


Featured image by Allison Saeng for Unsplash+.

Vector embeddings are numerical representations of data points in a high-dimensional space. This representation makes it possible to search unstructured data such as text, images and videos by meaning rather than by exact keywords, which opens up many possibilities in AI and machine learning (ML) applications. Specialized vector databases, such as the open source Milvus and its fully managed cloud version, Zilliz Cloud, store and manage these embeddings. They use indexing structures and search algorithms to efficiently retrieve the data most similar to a given query. Together, embeddings and vector databases power many modern AI applications, especially those built on large language models (LLMs), by supplying the model with relevant context found through semantic similarity search, which leads to more accurate and relevant responses. This technique is known as retrieval augmented generation (RAG), and it grounds LLM responses in factual data. As a result, it helps reduce hallucinations (cases where AI models produce incorrect or nonsensical information) and improves both the accuracy and reliability of AI-generated content.

Understanding Vector Embeddings

The real power of vector embeddings is their ability to capture relationships between different pieces of data. In the world of vector embeddings, data that is similar in meaning ends up closer together in this high-dimensional number space. This makes vector embeddings very useful for tasks that need to understand and compare data in advanced ways. For example, think about the words king, queen and royal. In a well-made embedding space, the number lists (vectors) for these words would be close to each other, showing that they’re related in meaning. On the other hand, the vector for an unrelated word like bicycle would be far away.
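To make "close together" concrete, here is a minimal sketch that measures closeness with cosine similarity. The 4-dimensional vectors are invented for illustration only. No model produced them, and real embeddings have hundreds or thousands of dimensions. The sketch only shows how related words score higher than an unrelated one.

```python
import math

# Toy 4-dimensional vectors, invented for illustration (not from a real model).
toy_vectors = {
    "king":    [0.90, 0.80, 0.10, 0.05],
    "queen":   [0.88, 0.82, 0.12, 0.04],
    "royal":   [0.85, 0.75, 0.20, 0.10],
    "bicycle": [0.05, 0.10, 0.90, 0.85],
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

for word in ("queen", "royal", "bicycle"):
    score = cosine_similarity(toy_vectors["king"], toy_vectors[word])
    print(f"king vs {word}: {score:.3f}")
```

Related words (queen, royal) score close to 1.0 against king, while bicycle scores much lower.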

Vectors in a high-dimensional space

Types of Vector Embeddings

Several types of vector embeddings are commonly used in AI applications. The three main types are:

Dense embeddings: These are the most common. In dense embeddings, most of the numbers in the list are non-zero. They’re good at capturing detailed relationships between data points, but they can use a lot of memory. Models like Word2Vec, GloVe and BERT usually create dense embeddings.
Sparse embeddings: In sparse embeddings, most of the numbers in the list are zero. They use less memory because only the non-zero values need to be stored, and they are useful for representing data with very many dimensions, such as term weights over a large vocabulary. Techniques like TF-IDF often create sparse embeddings.
Binary embeddings: These embeddings use only two values (usually 0 and 1) to represent data. They’re less precise but can be processed very quickly, which is useful when you need to retrieve data rapidly.
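The three types also differ in how they are stored and compared. The sketch below illustrates this with made-up values:

- The dense vector is a plain list.
- The sparse vector keeps only its non-zero entries as an index-to-weight dict. The indices are hypothetical positions in a large vocabulary.
- The binary vector is derived from the dense one by thresholding at zero, which is one common scheme and is assumed here. It is then packed into an integer, so Hamming distance becomes an XOR plus a bit count.

```python
# All values are hypothetical and chosen only for illustration.
dense = [0.12, -0.53, 0.08, 0.91, -0.27, 0.33, 0.0, -0.64]

# Sparse: store only non-zero positions (index -> weight), e.g. TF-IDF weights
# over a large vocabulary; the indices here are made up.
sparse = {17: 0.41, 2093: 0.77, 48120: 0.12}

def sparse_dot(a, b):
    # Only indices present in both vectors contribute; iterate over the smaller one.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

def binarize(vec):
    # Assumed scheme: 1 if the component is positive, else 0.
    return [1 if x > 0 else 0 for x in vec]

def pack_bits(bits):
    # Pack the bit list into one int so comparisons are cheap bitwise operations.
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def hamming(a, b):
    # Number of differing bits; smaller means more similar.
    return bin(a ^ b).count("1")

bits = binarize(dense)
print(bits, pack_bits(bits))
```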

When choosing between these types, you need to balance factors like accuracy, storage needs and processing speed. For many applications, dense embeddings work best overall, but it’s important to consider what your specific project needs.

Creating Vector Embeddings

Vector embeddings are created using advanced deep-learning models and statistical techniques that learn patterns and relationships in the input data. These embeddings map data points into an n-dimensional space, capturing complex features and distinctions that would be difficult to represent in lower dimensions.

Understanding N-Dimensional Space

An n-dimensional space allows a richer representation of data than the two or three dimensions we can easily picture. High-dimensional embeddings can capture fine-grained details, which improves accuracy in tasks like search, recommendation and natural language processing (NLP). For instance, in a basic 2D space, words like tired and exhausted might be nearly indistinguishable, but in an n-dimensional space, differences in their meanings, such as intensity or context, are more easily captured. A vector in this space is represented as: v = [v₁, v₂, ..., vₙ]
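Take two vectors v = [v₁, ..., vₙ] and w = [w₁, ..., wₙ] in the same space. Their closeness is usually measured with one of the two quantities below. Euclidean (L2) distance is the metric the Milvus walkthrough later in this article uses; smaller means more similar. Cosine similarity compares direction only; values closer to 1 mean more similar.

```latex
d_{L2}(\mathbf{v}, \mathbf{w}) = \sqrt{\sum_{i=1}^{n} (v_i - w_i)^2}
\qquad
\cos(\mathbf{v}, \mathbf{w}) = \frac{\sum_{i=1}^{n} v_i w_i}{\sqrt{\sum_{i=1}^{n} v_i^2}\,\sqrt{\sum_{i=1}^{n} w_i^2}}
```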

Techniques for Generating Embeddings

There are two main approaches to creating these embeddings:

Neural networks: These are complex models that can learn intricate patterns by adjusting weights between layers of interconnected nodes. For example, a popular model called BERT looks at words in context to create detailed text embeddings. Neural networks are powerful tools for making high-quality embeddings.
Matrix factorization: This is a simpler technique that works well for certain tasks, especially in recommendation systems. It can efficiently capture user preferences and item characteristics. It works by breaking down a large matrix (like a matrix of user-item interactions) into smaller matrices, creating embeddings for both users and items.
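The matrix factorization idea can be sketched with a truncated singular value decomposition (SVD) from NumPy. This is one simple way to factorize the matrix and not necessarily the method any given recommender uses. The 4×5 rating matrix is hypothetical. Note that plain SVD treats unrated entries (zeros) as real ratings; production systems usually fit only the observed entries instead, for example with alternating least squares.

```python
import numpy as np

# Hypothetical 4-user x 5-item rating matrix (0 = not rated).
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 1],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=float)

k = 2  # embedding size: number of latent factors kept

# Truncated SVD: ratings ~= U_k @ diag(s_k) @ Vt_k
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
user_embeddings = U[:, :k] * np.sqrt(s[:k])     # one k-dim vector per user
item_embeddings = Vt[:k, :].T * np.sqrt(s[:k])  # one k-dim vector per item

# A user-item dot product approximates that user's rating of that item.
approx = user_embeddings @ item_embeddings.T
print(np.round(approx, 1))
```

Splitting the square root of the singular values between the two factors keeps user and item embeddings on the same scale.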

Once embeddings are created, storing, indexing and querying them efficiently becomes crucial. This is where specialized vector databases like Milvus come into play.

Storing, Indexing and Retrieving Vector Embeddings with Milvus

Milvus is an open source vector database designed specifically for handling large-scale vector data. At the time of writing, it is the most popular vector database in terms of GitHub stars. Follow this guide to install Milvus on your computer, since the code in the following sections works only if Milvus is already running locally. The full code can be found here. Let’s walk through the process of using Milvus to work with vector embeddings.

Setting Up Milvus

First, you’ll need to install PyMilvus, the Milvus Python client you will use to interact with the Milvus database. The following command installs PyMilvus together with its optional model extra. Use it instead of a plain PyMilvus install, because later in this tutorial you will use a Milvus-integrated embedding model to generate your embeddings. (The leading ! runs the command from a Jupyter notebook cell; omit it when running pip in a terminal.)

!pip install "pymilvus[model]"

The model extra provides the pymilvus.model subpackage, which offers functionality for working with Milvus-integrated embedding models.

Importing Dependencies and Setting Up Embedding Function

Then, import the necessary modules and set up the embedding model for generating text embeddings. The setup code imports the required modules and functions, then initializes BGEM3EmbeddingFunction with the BAAI/bge-m3 model, which is the model you will use to generate the text embeddings.
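A minimal sketch of this setup follows. It assumes that `BGEM3EmbeddingFunction` is importable from `pymilvus.model.hybrid` once the model extra is installed. The helper names `load_bge_m3` and `dense_embeddings` are hypothetical. The model is passed in as a parameter, so the helper can also be tried with a stand-in object. The first call to `load_bge_m3` downloads the model weights.

```python
DENSE_DIM = 1024  # bge-m3 dense vectors have 1,024 dimensions

def load_bge_m3(device="cpu", use_fp16=False):
    # Imported here so this module can be loaded without pymilvus installed.
    from pymilvus.model.hybrid import BGEM3EmbeddingFunction

    # fp16 is disabled because it is generally unsupported on CPU.
    return BGEM3EmbeddingFunction(
        model_name="BAAI/bge-m3", device=device, use_fp16=use_fp16
    )

def dense_embeddings(model, texts, is_query=False):
    """Hypothetical helper: return one list of floats per text (dense output only)."""
    encode = model.encode_queries if is_query else model.encode_documents
    vectors = encode(texts)["dense"]  # the result dict also holds sparse weights
    return [[float(x) for x in vec] for vec in vectors]

if __name__ == "__main__":
    model = load_bge_m3()
    vecs = dense_embeddings(model, ["Milvus is a vector database."])
    print(len(vecs[0]))
```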

Connecting to Milvus and Creating Schema

After the initializations, connect to Milvus so you can perform database operations, then define the schema for the collection that will store the text embeddings. The connection is made to the Milvus instance running locally through its Uniform Resource Identifier (URI). The schema then defines the structure of the collection, which has three fields:

id: A primary key field of type INT64.
text: A VARCHAR field for storing text values. This will store your documents.
embedding: A FLOAT_VECTOR field of dimension 1024 for storing vector embeddings.
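The connection and the three fields can be sketched with PyMilvus’s `MilvusClient` API. This assumes the default local URI `http://localhost:19530`. The VARCHAR `max_length` of 2048 is an assumed limit, since the schema requires one. `FIELD_SPECS` and `connect_and_build_schema` are hypothetical names. Keeping the fields in a plain list makes the schema easy to inspect before anything touches the server.

```python
MILVUS_URI = "http://localhost:19530"  # default address of a local Milvus server

# The three fields described above; max_length is an assumed value.
FIELD_SPECS = [
    {"field_name": "id", "datatype": "INT64", "is_primary": True},
    {"field_name": "text", "datatype": "VARCHAR", "max_length": 2048},
    {"field_name": "embedding", "datatype": "FLOAT_VECTOR", "dim": 1024},
]

def connect_and_build_schema(uri=MILVUS_URI):
    # Imported here so FIELD_SPECS can be inspected without pymilvus installed.
    from pymilvus import DataType, MilvusClient

    client = MilvusClient(uri=uri)  # connects immediately; Milvus must be running
    schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=False)
    for spec in FIELD_SPECS:
        kwargs = {k: v for k, v in spec.items() if k != "datatype"}
        schema.add_field(datatype=DataType[spec["datatype"]], **kwargs)
    return client, schema
```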

Creating and Indexing the Collection

Next, create a collection using the schema you defined, then build an index on it for faster similarity searches. The code checks whether a collection with the specified name already exists; if it does, it is dropped, and a new collection is created using the defined schema. After that, an index is configured for the embedding field. The IVF_FLAT index type is used for fast approximate nearest neighbor search, with L2 (Euclidean distance) as the metric type for measuring similarity. The parameter nlist=128 sets the number of clusters (partitions) the vectors are grouped into; at search time, only the clusters closest to the query are scanned.
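Here is a minimal sketch of the drop, create and index step. It assumes `MilvusClient`’s `has_collection`, `drop_collection`, `prepare_index_params` and `create_collection` methods. The function name `recreate_collection` and the collection name `text_embeddings` are hypothetical. The client and schema are passed in, for example from the connection step. Passing `index_params` to `create_collection` builds the index as part of collection creation.

```python
def recreate_collection(client, schema, collection_name="text_embeddings"):
    """Drop any existing collection with this name, then create it with an index."""
    if client.has_collection(collection_name=collection_name):
        client.drop_collection(collection_name=collection_name)

    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name="embedding",
        index_type="IVF_FLAT",  # inverted-file index over nlist clusters
        metric_type="L2",       # Euclidean distance
        params={"nlist": 128},  # number of clusters the vectors are split into
    )
    client.create_collection(
        collection_name=collection_name, schema=schema, index_params=index_params
    )
```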

Inserting Data with Embeddings

The next step after creating the collection is to prepare and insert data into Milvus. The insertion code creates a list of multilingual documents. For each text, an embedding is generated with the embedding_model.encode_documents() method, whose dense output is a vector representing the semantic meaning of the text. The insert method then stores each embedding together with its text in the Milvus collection. Finally, the load_collection method loads the collection into memory, making it ready for search operations.
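The insert step can be sketched as follows. It assumes `MilvusClient.insert` and `load_collection`, plus an embedding model whose `encode_documents` returns a dict with a `"dense"` entry, as `BGEM3EmbeddingFunction` does. `insert_documents` and `text_embeddings` are hypothetical names. Documents are embedded in one batch rather than one at a time, which gives the same vectors with fewer model calls.

```python
def insert_documents(client, embedding_model, docs, collection_name="text_embeddings"):
    """Embed docs, insert them with their text, then load the collection for search."""
    # encode_documents returns a dict; its "dense" entry holds one vector per doc.
    vectors = embedding_model.encode_documents(docs)["dense"]
    rows = [
        {"id": i, "text": doc, "embedding": [float(x) for x in vec]}
        for i, (doc, vec) in enumerate(zip(docs, vectors))
    ]
    result = client.insert(collection_name=collection_name, data=rows)
    client.load_collection(collection_name=collection_name)  # make it searchable
    return result
```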

Querying and Searching for Similar Texts

The final step is retrieving data related to a query in Milvus. The search code configures the query to use L2 (Euclidean distance) as the metric type, with nprobe set to 10, which determines how many partitions of the data will be scanned during the search. A query (What is Milvus?) is encoded into a vector representation using the same embedding model you used to encode your documents. The search then retrieves the top 10 most similar embeddings in the collection based on this query vector, returning both the text and embedding fields with each result. If a match is found, the best result is displayed along with its text and embedding; otherwise, a message indicates that no results were found. Here are the results obtained from running this code:

Results of conducting a similarity search on Milvus

The retrieved document is clearly related to the query, as it answers the question directly. The output also shows the query embedding generated by the embedding model, together with the embedding of the most relevant document.
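The search step can be sketched as follows. It assumes `MilvusClient.search` and its result format: one list of hits per query vector, where each hit is a dict with `"id"`, `"distance"` and `"entity"` keys. The function name and collection name are hypothetical. With L2, a smaller distance means a closer match, so the first hit is the best one.

```python
def search_best_match(client, embedding_model, query, collection_name="text_embeddings"):
    """Return the closest stored document for a query, or None if nothing matches."""
    # Embed the query with the same model that embedded the documents.
    query_vector = [float(x) for x in embedding_model.encode_queries([query])["dense"][0]]
    results = client.search(
        collection_name=collection_name,
        data=[query_vector],
        limit=10,  # top 10 hits
        search_params={"metric_type": "L2", "params": {"nprobe": 10}},  # scan 10 clusters
        output_fields=["text", "embedding"],
    )
    hits = results[0]  # one result list per query vector
    if not hits:
        print("No results found.")
        return None
    best = hits[0]  # hits are sorted by distance, closest first
    print(f"Best match (L2 distance {best['distance']:.4f}): {best['entity']['text']}")
    return best
```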

Applications of Vector Embeddings

Vector embeddings have many uses in AI applications.

Similarity Search

One of the main uses of vector embeddings is similarity search. By representing items as vectors, you can find items that are close to each other in the embedding space by measuring the distance or similarity between their embeddings. Common measures are Euclidean distance and cosine similarity; Jaccard or Hamming distance are used for binary vectors. This is how many modern semantic search engines work: they return results that are related in meaning to a user’s query, even if the exact words don’t match. For example, if someone searches for feline companions, the search engine might return results about cats, even if the word cat isn’t used. This works because the embeddings for feline and cat lie close to each other in the vector space. Milvus is built for this type of similarity search and can find similar vectors quickly and with high accuracy, even in data sets with billions of entries.
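The core operation can be sketched as an exact (brute-force) nearest-neighbor search with NumPy: compute the Euclidean distance from the query to every stored vector and keep the k smallest. The random vectors below are stand-ins for real embeddings. This approach costs O(N·d) work per query, which is why vector databases such as Milvus rely on approximate indexes (IVF_FLAT, HNSW and others) at large scale.

```python
import numpy as np

def top_k_l2(query, corpus, k=3):
    """Exact nearest neighbors by Euclidean distance: (indices, distances)."""
    dists = np.linalg.norm(corpus - query, axis=1)  # distance to every stored vector
    order = np.argsort(dists)[:k]                   # smallest distance = most similar
    return order, dists[order]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64)).astype(np.float32)  # random stand-in embeddings
# A slightly perturbed copy of vector 42 should find vector 42 as its nearest neighbor.
query = corpus[42] + 0.01 * rng.normal(size=64).astype(np.float32)
idx, d = top_k_l2(query, corpus)
print(idx, d)
```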

Recommendation Systems

Vector embeddings have greatly improved recommendation systems in various industries. By representing both users and items as vectors, these systems can find patterns and similarities that go beyond simple category matching. For example, in a movie recommendation system, the embedding for a user who likes science fiction movies with strong female leads might be similar to the embedding for a new movie that fits that description, even if it’s from a director or studio the user hasn’t watched before.
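Scoring in such a system can be sketched as a dot product between the user vector and each item vector, with higher scores recommended first. The 3-dimensional embeddings and item names below are hypothetical. In practice, they come from a trained model such as the matrix factorization approach described earlier, and the learned dimensions are rarely as interpretable as hand-picked ones.

```python
import numpy as np

# Hypothetical embeddings; real ones are learned and much larger.
user = np.array([0.9, 0.8, 0.1])
movies = {
    "new_scifi_film": np.array([0.85, 0.9, 0.05]),
    "period_drama":   np.array([0.1, 0.3, 0.9]),
    "space_comedy":   np.array([0.7, 0.2, 0.4]),
}

def recommend(user_vec, item_vecs, n=2):
    # Score every item by its dot product with the user vector; higher is better.
    scores = {name: float(user_vec @ vec) for name, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend(user, movies))
```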

Computer Vision

While often associated with text, vector embeddings are also powerful for working with images:

Image search: By converting images into vector embeddings, systems can find visually similar images even when there’s little or no text description.
Face recognition: Embeddings can capture the unique features of faces, allowing efficient comparison and matching.
Object detection: Vector representations of image regions help models identify and locate objects within complex scenes.

Retrieval Augmented Generation (RAG)

RAG is a newer application of vector embeddings that improves the performance of LLMs. Here’s how it works:

A large amount of information (like articles, documents or databases) is converted into vector embeddings.
When a user asks a question, their query is also converted into a vector embedding.
The system searches its knowledge base (e.g., a vector database like Milvus) for vector embeddings similar to the query vector.
The most relevant information is retrieved and used to help the language model generate a more accurate and contextually relevant response.
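The four steps can be sketched in plain Python. The knowledge base uses made-up 3-dimensional vectors in place of real embeddings (step 1), and the query vector is likewise invented (step 2). `retrieve` ranks passages by cosine similarity (step 3), and `build_prompt` assembles the context for the model (step 4). Both names are hypothetical, and the prompt wording is only one possible template. The final LLM call is left out because it depends on the provider.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, knowledge_base, k=2):
    """Step 3: rank stored (text, vector) pairs by similarity to the query vector."""
    ranked = sorted(knowledge_base, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, passages):
    """Step 4: place the retrieved passages before the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Step 1 stand-in: toy knowledge base with made-up vectors.
kb = [
    ("Milvus is an open source vector database.", [0.9, 0.1, 0.0]),
    ("IVF_FLAT partitions vectors into clusters.", [0.7, 0.3, 0.1]),
    ("Bicycles have two wheels.", [0.0, 0.1, 0.9]),
]
# Step 2 stand-in: a made-up query vector for "What is Milvus?".
prompt = build_prompt("What is Milvus?", retrieve([0.95, 0.05, 0.0], kb))
print(prompt)
```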

RAG helps reduce hallucinations (i.e., made-up, false information) in LLMs and improves their ability to provide up-to-date information, since the knowledge base can be updated without retraining the model.

Best Practices for Working with Vector Embeddings

To get the most out of vector embeddings, it’s important to follow some key best practices:

Choose the Right Model

This is the most important practice, because embedding models differ widely and the choice of model can greatly affect the quality and usefulness of your embeddings. We will only touch on it briefly here, as this guide covers it in detail. Consider factors such as:

The specific task or domain you’re working in.
The size and nature of your data set.
Computational resources available.
Required embedding dimensionality.

For text-based tasks, models like BERT or GloVe might be appropriate, while visual tasks might benefit from models like CLIP (Contrastive Language-Image Pre-Training) and ViT (Vision Transformer).

Optimize Embedding Dimensionality

The number of dimensions in your vector embeddings can have a big impact on performance and computational requirements. Higher dimensionality captures more detailed relationships but uses more storage and processing time. It’s often worth experimenting with different dimensionalities to find the best balance for your specific application.
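The storage side of this trade-off is simple arithmetic. Raw float32 vectors take num_vectors × dimensions × 4 bytes. The sketch below counts only the raw vectors and ignores index overhead. The collection size of 10 million vectors and the dimensions compared are illustrative choices.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Memory for the raw vectors alone (float32 = 4 bytes); excludes index overhead."""
    return num_vectors * dim * bytes_per_value

for dim in (384, 768, 1024):
    gib = raw_vector_bytes(10_000_000, dim) / 2**30
    print(f"{dim:>5} dims x 10M vectors: {gib:.1f} GiB")
```

Memory grows linearly with dimensionality, so halving the dimensions halves the raw storage.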

Implement Efficient Indexing and Search

As your collection of vector embeddings grows, efficient storage and retrieval become crucial. Therefore, choose a vector database that offers an index type that best suits your use case. For example, Milvus offers various index types optimized for different scenarios:

IVF_FLAT: Suitable for data sets of up to a million vectors, offering a good balance between search speed and accuracy.
IVF_SQ8: Provides data compression, reducing memory usage at a slight cost to accuracy.
HNSW: Excellent for scenarios requiring very low latency, though it uses more memory.
CAGRA: A GPU-based graph index that offers higher search performance when GPUs are available.
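To see where IVF_SQ8’s memory savings come from, here is a generic 8-bit scalar quantization sketch in NumPy. Each float32 component is mapped to one byte using per-dimension minimum and maximum values. This illustrates the general technique and is not Milvus’s internal implementation, whose exact scheme may differ.

```python
import numpy as np

def sq8_encode(vectors):
    """Map each float32 component to one byte using per-dimension min/max ranges."""
    lo = vectors.min(axis=0)
    scale = (vectors.max(axis=0) - lo) / 255.0
    scale[scale == 0] = 1.0  # avoid division by zero for constant dimensions
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def sq8_decode(codes, lo, scale):
    # Approximate reconstruction; the error per component is at most scale / 2.
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(1)
vecs = rng.normal(size=(1000, 128)).astype(np.float32)
codes, lo, scale = sq8_encode(vecs)
err = np.abs(sq8_decode(codes, lo, scale) - vecs).max()
print(vecs.nbytes // codes.nbytes, err)  # 4x less memory, small reconstruction error
```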

Experiment with different index types and parameters to find the optimal configuration for your use case.

Conclusion

Vector embeddings play a crucial role in making AI systems more efficient and effective by enabling better data retrieval, similarity search and decision-making capabilities. Whether used in recommendation systems, computer vision or improving the reliability of LLM outputs through RAG, embeddings help transform unstructured data into actionable insights. As AI continues to evolve, mastering the use of vector embeddings and their accompanying tools, such as Milvus, will be key if you are looking to build robust and scalable AI applications.

Denis Kuria is a machine learning engineer with a bachelor's in computer science, specializing in retrieval-augmented generation (RAG) and large language models (LLMs). He enjoys writing guides to help developers and loves hiking and exploring the world.