Oracle AI Vector Search in Java (langchain4j 🦜)
Inspired by a Medium post by Tim J.
This post explains how to quickly integrate Oracle Database 23ai and its AI Vector Search capabilities into a Java application, enabling semantic search in your AI app.
To start quickly, we’ll leverage the Langchain4j project.
Preparation:
Step 1: Start the DB
Run the gvenzl/oracle-free:23-slim Docker image. You can use the following command:
docker run -p 1521:1521 -e ORACLE_PASSWORD=free -e APP_USER=developer -e APP_USER_PASSWORD=free gvenzl/oracle-free:23-slim
Step 2: Install Dependencies
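The following Maven dependencies are required. We’ll use a Langchain4j ONNX model to generate the vector embeddings locally (no API key needed). The ${langchain4j.version}, ${jdbc.version}, and ${slf4j.version} placeholders must be defined as properties in your pom.xml.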
<dependencies>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j</artifactId>
        <version>${langchain4j.version}</version>
    </dependency>
    <!-- ONNX embedding model -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-embeddings-all-minilm-l6-v2-q</artifactId>
        <version>${langchain4j.version}</version>
    </dependency>
    <!-- Oracle database 23ai integration -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-oracle</artifactId>
        <version>${langchain4j.version}</version>
    </dependency>
    <!-- Data source -->
    <dependency>
        <groupId>com.oracle.database.jdbc</groupId>
        <artifactId>ucp</artifactId>
        <version>${jdbc.version}</version>
    </dependency>
    <!-- logging -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-nop</artifactId>
        <version>${slf4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>${slf4j.version}</version>
    </dependency>
</dependencies>
Step 3: Initialize the Vector Store and the Embedding Model
For this tutorial, we’ll need an EmbeddingStore and an EmbeddingModel. These are two core abstractions of the Langchain4j library: the model turns text into a vector embedding, and the store persists embeddings and searches them by similarity.
The EmbeddingStore, created with a builder, will use the Oracle database 23ai we provisioned in the first step. The EmbeddingModel will use the “all-minilm-l6-v2-q” ONNX model.
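To make the two roles concrete, here is a minimal sketch in plain Java, with no Langchain4j or database involved. It assumes hand-written 3-dimensional vectors in place of a real model's output and cosine similarity as the comparison measure. The class ToyVectorStore and its methods are hypothetical names for illustration only and are not the library's API; a real store would delegate the ranking to the database rather than scan a list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy in-memory stand-in for a vector store; illustration only, not the langchain4j API.
final class ToyVectorStore {

    // One stored row: the original text plus its embedding vector.
    record Entry(String text, float[] vector) {}

    // One search hit: the stored text and its cosine similarity to the query.
    record Match(String text, double score) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, float[] vector) {
        entries.add(new Entry(text, vector));
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force search: score every entry, sort descending, keep the best maxResults.
    List<Match> search(float[] query, int maxResults) {
        return entries.stream()
                .map(e -> new Match(e.text(), cosine(query, e.vector())))
                .sorted(Comparator.comparingDouble(Match::score).reversed())
                .limit(maxResults)
                .toList();
    }

    public static void main(String[] args) {
        ToyVectorStore store = new ToyVectorStore();
        // Hand-written vectors (assumption); a real embedding model would produce these.
        store.add("I like football.", new float[] {0.9f, 0.1f, 0.0f});
        store.add("The weather is good today.", new float[] {0.0f, 0.2f, 0.9f});
        List<Match> hits = store.search(new float[] {1.0f, 0.0f, 0.1f}, 1);
        System.out.println(hits.get(0).text()); // prints: I like football.
    }
}
```

The first entry wins because its vector points in nearly the same direction as the query vector, which is the intuition behind ranking documents by embedding similarity.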
First, we configure the JDBC URL, and we provide a table name to store the original text data, the computed vector embedding, and, optionally, some metadata.
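The JDBC URL in the class that follows is hard-coded. As a hedged sketch, the same thin-driver URL shape can be assembled from its parts, for example when credentials come from the environment. The helper class OracleThinUrl and the variable names DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, and DB_SERVICE are assumptions made for this example; they are not part of the Oracle JDBC driver or of Langchain4j.

```java
// Hypothetical helper, not part of the Oracle JDBC driver or langchain4j.
final class OracleThinUrl {

    // Builds jdbc:oracle:thin:<user>/<password>@//<host>:<port>/<service>.
    static String of(String user, String password, String host, int port, String service) {
        return "jdbc:oracle:thin:" + user + "/" + password
                + "@//" + host + ":" + port + "/" + service;
    }

    // Reads an environment variable, falling back to a default when it is unset or blank.
    static String env(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isBlank()) ? fallback : value;
    }

    public static void main(String[] args) {
        // Environment variable names are assumptions for this sketch;
        // the defaults match the Docker command from Step 1.
        String url = of(
                env("DB_USER", "developer"),
                env("DB_PASSWORD", "free"),
                env("DB_HOST", "localhost"),
                Integer.parseInt(env("DB_PORT", "1521")),
                env("DB_SERVICE", "freepdb1"));
        System.out.println(url);
    }
}
```

A password containing characters such as / or @ may not parse in this embedded form; setting the user and password separately on the data source (setUser and setPassword on PoolDataSource) avoids that.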
package org.example;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.allminilml6v2q.AllMiniLmL6V2QuantizedEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.oracle.CreateOption;
import dev.langchain4j.store.embedding.oracle.OracleEmbeddingStore;
import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;

import java.sql.SQLException;

public class Oracle {
    /**
     * Data source for Oracle database 23ai.
     */
    public static final PoolDataSource dataSource = PoolDataSourceFactory.getPoolDataSource();

    static {
        try {
            // Data source configuration
            dataSource.setConnectionFactoryClassName("oracle.jdbc.datasource.impl.OracleDataSource");
            // change the URL accordingly
            // format: jdbc:oracle:thin:<user>/<user password>@//<hostname>/<database service>
            dataSource.setURL("jdbc:oracle:thin:developer/free@//localhost/freepdb1");
        }
        catch (SQLException sqle) {
            throw new RuntimeException(sqle);
        }
    }

    // Vector store backed by the "my_embeddings" table, created on first use if missing
    public static final EmbeddingStore<TextSegment> embeddingStore =
            OracleEmbeddingStore.builder()
                    .dataSource(dataSource)
                    .embeddingTable(
                            "my_embeddings",
                            CreateOption.CREATE_IF_NOT_EXISTS)
                    .build();

    // Local ONNX embedding model (no API key required)
    public static final EmbeddingModel embeddingModel = new AllMiniLmL6V2QuantizedEmbeddingModel();
}
Step 4: Add documents
Use the code below to add documents, together with their embeddings, to the vector store. Metadata is optional, but it can provide useful context (e.g., the type of document).
package org.example;

import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;

import static org.example.Oracle.embeddingModel;
import static org.example.Oracle.embeddingStore;

public class OracleInserter {
    /**
     * Add text.
     */
    public static void addDocuments(final String text) {
        final TextSegment segment1 = TextSegment.from(text, new Metadata());
        final Embedding embedding1 = embeddingModel.embed(segment1).content();
        embeddingStore.add(embedding1, segment1);
    }

    /**
     * Add text with metadata.
     */
    public static void addDocuments(String text, Metadata metadata) {
        final TextSegment segment1 = TextSegment.from(text, metadata);
        final Embedding embedding1 = embeddingModel.embed(segment1).content();
        embeddingStore.add(embedding1, segment1);
    }
}
Step 5: Search Request
The following code implements a simple document similarity search using our ONNX model:
package org.example;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;

import java.util.List;

import static org.example.Oracle.embeddingModel;
import static org.example.Oracle.embeddingStore;

public class OracleSearcher {
    public static List<EmbeddingMatch<TextSegment>> search(String query, int maxResults) {
        // Embed the query text, then ask the store for the closest stored embeddings
        final EmbeddingSearchRequest embeddingSearchRequest = EmbeddingSearchRequest.builder()
                .queryEmbedding(embeddingModel.embed(query).content())
                .maxResults(maxResults)
                .minScore(0.0)
                .build();
        return embeddingStore.search(embeddingSearchRequest).matches();
    }
}
Step 6: Document similarity
The previous code can now be invoked to interact with the Oracle database 23ai:
package org.example;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;

import java.util.List;

public class Main {
    public static void main(String[] args) {
        OracleInserter.addDocuments("I like football.");
        OracleInserter.addDocuments("The weather is good today.");
        List<EmbeddingMatch<TextSegment>> search = OracleSearcher.search("What is your favorite sport?", 1);
        // Prints (the decimal separator depends on the default locale):
        // Score: 0,807478
        // Result: I like football.
        // Note: List.getFirst() requires Java 21 or later.
        System.out.printf("Score: %f%nResult: %s%n", search.getFirst().score(), search.getFirst().embedded().text());
    }
}
GitHub repository: https://github.com/loiclefevre/ai_vector_search
Langchain4j: https://github.com/langchain4j/langchain4j
Oracle Database 23ai AI Vector Search Book: https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/
