Apache Camel KServe Inference
The Apache Camel KServe component streamlines the integration between Java applications and AI model servers that support the KServe Open Inference Protocol V2 over gRPC. With modern AI applications increasingly relying on scalable model inference, KServe on Kubernetes has become a popular choice. Using Apache Camel, developers can build routes to invoke AI models, retrieve their metadata, and check server health—all through uniform, simple APIs supported by KServe. This component has been available since Camel 4.10 and currently supports the producer role for gRPC-based requests, enabling direct remote AI inference from Camel routes. Let us delve into understanding how Java, Apache Camel, and KServe work together.
1. What is Apache Camel’s KServe Component?
KServe is a cloud-native model serving platform designed for flexibility, scalability, and production-grade reliability. Apache Camel’s KServe component lets Java integration code communicate directly with KServe endpoints, invoking AI models and querying model servers through standardized gRPC APIs. Architecturally, Camel acts as the integration backbone: routes running in a microservices or Kubernetes environment send requests to KServe-hosted models and process the responses.
1.1 KServe Component Architecture
- Camel Route: Defines a workflow using endpoints such as kserve:infer (for inference), kserve:model/metadata (for model info), and kserve:server/ready (for health checks).
- KServe Model Server: Hosts AI models, exposes gRPC endpoints for inference, metadata, and health queries.
- Integration: Camel routes use the KServe component to send gRPC requests, receive structured responses (status, result, metadata), and handle them within business logic.
- Deployment: Primarily designed for cloud/Kubernetes, making it easy to manage, scale, and monitor AI services.
The KServe endpoint URI follows the pattern kserve:api?modelName=NAME&modelVersion=VERSION, where api selects the operation (for example infer, model/metadata, or server/ready) and the query options identify the model and version to address.
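To make the URI pattern concrete, here is a minimal sketch in plain Java that assembles endpoint strings from their parts. KServeUris is a hypothetical helper written for this article, not a Camel class, and it assumes model names and versions need no URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Camel): only illustrates the kserve:api?options pattern.
final class KServeUris {
    private KServeUris() {}

    /** Builds kserve:<api>?modelName=...&modelVersion=...; null or blank options are skipped. */
    static String build(String api, String modelName, String modelVersion) {
        Map<String, String> options = new LinkedHashMap<>();
        if (modelName != null && !modelName.isBlank()) {
            options.put("modelName", modelName);
        }
        if (modelVersion != null && !modelVersion.isBlank()) {
            options.put("modelVersion", modelVersion);
        }
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "kserve:" + api + (query.isEmpty() ? "" : "?" + query);
    }

    public static void main(String[] args) {
        System.out.println(build("infer", "my-llm-model", "1"));           // kserve:infer?modelName=my-llm-model&modelVersion=1
        System.out.println(build("model/metadata", "my-llm-model", null)); // kserve:model/metadata?modelName=my-llm-model
        System.out.println(build("server/ready", null, null));             // kserve:server/ready
    }
}
```

The resulting strings can be passed to a route's to(...) call in place of hand-written URIs.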
1.2 Steps to Host a Model on KServe
- Prepare your model: Export your trained model in a supported format (e.g., TensorFlow SavedModel, PyTorch TorchScript, ONNX, Scikit-learn pickle, or any LLM-based model). Place it in a storage bucket or persistent volume accessible from your Kubernetes cluster.
- Deploy KServe on Kubernetes: Make sure you have a Kubernetes cluster running and install KServe using its Helm chart or manifests. Typically, this involves enabling kubectl access and applying the KServe operator YAML.
- Create an InferenceService YAML: Define an InferenceService custom resource (CRD) that specifies your model name, runtime (e.g., sklearn, pytorch, tensorflow, or custom), and the location (e.g., S3, GCS, or PVC) where your model artifacts are stored.
- Apply the InferenceService: Use kubectl apply -f inference-service.yaml to deploy the service. KServe will automatically spin up the required pods, download the model, and expose gRPC/REST endpoints.
- Verify deployment: Run kubectl get inferenceservices to check the status. Once READY is True, your model is available.
- Access the model endpoint: Use the provided URL or cluster gateway to send inference requests. KServe exposes both REST and gRPC endpoints following the Open Inference Protocol (V2).
- Test with sample input: Send a sample request (using curl, grpcurl, or a Camel route) to verify that the model returns predictions as expected.
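The last step, a sample request over REST, can also be sketched in plain Java. The snippet below builds, but does not send, an Open Inference Protocol V2 request against POST /v2/models/{model}/infer. The base URL http://localhost:8080, the model name my-model, and the input name INPUT_0 are assumptions for illustration; the JSON escaping is deliberately minimal and only handles quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a V2 REST inference request for a single text input.
final class V2InferRequests {
    private V2InferRequests() {}

    /** JSON body with one BYTES tensor of shape [1] holding the given text. */
    static String inferBody(String inputName, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"inputs\":[{\"name\":\"" + inputName + "\",\"shape\":[1],"
                + "\"datatype\":\"BYTES\",\"data\":[\"" + escaped + "\"]}]}";
    }

    /** POST <baseUrl>/v2/models/<modelName>/infer with a JSON body. */
    static HttpRequest inferRequest(String baseUrl, String modelName, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/v2/models/" + modelName + "/infer"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        inferBody("INPUT_0", text), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = inferRequest("http://localhost:8080", "my-model", "hello");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(inferBody("INPUT_0", "hello"));
    }
}
```

To actually send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) once the service is reachable from where the code runs.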
1.3 Creating the my-llm-model
The my-llm-model is a custom large language model (LLM) deployed on KServe. It can handle text inputs, generate responses, or perform classification tasks depending on your training. The process of creating and deploying this model involves the following steps:
- Training or fine-tuning the model: You can start from a pre-trained LLM (e.g., GPT, BLOOM, LLaMA) and fine-tune it on your domain-specific corpus. For instance, training data may include technical documentation, chat transcripts, or product FAQs.
- Exporting the model: Once trained, export the model in a supported format such as PyTorch TorchScript, ONNX, or TensorFlow SavedModel. For LLMs, TorchScript or ONNX are common choices for fast inference.
- Packaging for KServe: Place the model artifacts into a storage location accessible from Kubernetes (e.g., S3 bucket, GCS, or Persistent Volume). Ensure all necessary configuration files (tokenizers, vocab files, or config.json) are included.
- Defining an InferenceService: Create a KServe InferenceService YAML that points to your model storage and specifies the runtime. For example, you can use the custom runtime for a TorchScript or ONNX LLM:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-llm-model
spec:
  predictor:
    custom:
      container:
        image: 'my-llm-runtime:latest'
        command:
          - python
          - serve_model.py
    model:
      storageUri: 's3://my-model-bucket/my-llm-model/'
- Deploying the model: Apply the manifest with kubectl apply -f my-llm-model.yaml. KServe will launch the necessary pods, load the model into memory, and expose gRPC/REST endpoints ready for inference.
- Testing the deployment: Once READY is True, you can send test requests using grpcurl, curl, or your Camel route to verify that the model generates correct responses for your inputs.
By following these steps, my-llm-model is deployed on Kubernetes and ready for integration via Apache Camel using KServe’s gRPC API, so your LLM can serve real-time inference requests from Java applications.
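Waiting for the model to become ready before sending test requests can be automated. The sketch below polls the V2 REST readiness path GET /v2/models/{model}/ready and treats HTTP 200 as ready. It assumes the runtime container implements the V2 REST API and is reachable at http://localhost:8080 (for example through a port-forward); ModelReadiness and both values are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a readiness check a bounded number of times.
final class ModelReadiness {
    private ModelReadiness() {}

    static URI readyUri(String baseUrl, String modelName) {
        return URI.create(baseUrl + "/v2/models/" + modelName + "/ready");
    }

    /** Runs check up to maxAttempts times, pausing between attempts; true once it passes. */
    static boolean waitUntil(BooleanSupplier check, int maxAttempts, Duration pause) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }

    /** One probe: HTTP 200 means ready; connection errors count as "not ready yet". */
    static boolean isReady(HttpClient client, URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode() == 200;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        URI uri = readyUri("http://localhost:8080", "my-llm-model");
        boolean ready = waitUntil(() -> isReady(client, uri), 10, Duration.ofSeconds(3));
        System.out.println(uri + " ready: " + ready);
    }
}
```

Keeping the probe separate from the retry loop makes it easy to swap the HTTP check for a gRPC or Camel-based one later.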
2. Java Code Example
2.1 Inference with KServe
In this example, the deployed model is a generic AI or LLM-based model hosted on KServe. It can be configured to handle text, image, or structured input data, depending on the model type. The model accepts input encoded as BYTES (for text) or as numeric tensors such as FP32 or INT64, and returns predictions, labels, or structured output according to its schema. For instance, a text-based LLM could accept a sentence such as "Explain reactive programming in Java" and return a generated response as UTF-8 text. This model demonstrates how Apache Camel can interact with KServe for real-time inference through the Open Inference Protocol (V2). The example project uses the following Maven dependencies (pom.xml); replace the version placeholders with the release you are targeting.
<project ...>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>camel-kserve-example</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.camel.springboot</groupId>
            <artifactId>camel-spring-boot-starter</artifactId>
            <version>latest__jar__version</version>
        </dependency>
        <dependency>
            <groupId>org.apache.camel.springboot</groupId>
            <artifactId>camel-kserve-starter</artifactId>
            <version>latest__jar__version</version>
        </dependency>
        <dependency>
            <groupId>com.google.protobuf</groupId>
            <artifactId>protobuf-java</artifactId>
            <version>latest__jar__version</version>
        </dependency>
    </dependencies>
</project>
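Before building a request, it can help to sanity-check the tensor description itself (name, datatype, shape). The sketch below is plain Java with no Camel or protobuf dependency; TensorSpec is a hypothetical helper, not part of Camel or KServe. It checks that the datatype is one of the Open Inference Protocol V2 names and that the number of supplied elements equals the product of the shape dimensions.

```java
import java.util.Set;

// Hypothetical helper: describes one V2 tensor and validates it against its shape.
record TensorSpec(String name, String datatype, long[] shape) {

    // Datatype names defined by the Open Inference Protocol V2.
    static final Set<String> V2_DATATYPES = Set.of(
            "BOOL", "UINT8", "UINT16", "UINT32", "UINT64",
            "INT8", "INT16", "INT32", "INT64",
            "FP16", "FP32", "FP64", "BYTES");

    TensorSpec {
        if (!V2_DATATYPES.contains(datatype)) {
            throw new IllegalArgumentException("Unknown V2 datatype: " + datatype);
        }
    }

    /** Number of elements implied by the shape, e.g. [2, 3] -> 6. */
    long elementCount() {
        long count = 1;
        for (long dim : shape) {
            count *= dim;
        }
        return count;
    }

    /** True when the supplied number of values fits the declared shape. */
    boolean matches(int actualElements) {
        return elementCount() == actualElements;
    }

    public static void main(String[] args) {
        TensorSpec text = new TensorSpec("INPUT_0", "BYTES", new long[] {1});
        System.out.println(text.matches(1)); // true: one sentence for shape [1]
        System.out.println(text.matches(2)); // false: two values do not fit shape [1]
    }
}
```

A single sentence sent as text therefore maps to a BYTES tensor of shape [1], which is the layout used in the example that follows.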
2.2 Code Example
The following Java example demonstrates Apache Camel integrating with a generic AI/LLM model. It shows building a ModelInferRequest, sending it to the KServe endpoint, and handling the protobuf response.
package org.example;

import java.util.List;

import com.google.protobuf.ByteString;
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

// Protobuf classes generated from KServe's grpc_predict_v2.proto. With camel-kserve they are
// expected under inference.GrpcPredictV2; adjust the imports if your generated classes live elsewhere.
import inference.GrpcPredictV2.InferTensorContents;
import inference.GrpcPredictV2.ModelInferRequest;
import inference.GrpcPredictV2.ModelInferRequest.InferInputTensor;
import inference.GrpcPredictV2.ModelInferResponse;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Bean
    public RouteBuilder kserveRoute() {
        return new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                // Fire once, send one inference request over gRPC, then log the decoded text.
                from("timer:tick?repeatCount=1")
                    .setBody(constant(createRequest("Explain reactive programming in Java")))
                    .to("kserve:infer?modelName=my-llm-model&target=localhost:8001")
                    .process(this::postprocess)
                    .log("Inference result: ${body}");
            }

            // Wraps the prompt in a single BYTES tensor of shape [1] named INPUT_0.
            ModelInferRequest createRequest(String text) {
                var contents = InferTensorContents.newBuilder()
                    .addAllBytesContents(List.of(ByteString.copyFromUtf8(text)))
                    .build();
                var input = InferInputTensor.newBuilder()
                    .setName("INPUT_0")
                    .setDatatype("BYTES")
                    .addShape(1)
                    .setContents(contents)
                    .build();
                return ModelInferRequest.newBuilder()
                    .addInputs(input)
                    .build();
            }

            // Replaces the protobuf response with the first output element decoded as UTF-8.
            // Assumes the server fills outputs[0].contents rather than raw_output_contents.
            void postprocess(Exchange exchange) {
                ModelInferResponse response = exchange.getMessage().getBody(ModelInferResponse.class);
                var output = response.getOutputs(0);
                var raw = output.getContents().getBytesContents(0).toStringUtf8();
                exchange.getMessage().setBody(raw);
            }
        };
    }
}
2.2.1 Code Explanation
This Spring Boot application demonstrates integrating Apache Camel with KServe for inference using a generic AI/LLM model. The Camel route builds a ModelInferRequest with text input, sends it to the kserve:infer endpoint specifying the model name (my-llm-model) and gRPC server, then extracts and logs the inference response.
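One caveat on response handling: some V2 servers (Triton is one example) return tensor data in the response's raw_output_contents field instead of outputs[].contents. For BYTES tensors, that raw form is commonly a sequence of elements, each prefixed with its length as a 4-byte little-endian integer. The following sketch decodes that layout in plain Java; RawBytesTensor is a hypothetical helper, and the layout is an assumption to verify against your model server's documentation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: length-prefixed (4-byte little-endian) encoding of BYTES tensor elements.
final class RawBytesTensor {
    private RawBytesTensor() {}

    /** Encodes each string as <length:int32 LE><utf8 bytes>, concatenated. */
    static byte[] encode(List<String> elements) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String element : elements) {
            byte[] utf8 = element.getBytes(StandardCharsets.UTF_8);
            out.writeBytes(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                    .putInt(utf8.length).array());
            out.writeBytes(utf8);
        }
        return out.toByteArray();
    }

    /** Decodes the layout produced by encode; rejects truncated or corrupt input. */
    static List<String> decode(byte[] raw) {
        ByteBuffer buffer = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        List<String> elements = new ArrayList<>();
        while (buffer.remaining() >= 4) {
            int length = buffer.getInt();
            if (length < 0 || length > buffer.remaining()) {
                throw new IllegalArgumentException("Corrupt length prefix: " + length);
            }
            byte[] utf8 = new byte[length];
            buffer.get(utf8);
            elements.add(new String(utf8, StandardCharsets.UTF_8));
        }
        if (buffer.hasRemaining()) {
            throw new IllegalArgumentException("Trailing bytes after last element");
        }
        return elements;
    }

    public static void main(String[] args) {
        byte[] raw = encode(List.of("Reactive programming", "is a paradigm"));
        System.out.println(decode(raw)); // [Reactive programming, is a paradigm]
    }
}
```

In the processor, the input for decode would come from something like response.getRawOutputContents(0).toByteArray(), used only when the outputs carry no contents.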
2.2.2 Code Output
2025-09-20 10:12:45.789 INFO org.example.Application : Started Application in 3.2 seconds
2025-09-20 10:12:46.321 INFO route1 : Inference result: Reactive programming is a paradigm that allows handling asynchronous data streams efficiently...
3. Conclusion
The KServe component in Apache Camel enables modern, scalable AI model inference directly from Java integration routes. By supporting generic AI or LLM models, this approach allows organizations to build flexible, maintainable, and cloud-native workflows. Whether deployed in Kubernetes or as part of distributed microservices, Camel’s abstraction over KServe’s gRPC APIs reduces technical overhead while maximizing efficiency for production-grade AI pipelines.




