Apache Camel KServe Inference

The Apache Camel KServe component connects Java applications to AI model servers that support the KServe Open Inference Protocol V2 over gRPC. As AI applications increasingly rely on scalable model inference, KServe on Kubernetes has become a popular serving choice. With Apache Camel, developers can build routes that invoke AI models, retrieve their metadata, and check server health, all through the uniform APIs that KServe defines. The component has been available since Camel 4.10 and currently supports only the producer role for gRPC-based requests, which is enough to run remote AI inference directly from Camel routes. Let us delve into understanding how Java, Apache Camel, and KServe work together.

1. What is Apache Camel’s KServe?

KServe is a cloud-native model serving platform designed for flexibility, scalability, and production-grade reliability. Apache Camel’s KServe component enables Java integration code to communicate directly with KServe endpoints, invoking AI models and interacting with model servers via standardized gRPC APIs. Architecturally, Camel acts as the integration backbone: routes running in a microservice or Kubernetes deployment send requests to KServe-hosted models and receive their responses.

1.1 KServe Component Architecture

  • Camel Route: Defines a workflow using endpoints such as kserve:infer (for inference), kserve:model/metadata (for model info), and kserve:server/ready (for health checks).
  • KServe Model Server: Hosts AI models, exposes gRPC endpoints for inference, metadata, and health queries.
  • Integration: Camel routes use the KServe component to send gRPC requests, receive structured responses (status, result, metadata), and handle them within business logic.
  • Deployment: Primarily designed for cloud/Kubernetes, making it easy to manage, scale, and monitor AI services.

The KServe endpoint URI follows the pattern kserve:api?modelName=NAME&modelVersion=VERSION, where api selects the operation (for example infer, model/metadata, or server/ready) and the query options identify the model to address.
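To make the URI pattern concrete, here is a minimal sketch of a string helper that assembles such endpoint URIs. KServeUris and its endpoint method are hypothetical names introduced for illustration; they are not part of Camel, and the helper performs no URL encoding, so it assumes option values contain no reserved characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Camel): assembles a kserve endpoint URI string.
final class KServeUris {

    // api is the operation, e.g. "infer", "model/metadata" or "server/ready".
    // Options are appended in the iteration order of the map; values are not URL-encoded.
    static String endpoint(String api, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        options.forEach((key, value) -> query.add(key + "=" + value));
        String base = "kserve:" + api;
        return options.isEmpty() ? base : base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("modelName", "my-llm-model");
        options.put("modelVersion", "1");
        System.out.println(endpoint("infer", options));
        System.out.println(endpoint("server/ready", Map.of()));
    }
}
```

The resulting strings can be passed to a route's to(...) call in place of a hard-coded URI.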

1.2 Steps to Host a Model on KServe

  • Prepare your model: Export your trained model in a supported format (e.g., TensorFlow SavedModel, PyTorch TorchScript, ONNX, Scikit-learn pickle, or any LLM-based model). Place it in a storage bucket or persistent volume accessible from your Kubernetes cluster.
  • Deploy KServe on Kubernetes: Make sure you have a Kubernetes cluster running and install KServe using its Helm chart or manifests. Typically, this involves enabling kubectl access and applying the KServe operator YAML.
  • Create an InferenceService YAML: Define an InferenceService custom resource that specifies your model name, runtime (e.g., sklearn, pytorch, tensorflow, or custom), and the location (e.g., S3, GCS, or PVC) where your model artifacts are stored.
  • Apply the InferenceService: Use kubectl apply -f inference-service.yaml to deploy the service. KServe will automatically spin up the required pods, download the model, and expose gRPC/REST endpoints.
  • Verify deployment: Run kubectl get inferenceservices to check the status. Once READY is True, your model is available.
  • Access the model endpoint: Use the provided URL or cluster gateway to send inference requests. KServe exposes both REST and gRPC endpoints following the Open Inference Protocol (V2).
  • Test with sample input: Send a sample request (using curl, grpcurl, or a Camel route) to verify that the model returns predictions as expected.
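The verification step can also be scripted from Java. The sketch below probes the model readiness path defined by the REST flavor of the Open Inference Protocol (/v2/models/NAME/ready) using the JDK HTTP client. ReadinessCheck is a hypothetical class name, and the base URL is an assumption that you would replace with the address of your own InferenceService.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: checks model readiness over the V2 REST protocol.
final class ReadinessCheck {

    // Builds the readiness URI for one model, e.g. <base>/v2/models/NAME/ready.
    static URI modelReadyUri(String baseUrl, String modelName) {
        String trimmed = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        return URI.create(trimmed + "/v2/models/" + modelName + "/ready");
    }

    // Returns true when the server answers the readiness probe with HTTP 200.
    static boolean isModelReady(HttpClient client, String baseUrl, String modelName)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(modelReadyUri(baseUrl, modelName))
            .GET()
            .build();
        HttpResponse<Void> response =
            client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode() == 200;
    }

    public static void main(String[] args) throws Exception {
        // Without arguments only the probe URI is printed; pass a base URL to call the server.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        System.out.println(modelReadyUri(baseUrl, "my-llm-model"));
        if (args.length > 0) {
            System.out.println(isModelReady(HttpClient.newHttpClient(), baseUrl, "my-llm-model"));
        }
    }
}
```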

1.3 Creating the my-llm-model

The my-llm-model is a custom large language model (LLM) deployed on KServe. It can handle text inputs, generate responses, or perform classification tasks depending on your training. The process of creating and deploying this model involves the following steps:

  • Training or fine-tuning the model: You can start from a pre-trained LLM (e.g., GPT, BLOOM, LLaMA) and fine-tune it on your domain-specific corpus. For instance, training data may include technical documentation, chat transcripts, or product FAQs.
  • Exporting the model: Once trained, export the model in a supported format such as PyTorch TorchScript, ONNX, or TensorFlow SavedModel. For LLMs, TorchScript or ONNX are common choices for fast inference.
  • Packaging for KServe: Place the model artifacts into a storage location accessible from Kubernetes (e.g., S3 bucket, GCS, or Persistent Volume). Ensure all necessary configuration files (tokenizers, vocab files, or config.json) are included.
  • Defining an InferenceService: Create a KServe InferenceService YAML that points to your model storage and specifies the runtime. For example, a custom serving container for a TorchScript or ONNX LLM can be declared like this (in the v1beta1 API a custom predictor is listed under containers, and the STORAGE_URI variable asks KServe to download the artifacts into the container):
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: my-llm-model
    spec:
      predictor:
        containers:
          - name: kserve-container
            image: 'my-llm-runtime:latest'
            command:
              - python
              - serve_model.py
            env:
              - name: STORAGE_URI
                value: 's3://my-model-bucket/my-llm-model/'
    
  • Deploying the model: Apply the InferenceService with kubectl apply -f my-llm-model.yaml. KServe will launch the necessary pods, load the model into memory, and expose gRPC/REST endpoints ready for inference.
  • Testing and verification: Once READY is True, you can send test requests using grpcurl, curl, or your Camel route to verify that the model generates correct responses for your inputs.

By following these steps, my-llm-model is fully deployed on Kubernetes and ready for integration via Apache Camel using KServe’s gRPC API. This ensures that your LLM can handle real-time inference requests from Java applications with minimal overhead.
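For a quick test request without gRPC tooling, the JSON body for the REST inference path (POST /v2/models/my-llm-model/infer) can be assembled by hand. The following is a minimal sketch: InferRequestJson is a hypothetical class, the input name INPUT_0 is an assumption that must match what your serving container expects, and it only applies if the custom runtime actually implements the V2 REST protocol.

```java
// Hypothetical helper: builds a V2 REST inference request body with one BYTES input.
final class InferRequestJson {

    // Escapes the characters that JSON does not allow unescaped inside a string literal.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex form
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // One input tensor of shape [1] whose single element is the given text.
    static String body(String inputName, String text) {
        return "{\"inputs\":[{\"name\":\"" + escape(inputName)
            + "\",\"shape\":[1],\"datatype\":\"BYTES\",\"data\":[\""
            + escape(text) + "\"]}]}";
    }

    public static void main(String[] args) {
        System.out.println(body("INPUT_0", "Explain reactive programming in Java"));
    }
}
```

The printed body can be sent with curl or any HTTP client to confirm that the model answers before wiring it into a Camel route.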

2. Java Code Example

2.1 Inference with KServe

In this example, the deployed model is a generic AI or LLM-based model hosted on KServe. It can be configured to handle text, image, or structured input data, depending on the model type. The model accepts input encoded as BYTES (for text) or FLOAT/INT tensors (for numeric data) and returns predictions, labels, or structured output according to its schema. For instance, a text-based LLM could accept a sentence such as "Explain reactive programming in Java" and return a generated response as UTF-8 text. This example demonstrates how Apache Camel can interact with KServe for real-time inference through the Open Inference Protocol (V2). The Maven build that follows declares the Camel Spring Boot starter, the Camel KServe starter, and protobuf-java; latest__jar__version is a placeholder for the versions you use.

<project ...>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>camel-kserve-example</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.apache.camel.springboot</groupId>
      <artifactId>camel-spring-boot-starter</artifactId>
      <version>latest__jar__version</version>
    </dependency>

    <dependency>
      <groupId>org.apache.camel.springboot</groupId>
      <artifactId>camel-kserve-starter</artifactId>
      <version>latest__jar__version</version>
    </dependency>

    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>latest__jar__version</version>
    </dependency>
  </dependencies>
</project>

2.2 Code Example

The following Java example demonstrates Apache Camel integrating with a generic AI/LLM model. It shows building a ModelInferRequest, sending it to the KServe endpoint, and handling the protobuf response.

package org.example;

import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

// Message classes generated from the Open Inference Protocol gRPC definition,
// in the package used by the Camel KServe documentation
import inference.GrpcPredictV2.InferTensorContents;
import inference.GrpcPredictV2.ModelInferRequest;
import inference.GrpcPredictV2.ModelInferRequest.InferInputTensor;
import inference.GrpcPredictV2.ModelInferResponse;

import java.util.List;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Bean
    public RouteBuilder kserveRoute() {
        return new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from("timer:tick?repeatCount=1")
                    .setBody(constant(createRequest("Explain reactive programming in Java")))
                    .to("kserve:infer?modelName=my-llm-model&target=localhost:8001")
                    .process(this::postprocess)
                    .log("Inference result: ${body}");
            }

            ModelInferRequest createRequest(String text) {
                // BYTES tensors carry their elements in the repeated bytes_contents field
                var contents = InferTensorContents.newBuilder()
                    .addAllBytesContents(List.of(
                        com.google.protobuf.ByteString.copyFromUtf8(text)
                    ))
                    .build();

                var input = InferInputTensor.newBuilder()
                    .setName("INPUT_0")
                    .setDatatype("BYTES")
                    .addShape(1)
                    .setContents(contents)
                    .build();

                return ModelInferRequest.newBuilder()
                    .addInputs(input)
                    .build();
            }

            void postprocess(Exchange exchange) {
                ModelInferResponse response = exchange.getMessage().getBody(ModelInferResponse.class);
                // Read the first element of the first output tensor as UTF-8 text
                var output = response.getOutputs(0);
                var raw = output.getContents().getBytesContents(0).toStringUtf8();
                exchange.getMessage().setBody(raw);
            }
        };
    }
}

2.2.1 Code Explanation

This Spring Boot application demonstrates integrating Apache Camel with KServe for inference using a generic AI/LLM model. The Camel route builds a ModelInferRequest with text input, sends it to the kserve:infer endpoint specifying the model name (my-llm-model) and gRPC server, then extracts and logs the inference response.
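The postprocess step above reads the result from the typed contents field. Some model servers instead return tensors in the response's raw output contents, in which case the bytes have to be decoded by hand. The sketch below assumes the length-prefixed layout used by Triton-style servers for BYTES tensors (a 4-byte little-endian length followed by the element's bytes); RawBytesTensor is a hypothetical helper, and you should confirm the layout against your server before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts between strings and a length-prefixed raw BYTES tensor.
final class RawBytesTensor {

    // Encodes each string as a 4-byte little-endian length followed by its UTF-8 bytes.
    static byte[] encode(List<String> elements) {
        int size = 0;
        List<byte[]> encoded = new ArrayList<>();
        for (String element : elements) {
            byte[] utf8 = element.getBytes(StandardCharsets.UTF_8);
            encoded.add(utf8);
            size += Integer.BYTES + utf8.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        for (byte[] utf8 : encoded) {
            buffer.putInt(utf8.length).put(utf8);
        }
        return buffer.array();
    }

    // Reads length-prefixed elements until the buffer is exhausted.
    static List<String> decode(byte[] raw) {
        ByteBuffer buffer = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        List<String> elements = new ArrayList<>();
        while (buffer.remaining() >= Integer.BYTES) {
            int length = buffer.getInt();
            byte[] utf8 = new byte[length];
            buffer.get(utf8);
            elements.add(new String(utf8, StandardCharsets.UTF_8));
        }
        return elements;
    }

    public static void main(String[] args) {
        byte[] raw = encode(List.of("Reactive programming", "in Java"));
        System.out.println(decode(raw));
    }
}
```

In a route, the byte array passed to decode would come from the response's raw output contents for the tensor in question, converted from the protobuf ByteString to a byte array.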

2.2.2 Code Output

2025-09-20 10:12:45.789  INFO  org.example.Application : Started Application in 3.2 seconds
2025-09-20 10:12:46.321  INFO  route1                : Inference result: Reactive programming is a paradigm that allows handling asynchronous data streams efficiently...

3. Conclusion

The KServe component in Apache Camel enables scalable AI model inference directly from Java integration routes. Because it works with any model served through the Open Inference Protocol, including generic AI and LLM models, teams can build flexible, maintainable, cloud-native workflows around it. Whether deployed in Kubernetes or as part of distributed microservices, Camel’s abstraction over KServe’s gRPC APIs removes most of the client plumbing: a route only has to build the request, call the endpoint, and handle the response.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, frontend (Angular and React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, and K8s).