ML Inference in Java: DL4J, DJL, and TensorFlow in Production
While Python dominates the machine learning (ML) world, Java still holds a critical edge when it comes to deploying AI in production. Enterprises with mature Java ecosystems are increasingly looking for ways to integrate ML/AI models directly into their backend systems—without needing to spin up separate Python-based microservices.
In this article, we’ll explore how Java can serve as a powerful platform for ML inference using libraries like Deeplearning4j (DL4J), Deep Java Library (DJL), and TensorFlow Java. We’ll walk through real-world scenarios, compare the tooling, and show how to build scalable, low-latency inference pipelines that fit seamlessly into Java production stacks.
Why Java for ML Inference?
Let’s be real—Java isn’t the first choice for data scientists. But when it comes to deploying trained models, Java has some clear advantages:
- Performance: JIT-compiled JVM services deliver strong steady-state throughput, and modern collectors such as G1 and ZGC help keep GC pauses short.
- Integration: Easy embedding into existing Spring Boot, Jakarta EE, or Micronaut applications.
- Stability: Java’s static typing and long-term support make it ideal for enterprise-grade deployments.
- Portability: Cross-platform compatibility and native packaging (e.g., via GraalVM) open doors for optimized edge or embedded inference.
Popular Java ML Inference Libraries
Let’s break down the top options for ML inference in Java.
1. Deeplearning4j (DL4J)
🔗 https://deeplearning4j.konduit.ai/
DL4J is a mature, JVM-native deep learning library built for production. It supports:
- Importing Keras or TensorFlow models
- Training and inference on CPU and GPU
- Integration with Apache Spark for distributed ML
- ONNX format support
Ideal for:
Enterprise teams needing tight JVM integration, batch or streaming inference, or on-premise deployments.
🔍 Example: Loading and Running an Inference
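import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
MultiLayerNetwork model = ModelSerializer.restoreMultiLayerNetwork("model.zip"); // throws IOException
INDArray input = Nd4j.create(new float[]{0.5f, 0.8f}, new int[]{1, 2}); // one row, two features
INDArray output = model.output(input); // forward pass only, no training
System.out.println("Predicted: " + output);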
2. Deep Java Library (DJL)
Developed by AWS, DJL provides a high-level API for running inference with various engines, including:
- PyTorch
- TensorFlow
- MXNet
- ONNX Runtime
- PaddlePaddle
It supports both model loading from the model zoo and custom-trained models.
Ideal for:
Cloud-native apps, serverless functions, and when you want engine-agnostic inference.
🔍 Example: Loading a Model and Running Predictions
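import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
Criteria<Image, Classifications> criteria = Criteria.builder()
        .setTypes(Image.class, Classifications.class)
        .optModelUrls("https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/resnet/0.0.1/resnet50")
        .optEngine("PyTorch")
        .build();
// ZooModel and Predictor hold native resources, so close both when done
try (ZooModel<Image, Classifications> model = criteria.loadModel();
     Predictor<Image, Classifications> predictor = model.newPredictor()) {
    Classifications result = predictor.predict(myImage); // myImage: a previously loaded Image
}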
Pro Tip: DJL can run ONNX models through its ONNX Runtime engine, so a model trained in Python and exported to ONNX can be served from Java without retraining.
3. TensorFlow Java
🔗 https://www.tensorflow.org/jvm
TensorFlow Java is the official Java binding for running models built with TensorFlow. While not as high-level as DJL, it gives you access to the raw TensorFlow APIs and supports:
- TF SavedModel format
- Tensor operations
- Native performance via JNI
Ideal for:
Teams using TensorFlow in training and needing tight control over inference execution.
🔍 Example: Loading a SavedModel
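import org.tensorflow.SavedModelBundle;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.types.TFloat32;

// Assumes TensorFlow Java 0.3 or later, where TFloat32 is itself the tensor type
try (SavedModelBundle model = SavedModelBundle.load("model", "serve");
     TFloat32 input = TFloat32.tensorOf(Shape.of(1, 2))) {
    input.setFloat(0.5f, 0, 0);
    input.setFloat(0.8f, 0, 1);
    // Tensor names depend on the exported signature; inspect them with saved_model_cli
    try (TFloat32 result = (TFloat32) model.session().runner()
            .feed("serving_default_input", input)
            .fetch("StatefulPartitionedCall")
            .run().get(0)) {
        System.out.println(result.getFloat(0, 0)); // assumes a [1, n] float output
    }
}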
Note that TensorFlow Java does not run TensorFlow Lite models. For mobile or edge deployments, TensorFlow Lite ships its own separate Java API, aimed mainly at Android.
Building a Java-Based Inference Pipeline
Here’s a typical structure:
+------------------------+
|    Java Web Service    |  ← Spring Boot or Micronaut
+-----------+------------+
            |
            v
+------------------------+
|    Inference Module    |  ← DJL, DL4J, or TF Java
+-----------+------------+
            |
            v
+------------------------+
|    Model Artifacts     |
| (.zip, .pb, .onnx etc) |
+------------------------+
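The layering in the diagram can be sketched in plain Java. This is a minimal sketch using only the JDK, not a definitive design: `InferenceModule`, `LinearModel`, and `ScoringEndpoint` are hypothetical names, and the hard-coded weights stand in for a real artifact that DJL, DL4J, or TensorFlow Java would load.

```java
import java.util.Objects;

/** Inference module boundary (hypothetical name): the web layer depends only on this. */
interface InferenceModule {
    float[] predict(float[] features);
}

/** Stand-in for an engine-backed model; the weights are made up for illustration. */
final class LinearModel implements InferenceModule {
    private final float[] weights;
    private final float bias;

    LinearModel(float[] weights, float bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    @Override
    public float[] predict(float[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        float sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return new float[] {sum};
    }
}

/** Web-layer stand-in; in Spring Boot or Micronaut this would be a controller. */
final class ScoringEndpoint {
    private final InferenceModule module;

    ScoringEndpoint(InferenceModule module) {
        this.module = Objects.requireNonNull(module);
    }

    String handle(float[] features) {
        float score = module.predict(features)[0];
        return "{\"score\": " + score + "}";
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        // The model is built once at startup and injected into the endpoint.
        InferenceModule module = new LinearModel(new float[] {1.0f, 2.0f}, 0.5f);
        ScoringEndpoint endpoint = new ScoringEndpoint(module);
        System.out.println(endpoint.handle(new float[] {0.5f, 0.25f})); // prints {"score": 1.5}
    }
}
```

Because the endpoint only sees the interface, swapping the stand-in for a DJL, DL4J, or TensorFlow Java implementation should not touch the web layer, and the endpoint can be unit-tested with a fake module.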
Real-World Tips
- Isolate inference logic into a module for reuse and testing.
- Use model versioning (e.g., via S3 or model registry) and load dynamically.
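- Profile memory and inference time. The JVM lets you choose and tune the garbage collector, but native engines allocate tensors off-heap, so watch total process memory as well as heap usage.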
- For async/batch inference, combine with Reactor or Kotlin coroutines.
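The versioning tip above, loading a new model without restarting the service, might look like the following. It is a sketch under stated assumptions: `VersionedModel` and `ModelHolder` are hypothetical names, the lambdas stand in for real loaded models, and fetching the artifact from S3 or a registry is left out entirely.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** A loaded model plus the version label it came from (hypothetical type). */
final class VersionedModel {
    final String version;
    final Function<float[], Float> scorer;

    VersionedModel(String version, Function<float[], Float> scorer) {
        this.version = version;
        this.scorer = scorer;
    }
}

/** Holds the active model and lets a background task swap in a new version atomically. */
final class ModelHolder {
    private final AtomicReference<VersionedModel> active;

    ModelHolder(VersionedModel initial) {
        this.active = new AtomicReference<>(initial);
    }

    /** Each call reads the reference once, so it runs against a single version. */
    float score(float[] features) {
        return active.get().scorer.apply(features);
    }

    String activeVersion() {
        return active.get().version;
    }

    /** Swap in a fully loaded model; returns the previous one so the caller can release it. */
    VersionedModel swap(VersionedModel next) {
        return active.getAndSet(next);
    }
}

class HotSwapDemo {
    public static void main(String[] args) {
        ModelHolder holder = new ModelHolder(new VersionedModel("v1", f -> f[0] + f[1]));
        System.out.println(holder.activeVersion() + " -> " + holder.score(new float[] {1f, 2f}));
        VersionedModel old = holder.swap(new VersionedModel("v2", f -> f[0] * f[1]));
        System.out.println(holder.activeVersion() + " -> " + holder.score(new float[] {1f, 2f}));
        System.out.println("released " + old.version);
    }
}
```

The design choice here is to load and validate the new version completely before calling `swap`, so requests never see a half-initialized model. With a real engine, the returned old model should be closed only after in-flight requests that still use it have finished.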
Use Cases
Here’s how teams are using Java inference in production:
- Fraud detection at fintech companies with low-latency requirements
- Document classification in enterprise search platforms
- Product recommendations using pre-trained embeddings
- Edge ML with GraalVM-native Java apps
Considerations for Production
| Aspect | Recommendation |
|---|---|
| Model Format | Use ONNX, SavedModel, or DL4J .zip |
| Threading | Pool model instances if not thread-safe |
| Monitoring | Expose inference latency via Prometheus metrics |
| Cold Starts | Pre-load model at service startup |
| Scalability | Use horizontal scaling with Kubernetes or ECS |
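The Threading, Monitoring, and Cold Starts rows in the table can be combined in one small component. The following is a rough sketch using only the JDK: `PredictorPool` is a hypothetical name, a plain `Function` stands in for a non-thread-safe predictor, and the latency counters are simple stand-ins for what a metrics library would export to Prometheus.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;
import java.util.function.Supplier;

/** Fixed-size pool that lends each predictor instance to one caller at a time. */
final class PredictorPool<I, O> {
    private final BlockingQueue<Function<I, O>> idle;
    private final LongAdder calls = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    PredictorPool(int size, Supplier<Function<I, O>> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // create every instance up front to avoid cold starts
        }
    }

    O predict(I input) {
        Function<I, O> predictor;
        try {
            predictor = idle.take(); // blocks while all instances are busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a predictor", e);
        }
        long start = System.nanoTime();
        try {
            return predictor.apply(input);
        } finally {
            totalNanos.add(System.nanoTime() - start);
            calls.increment();
            idle.offer(predictor); // always hand the instance back, even on failure
        }
    }

    long callCount() {
        return calls.sum();
    }

    int idleCount() {
        return idle.size();
    }

    double averageLatencyMillis() {
        long n = calls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }
}

class PoolDemo {
    public static void main(String[] args) {
        PredictorPool<Integer, Integer> pool =
                new PredictorPool<Integer, Integer>(2, () -> x -> x * 2);
        System.out.println(pool.predict(21)); // prints 42
        System.out.println("calls=" + pool.callCount() + " idle=" + pool.idleCount());
    }
}
```

The pool size caps concurrent inference calls, which also bounds native memory use. In a real service the two counters would likely be replaced by a timer from whatever metrics library the application already uses.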
Final Thoughts
Java may not be the top language for training machine learning models, but when it comes to running inference in production, it shines. With powerful libraries like DL4J, DJL, and TensorFlow Java, you can build high-performance, production-ready inference pipelines that integrate smoothly with existing Java systems.
If your team already runs Spring Boot or other JVM-based services, you often don’t need to offload inference to a separate Python service. Bring the model to where your business logic lives.

