Tracing Java: Instrumenting Your Code for Full-Stack Observability
Application observability has become critical for maintaining reliable Java applications in production. Tracing provides visibility into request flows across distributed systems, helping developers identify performance bottlenecks, debug issues, and understand system behavior. This article explores practical approaches to instrumenting Java applications for comprehensive observability.
Understanding Observability in Java
Observability encompasses three key pillars: metrics, logs, and traces. While metrics provide aggregate data and logs capture discrete events, distributed tracing tracks requests as they flow through multiple services, creating a complete picture of system interactions.
Modern Java applications often involve microservices, databases, message queues, and external APIs. Without proper instrumentation, understanding how these components interact during request processing becomes nearly impossible.
Core Tracing Concepts
Spans and Traces
A trace represents the complete journey of a request through your system. It consists of multiple spans, where each span represents a single operation or service call. Spans contain timing information, metadata, and context about the operation.
// Example span hierarchy for an e-commerce order
Trace: Process Order (order-123)
├── Span: Validate Order (order-service)
├── Span: Check Inventory (inventory-service)
│   └── Span: Database Query (postgres)
├── Span: Process Payment (payment-service)
│   └── Span: External API Call (stripe-api)
└── Span: Send Confirmation (notification-service)
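A hierarchy like this is typically reconstructed from flat span records, each of which points at its parent. The following is a minimal sketch of that idea using only the JDK; `SpanTree` and `SpanData` are hypothetical names for illustration and are not part of OpenTelemetry or any other library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; real spans also carry timestamps, attributes, and status. */
final class SpanTree {
    private static final String ROOT = "";

    static final class SpanData {
        final String spanId;
        final String parentSpanId; // null for the root span of the trace
        final String name;

        SpanData(String spanId, String parentSpanId, String name) {
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.name = name;
        }
    }

    /** Renders spans as an indented tree by following parentSpanId links. */
    static String render(List<SpanData> spans) {
        Map<String, List<SpanData>> childrenByParent = new HashMap<>();
        for (SpanData span : spans) {
            String key = span.parentSpanId == null ? ROOT : span.parentSpanId;
            childrenByParent.computeIfAbsent(key, k -> new ArrayList<>()).add(span);
        }
        StringBuilder out = new StringBuilder();
        appendChildren(out, childrenByParent, ROOT, 0);
        return out.toString();
    }

    private static void appendChildren(StringBuilder out,
                                       Map<String, List<SpanData>> childrenByParent,
                                       String parentId,
                                       int depth) {
        for (SpanData child : childrenByParent.getOrDefault(parentId, Collections.emptyList())) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(child.name).append('\n');
            // Recurse into the spans that name this span as their parent
            appendChildren(out, childrenByParent, child.spanId, depth + 1);
        }
    }
}
```

Passing the spans of one trace to `SpanTree.render` yields one line per span, indented by depth, which is roughly what a tracing UI draws as a waterfall.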
Context Propagation
Tracing context must propagate with each request to maintain trace continuity. Between services it travels in HTTP headers or message properties; within a single process it is typically held in thread-local storage, so it must be handed over explicitly whenever work moves to another thread.
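For HTTP, the W3C Trace Context standard carries this context in a `traceparent` header of the form `00-<32 hex trace ID>-<16 hex parent span ID>-<2 hex flags>`. The sketch below parses and serializes that value with plain JDK classes to show what actually crosses the wire; the `TraceParent` class is a hypothetical illustration, and in practice a tracing library's propagator does this work.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative value type for a W3C traceparent header; not part of any library. */
final class TraceParent {
    // Version "00", 32 hex trace ID, 16 hex parent span ID, 2 hex flags
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    final String traceId;
    final String parentSpanId;
    final boolean sampled;

    TraceParent(String traceId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    /** Serializes to the header value sent to the downstream service. */
    String toHeader() {
        return "00-" + traceId + "-" + parentSpanId + "-" + (sampled ? "01" : "00");
    }

    /** Parses an incoming header value; empty if it is malformed or uses all-zero IDs. */
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // All-zero trace or span IDs are invalid
        if (isAllZero(traceId) || isAllZero(spanId)) {
            return Optional.empty();
        }
        // The lowest flag bit marks the trace as sampled
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return Optional.of(new TraceParent(traceId, spanId, sampled));
    }

    private static boolean isAllZero(String hex) {
        return hex.chars().allMatch(c -> c == '0');
    }
}
```

A receiving service would parse the incoming header, start its own span with the parsed span ID as parent, and send a new header carrying the same trace ID and its own span ID on any outgoing call.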
Manual Instrumentation
Using OpenTelemetry
OpenTelemetry provides a vendor-neutral approach to instrumentation. Here’s a basic setup:
// Initialize OpenTelemetry
OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(
                    OtlpGrpcSpanExporter.builder()
                        .setEndpoint("http://localhost:4317")
                        .build())
                .build())
            .setResource(Resource.getDefault()
                .merge(Resource.builder()
                    .put(ResourceAttributes.SERVICE_NAME, "order-service")
                    .put(ResourceAttributes.SERVICE_VERSION, "1.0.0")
                    .build()))
            .build())
    // Without propagators the SDK neither injects nor extracts trace headers
    .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
    .build();
// Get tracer instance
Tracer tracer = openTelemetry.getTracer("order-service");
Creating Custom Spans
@Service
public class OrderService {
    private final Tracer tracer;
    // Constructor injection so the final field is always initialized
    public OrderService(Tracer tracer) {
        this.tracer = tracer;
    }
    public Order processOrder(OrderRequest request) {
        Span span = tracer.spanBuilder("process-order")
            .setAttribute("order.id", request.getId())
            .setAttribute("customer.id", request.getCustomerId())
            .startSpan();
        try (Scope scope = span.makeCurrent()) {
            // Add events to track progress
            span.addEvent("validation-started");
            validateOrder(request);
            span.addEvent("validation-completed");
            span.addEvent("inventory-check-started");
            checkInventory(request.getItems());
            span.addEvent("inventory-check-completed");
            Order order = createOrder(request);
            span.setStatus(StatusCode.OK);
            return order;
        } catch (Exception e) {
            // Record the failure on the span before rethrowing
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            // Always end the span, on success and on failure
            span.end();
        }
    }
}
Tracing Database Operations
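@Repository
public class OrderRepository {
    private final Tracer tracer;
    private final JdbcTemplate jdbcTemplate;
    public OrderRepository(Tracer tracer, JdbcTemplate jdbcTemplate) {
        this.tracer = tracer;
        this.jdbcTemplate = jdbcTemplate;
    }
    public Order findById(String orderId) {
        Span span = tracer.spanBuilder("db.order.findById")
            .setAttribute("db.system", "postgresql")
            .setAttribute("db.operation", "SELECT")
            .setAttribute("order.id", orderId)
            .startSpan();
        try (Scope scope = span.makeCurrent()) {
            String sql = "SELECT * FROM orders WHERE id = ?";
            // Varargs overload; the Object[] overload is deprecated since Spring 5.3
            return jdbcTemplate.queryForObject(sql, new OrderRowMapper(), orderId);
        } catch (DataAccessException e) {
            // Mark the span as failed so the error is visible in the trace
            span.recordException(e);
            span.setStatus(StatusCode.ERROR);
            throw e;
        } finally {
            span.end();
        }
    }
}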
Auto-Instrumentation
Auto-instrumentation removes most manual span creation by instrumenting common libraries and frameworks automatically. Spans for your own business logic still have to be added by hand.
Java Agent Setup
The OpenTelemetry Java agent provides zero-code instrumentation:
# Download the agent
wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar
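# Run your application with the agent
# (agent 2.x defaults to OTLP over HTTP on port 4318; port 4317 is the gRPC port, so set the protocol explicitly)
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.resource.attributes=service.name=order-service \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -jar order-service.jar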
Supported Libraries
The agent automatically instruments popular libraries:
- Web Frameworks: Spring Boot, Spring MVC, JAX-RS
- HTTP Clients: Apache HttpClient, OkHttp, RestTemplate
- Databases: JDBC, JPA/Hibernate, Redis, MongoDB
- Messaging: RabbitMQ, Kafka, JMS
- Caching: Caffeine, Ehcache
Spring Boot Integration
Spring Boot 2.x applications can leverage Spring Cloud Sleuth for simplified tracing setup (Sleuth is not available for Spring Boot 3, where Micrometer Tracing replaces it):
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-otel</artifactId>
</dependency>
Configuration in application.yml:
spring:
  sleuth:
    otel:
      exporter:
        otlp:
          endpoint: http://localhost:4317
    sampler:
      # Sleuth's sampling property; management.tracing.sampling.probability
      # is the Spring Boot 3 (Micrometer Tracing) equivalent
      probability: 0.1 # Sample 10% of traces
  zipkin:
    base-url: http://localhost:9411
  application:
    name: order-service
Custom Span Annotations
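@RestController
public class OrderController {
    private final OrderService orderService;
    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }
    @NewSpan("order-creation")
    @PostMapping("/orders")
    public ResponseEntity<Order> createOrder(
            @RequestBody OrderRequest request,
            @SpanTag("customer.id") String customerId) {
        // A span named "order-creation" wraps this method;
        // customerId is recorded on it as the customer.id tag
        Order order = orderService.processOrder(request);
        return ResponseEntity.ok(order);
    }
}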
Distributed Tracing Patterns
Microservice Communication
When services communicate via HTTP, tracing context propagates through headers:
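@Service
public class PaymentClient {
    private final RestTemplate restTemplate;
    public PaymentClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }
    public PaymentResult processPayment(PaymentRequest request) {
        // With an instrumented RestTemplate (Java agent attached, or a
        // framework-managed bean), context is propagated automatically
        // via HTTP headers such as traceparent and tracestate.
        // The relative URL assumes a root URI configured on the RestTemplate.
        return restTemplate.postForObject(
            "/payments", request, PaymentResult.class);
    }
}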
Asynchronous Processing
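For async operations, the current span lives in thread-local context and does not follow work onto another thread unless the executor is instrumented or the context is passed along explicitly. Creating a span inside the async method traces the work itself, but the span joins the caller's trace only if the context arrives with the task: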
@Service
public class NotificationService {
    @Async
    public CompletableFuture<Void> sendOrderConfirmation(Order order) {
        Span span = tracer.spanBuilder("send-notification")
            .setAttribute("order.id", order.getId())
            .startSpan();
        try (Scope scope = span.makeCurrent()) {
            // Send notification logic
            emailService.sendConfirmation(order);
            return CompletableFuture.completedFuture(null);
        } finally {
            span.end();
        }
    }
}
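The hand-off that asynchronous code depends on can be sketched with plain JDK types: capture the context on the submitting thread, then restore it on whichever thread runs the task. `TraceContext` below is a hypothetical stand-in that tracks only a trace ID. With OpenTelemetry, `Context.current().wrap(runnable)` serves this purpose, and the Java agent instruments common executors so that wrapping is often unnecessary.

```java
/** Illustrative thread-local trace context; real code would use the tracing library's context API. */
final class TraceContext {
    private static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) {
        CURRENT_TRACE_ID.set(traceId);
    }

    static String get() {
        return CURRENT_TRACE_ID.get();
    }

    static void clear() {
        CURRENT_TRACE_ID.remove();
    }

    /** Captures the caller's trace ID now and restores it on whichever thread runs the task. */
    static Runnable wrap(Runnable task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                task.run();
            } finally {
                // Restore the previous value so pooled threads do not leak
                // this context into unrelated tasks
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }
}
```

Submitting `TraceContext.wrap(task)` to an executor instead of `task` lets the worker thread see the submitter's trace ID, which is the property a child span needs in order to attach to the right trace.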
Observability Backends
Jaeger
Jaeger provides distributed tracing capabilities with a web UI for trace visualization:
# docker-compose.yml
version: '3'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Web UI
      - "4317:4317"   # OTLP gRPC (the endpoint used by the exporter examples)
      - "14250:14250" # gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true
Zipkin
Zipkin offers similar functionality with a different architecture:
services:
  zipkin:
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"
Commercial Solutions
Enterprise solutions like Datadog, New Relic, and Dynatrace provide additional features:
- Advanced analytics and alerting
- APM integration
- Infrastructure correlation
- AI-powered insights
Performance Considerations
Sampling Strategies
Tracing every request in high-traffic applications impacts performance. Implement sampling:
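// Probability-based sampling (10%), following the parent's decision when one exists
Sampler ratioSampler = Sampler.parentBased(Sampler.traceIdRatioBased(0.1));
// Custom sampling based on business logic: implement the Sampler interface
Sampler customSampler = new Sampler() {
    @Override
    public SamplingResult shouldSample(Context parentContext, String traceId, String name,
            SpanKind spanKind, Attributes attributes, List<LinkData> parentLinks) {
        // isHighPriorityRequest is an application-defined helper
        if (isHighPriorityRequest(attributes)) {
            return SamplingResult.create(SamplingDecision.RECORD_AND_SAMPLE);
        }
        return SamplingResult.create(SamplingDecision.DROP);
    }
    @Override
    public String getDescription() {
        return "HighPrioritySampler";
    }
};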
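Ratio-based sampling works across services because the decision is derived from the trace ID itself, so every service reaches the same verdict for a given trace without coordination. The sketch below illustrates that idea with JDK classes only; `RatioSampler` is a hypothetical class, and the exact arithmetic inside OpenTelemetry's sampler may differ.

```java
/** Illustrative trace-ID ratio sampler; a sketch of the idea, not a library implementation. */
final class RatioSampler {
    private final double ratio;
    private final long upperBound;

    RatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
        // Trace IDs whose low 63 bits fall below this bound are sampled
        this.upperBound = (long) (ratio * Long.MAX_VALUE);
    }

    /** Decides from the trace ID alone, so every service makes the same choice for a trace. */
    boolean shouldSample(String traceIdHex) {
        if (traceIdHex == null || traceIdHex.length() != 32) {
            throw new IllegalArgumentException("trace ID must be 32 hex characters");
        }
        if (ratio >= 1.0) {
            return true;
        }
        // Use the low 64 bits (last 16 hex characters) of the 128-bit trace ID
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        // Clear the sign bit to obtain a non-negative value
        long value = low & Long.MAX_VALUE;
        return value < upperBound;
    }
}
```

Because trace IDs are generated randomly, roughly the configured fraction of traces falls under the bound, and a trace that is kept by one service is kept by all of them.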
Resource Management
Configure appropriate batch sizes and export intervals:
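BatchSpanProcessor.builder(exporter)
    .setMaxExportBatchSize(512)                 // spans per export request
    .setExporterTimeout(Duration.ofSeconds(2))  // abandon a stalled export after 2 s
    .setScheduleDelay(Duration.ofSeconds(5))    // delay between scheduled exports
    .build();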
Monitoring and Alerting
Key Metrics to Track
- Trace volume: Monitor trace ingestion rates
- Latency percentiles: P95, P99 response times
- Error rates: Failed spans and operations
- Service dependencies: Map service interactions
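Latency percentiles such as P95 and P99 can be computed directly from recorded span durations. The sketch below uses the nearest-rank method on an in-memory array; `LatencyPercentiles` is a hypothetical helper, and production backends usually estimate percentiles from histograms instead of raw samples.

```java
import java.util.Arrays;

/** Illustrative nearest-rank percentile over span durations. */
final class LatencyPercentiles {
    /** Returns the smallest sample such that at least p percent of samples are at or below it. */
    static long percentile(long[] durationsMillis, double p) {
        if (durationsMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        // Sort a copy so the caller's array is left untouched
        long[] sorted = durationsMillis.clone();
        Arrays.sort(sorted);
        // Nearest-rank: 1-based rank = ceil(p / 100 * n)
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

For the five durations 80, 95, 120, 250, and 3000 ms, the median is 120 ms while P95 is 3000 ms, which is why tail percentiles reveal slow requests that an average would hide.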
Sample Alert Rules
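# Metric and label names are placeholders; use the names your backend derives from spans.
# High error rate alert: more than 5% of traces failing
- alert: HighServiceErrorRate
  expr: |
    sum by (service_name) (rate(traces_total{status="error"}[5m]))
      / sum by (service_name) (rate(traces_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High error rate detected in {{ $labels.service_name }}"
# Latency degradation: P95 above 2 seconds
- alert: HighServiceLatency
  expr: histogram_quantile(0.95, sum by (le, service_name) (rate(span_duration_seconds_bucket[5m]))) > 2
  for: 5m
  labels:
    severity: critical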
Best Practices
Instrumentation Guidelines
- Instrument at service boundaries: HTTP endpoints, database calls, external APIs
- Use semantic conventions: Follow OpenTelemetry naming standards
- Add meaningful attributes: Include business context like user IDs, order numbers
- Handle errors properly: Record exceptions and set appropriate status codes
Span Naming
// Good - Specific and meaningful
tracer.spanBuilder("order.validate")
tracer.spanBuilder("db.order.insert")
tracer.spanBuilder("http.payment.charge")
// Avoid - Too generic or dynamic
tracer.spanBuilder("process") // Too vague
tracer.spanBuilder("order-" + orderId) // Dynamic names create cardinality issues
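One way to keep span names low-cardinality for HTTP endpoints is to name the span after the method and route template, not the concrete URL, and to put the identifier in an attribute. The sketch below is a hypothetical helper (`SpanNames.forRoute`) that replaces numeric and UUID path segments with a placeholder. Frameworks that expose the matched route make this unnecessary, so treat it as a fallback illustration.

```java
import java.util.regex.Pattern;

/** Illustrative helper that derives a low-cardinality span name from a request path. */
final class SpanNames {
    // A path segment that is a UUID
    private static final Pattern UUID_SEGMENT = Pattern.compile(
            "/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)");
    // A path segment made up of digits only
    private static final Pattern NUMERIC_SEGMENT = Pattern.compile("/\\d+(?=/|$)");

    /** Builds a name such as "GET /orders/{id}" from a method and a concrete path. */
    static String forRoute(String method, String path) {
        String template = UUID_SEGMENT.matcher(path).replaceAll("/{id}");
        template = NUMERIC_SEGMENT.matcher(template).replaceAll("/{id}");
        return method + " " + template;
    }
}
```

With this approach `/orders/123` and `/orders/456` both produce the span name `GET /orders/{id}`, so the backend can aggregate them, while the order ID itself stays searchable as a span attribute.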
Context Enrichment
Span span = tracer.spanBuilder("process-order")
    .setAttribute("order.id", orderId)
    .setAttribute("customer.tier", customerTier)
    .setAttribute("order.value", orderValue)
    .setAttribute("region", region)
    .startSpan();
Troubleshooting Common Issues
Missing Traces
- Verify agent attachment and configuration
- Check sampling rates
- Validate exporter endpoints
- Review service connectivity
Performance Impact
- Adjust sampling rates
- Optimize span processors
- Monitor memory usage
- Export in batches off the request thread (BatchSpanProcessor rather than SimpleSpanProcessor)
Context Loss
- Ensure proper propagation in async code
- Verify header forwarding in proxies
- Check thread pool configurations
Conclusion
Implementing comprehensive tracing in Java applications requires balancing observability needs with performance constraints. Start with auto-instrumentation for quick wins, then add custom instrumentation where business context is crucial. Proper sampling and configuration ensure tracing provides valuable insights without degrading application performance.
The investment in observability pays dividends when debugging production issues, optimizing performance, and understanding system behavior. As applications grow more complex, distributed tracing becomes essential for maintaining system reliability and user experience.

