Apache Kafka for Edge Computing: Real-World Use Cases in 2025

Eleftheria Drosopoulou | October 24th, 2025 | Last Updated: October 17th, 2025


Edge computing has moved from buzzword to necessity. Many devices now generate data faster than networks can economically transmit it to the cloud. Apache Kafka, traditionally a data center technology, is increasingly deployed at the edge as well. Here’s how it’s being used in 2025 and whether it makes sense for your edge architecture.

Why Kafka at the Edge?

Edge environments have unique constraints: limited resources, unreliable connectivity, and the need for local processing. Kafka wasn’t designed for this, but its core strengths translate well:

Durability: Edge devices lose connection. Kafka’s log-based storage buffers data until connectivity returns.

Stream processing: Real-time analytics at the edge reduce latency and bandwidth costs. Process locally, send only insights to the cloud.

Decoupling: Edge applications stay loosely coupled. Sensors, processors, and actuators communicate through Kafka without tight integration.
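The durability point is easiest to see in a toy model. The sketch below is not Kafka’s API: `EdgeLog` and `Forwarder` are hypothetical stand-ins for one partition on an edge broker and for whatever ships data to the cloud (MirrorMaker 2, a connector). It shows why an append-only log plus a committed offset survives an outage: reading never deletes records, so the forwarder simply resumes from its last committed offset.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeLog:
    """Toy append-only log mimicking one partition on an edge broker.

    Records are never removed when read; each reader only advances its own
    offset, which is what lets a forwarder resume after an outage.
    """
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int, max_records: int = 100) -> list:
        return self.records[offset:offset + max_records]


@dataclass
class Forwarder:
    """Hypothetical edge-to-cloud forwarder that tracks a committed offset."""
    log: EdgeLog
    committed: int = 0
    sent: list = field(default_factory=list)

    def sync(self, link_up: bool) -> int:
        """Forward records after the committed offset if the link is up."""
        if not link_up:
            return 0  # nothing is lost: records stay in the log
        batch = self.log.read_from(self.committed)
        self.sent.extend(batch)       # stand-in for a network send
        self.committed += len(batch)  # commit only after the send succeeds
        return len(batch)
```

During an outage `sync(link_up=False)` forwards nothing while producers keep appending; the first successful `sync` afterwards ships the whole backlog in order. Real Kafka adds the piece this toy omits: retention eventually deletes old segments, so the backlog only survives outages shorter than the retention window.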

The challenge is running Kafka in resource-constrained environments. That’s where lightweight Kafka distributions and edge-optimized configurations come in.

The Edge Kafka Stack in 2025

Traditional Kafka is too heavy for most edge devices. The 2025 edge stack looks different:

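Lightweight brokers: KRaft mode (Kafka’s built-in Raft-based metadata quorum, which replaces ZooKeeper and became the only supported mode in Kafka 4.0) reduces operational overhead, and a single process can act as both broker and controller. For modest workloads, such a node can run in about 2GB of RAM.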

Edge-optimized configurations: Smaller log segments, aggressive compression, reduced replica counts. You’re trading some fault tolerance for resource efficiency.

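Kafka Connect at the edge: Connectors exist for IoT and industrial protocols (MQTT, OPC-UA, Modbus), most of them commercial or community-maintained rather than part of Apache Kafka itself. Data from industrial sensors flows into Kafka without custom code.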

ksqlDB for local processing: SQL-based stream processing at the edge. No need to write Java applications for basic transformations.

-- ksqlDB running at the edge
CREATE STREAM sensor_readings (
    sensor_id VARCHAR,
    temperature DOUBLE,
    timestamp BIGINT
) WITH (
    kafka_topic='raw_sensors',
    value_format='json'
);

-- Alert on anomalies locally
CREATE STREAM high_temp_alerts AS
    SELECT sensor_id, temperature, timestamp
    FROM sensor_readings
    WHERE temperature > 85.0;

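The first statement only registers the raw_sensors topic as a stream. The second starts a persistent query that keeps running at the edge and continuously writes matching readings to its own Kafka topic (by default named after the stream, HIGH_TEMP_ALERTS), so only that much smaller topic needs to leave the site.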
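To make the per-record behavior of the alert query concrete, here is the same filter and projection as a plain Python function over JSON-encoded values. It illustrates the semantics only; ksqlDB does not execute queries this way, and the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator

THRESHOLD = 85.0  # same threshold as the WHERE clause in the ksqlDB query


def high_temp_alerts(raw_values: Iterable[bytes]) -> Iterator[dict]:
    """Mimic: SELECT sensor_id, temperature, timestamp ... WHERE temperature > 85.0"""
    for raw in raw_values:
        row = json.loads(raw)
        temp = row.get("temperature")
        # A missing (NULL) temperature never satisfies the predicate in SQL
        if temp is not None and temp > THRESHOLD:
            # Project only the three selected columns
            yield {
                "sensor_id": row.get("sensor_id"),
                "temperature": temp,
                "timestamp": row.get("timestamp"),
            }
```

Note the strict comparison: a reading of exactly 85.0 does not raise an alert, just as in the SQL version.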

Use Case 1: Smart Manufacturing

Modern factories have thousands of sensors. Sending all that data to the cloud is expensive and slow. Processing locally is essential.

The setup: Kafka brokers run on edge servers in the factory. Machines publish sensor data (temperature, vibration, output rates) to local topics.

Local processing:

Detect anomalies in real-time (bearing failure signatures, temperature spikes)
Calculate equipment efficiency metrics
Trigger automated responses (shut down overheating equipment)
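For the anomaly-detection item, a rolling z-score is one simple way a local consumer could flag temperature or vibration spikes. This is an illustrative technique chosen for the sketch, not a description of any particular factory’s detector; the class name, window size and threshold are assumptions. Bearing-failure detection in practice often relies on frequency-domain features of vibration data rather than raw values.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flag readings far from the recent mean (a simple spike detector).

    window and threshold are illustrative defaults, not tuned values.
    """

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Return True if value deviates more than threshold standard
        deviations from the mean of the current window."""
        anomalous = False
        n = len(self.values)
        if n >= 2:  # need at least two points for a sample standard deviation
            mean = sum(self.values) / n
            std = sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))
            anomalous = std > 0 and abs(value - mean) / std > self.threshold
        # The new reading (even an anomalous one) joins the window
        self.values.append(value)
        return anomalous
```

Each consumer would keep one detector per sensor_id and call `update` for every reading, publishing an alert or shutdown command when it returns True.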

Cloud synchronization: Only aggregate metrics and alerts sync to the cloud. Raw sensor data stays local unless needed for analysis.

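# Producer at the edge - collecting sensor data
import json
import time

from confluent_kafka import Producer

# Create one producer per process and reuse it; constructing a
# producer for every message is expensive.
producer = Producer({'bootstrap.servers': 'edge-kafka:9092'})

def delivery_report(err, msg):
    # Invoked from poll()/flush() once the broker acknowledges or rejects a message
    if err is not None:
        print(f"Delivery failed for {msg.key()}: {err}")

def publish_sensor_data(sensor_id, reading):
    message = {
        'sensor_id': sensor_id,
        'temperature': reading['temp'],
        'vibration': reading['vibration'],
        'timestamp': int(time.time() * 1000)
    }

    producer.produce(
        'factory_sensors',
        key=sensor_id,
        value=json.dumps(message),
        callback=delivery_report
    )
    # Serve delivery callbacks without blocking; call producer.flush()
    # once at shutdown rather than after every message
    producer.poll(0)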

Why it works: Local processing cuts response latency from seconds to milliseconds, and because only aggregates leave the site, bandwidth costs can drop by roughly 90%. Factories keep operating even when cloud connectivity fails.

Use Case 2: Autonomous Vehicles

Self-driving vehicles are extreme edge cases. Multiple sensors (cameras, lidar, radar) generate gigabytes per second. Cloud processing isn’t viable.

The architecture: Each vehicle runs a lightweight Kafka instance. Sensors publish to topics organized by type (camera_front, lidar_360, radar_data).

Processing pipeline:

Sensor fusion happens locally via Kafka Streams
Object detection runs on processed streams
Decision-making consumes fused data
Only decisions and select telemetry sync to the cloud

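// Kafka Streams for sensor fusion at the edge
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

// CameraData, LidarData and FusedSensor are application classes, and
// cameraSerde, lidarSerde and fusedSerde their Serdes (assumed to exist).
// Both input topics must use the same key and be co-partitioned.
StreamsBuilder builder = new StreamsBuilder();

KStream<String, CameraData> cameraStream =
    builder.stream("camera_front", Consumed.with(Serdes.String(), cameraSerde));
KStream<String, LidarData> lidarStream =
    builder.stream("lidar_360", Consumed.with(Serdes.String(), lidarSerde));

// Join records with the same key whose timestamps are at most 100ms apart;
// with no grace period, records arriving after the window closes are dropped
KStream<String, FusedSensor> fused = cameraStream
    .join(
        lidarStream,
        (camera, lidar) -> new FusedSensor(camera, lidar),
        JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMillis(100)),
        StreamJoined.with(Serdes.String(), cameraSerde, lidarSerde)
    );

fused.to("fused_sensors", Produced.with(Serdes.String(), fusedSerde));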

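Why it works: Real-time processing requirements (sub-20ms) rule out a cloud round trip, so computation has to happen locally. With processing.guarantee=exactly_once_v2 enabled, Kafka Streams’ exactly-once processing ensures each fused record is neither lost nor duplicated in the output topic, even if the application restarts; this is a correctness guarantee, not a latency guarantee.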
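The matching rule of the 100ms join window can be checked in plain Python. This batch sketch is only an illustration of which pairs a windowed inner join emits; the function name and the `(key, timestamp_ms, value)` tuple layout are assumptions, and Kafka Streams computes the same result incrementally with state stores.

```python
from collections import defaultdict


def windowed_join(left, right, window_ms):
    """Inner-join two lists of (key, timestamp_ms, value) records.

    A pair matches when the keys are equal and
    |t_left - t_right| <= window_ms (both bounds inclusive).
    Returns (key, (left_value, right_value)) tuples in left-input order.
    """
    right_by_key = defaultdict(list)
    for key, ts, value in right:
        right_by_key[key].append((ts, value))

    joined = []
    for key, ts_l, value_l in left:
        for ts_r, value_r in right_by_key[key]:
            if abs(ts_l - ts_r) <= window_ms:
                joined.append((key, (value_l, value_r)))
    return joined
```

With camera records for the same key at 1000ms and 1300ms and a lidar record at 1080ms, only the first camera frame is fused: the second is 220ms away, outside the window. Records with different keys never join, which is why both topics must share a key such as an object or frame ID.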

Use Case 3: Retail Edge Analytics

Stores use cameras and sensors for foot traffic analysis, inventory monitoring, and theft prevention. Privacy regulations often prohibit sending raw video to the cloud.

The implementation: Edge servers in each store run Kafka. Cameras and sensors publish events locally.

Processing at the edge:

Computer vision extracts insights from video (people counting, heat maps)
Only anonymized metrics leave the store
Local alerts for security events
Inventory tracking based on shelf sensors

Multi-store aggregation: Regional edge nodes aggregate data from multiple stores. Only summaries reach the cloud data center.
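A regional node’s roll-up can be sketched as a grouping over per-store metric records. Everything here is an assumption for illustration: the record fields, the `store_region` lookup table and the one-hour window. Because a people count is a snapshot of one frame rather than a running total, the sketch reports averages and peaks instead of summing samples, which would double-count the same shoppers.

```python
WINDOW_MS = 60 * 60 * 1000  # one-hour tumbling windows (illustrative)


def regional_summary(metrics, store_region):
    """Roll per-store people counts up to (region, window_start) summaries.

    metrics: iterable of dicts with "store_id", "people_count", "timestamp" (ms)
    store_region: hypothetical mapping of store_id -> region name
    """
    stats = {}
    for m in metrics:
        window_start = m["timestamp"] - m["timestamp"] % WINDOW_MS
        key = (store_region[m["store_id"]], window_start)
        s = stats.setdefault(key, {"samples": 0, "total": 0, "max": 0, "stores": set()})
        s["samples"] += 1
        s["total"] += m["people_count"]
        s["max"] = max(s["max"], m["people_count"])
        s["stores"].add(m["store_id"])
    # Only aggregate numbers leave the region: no store IDs or frames
    return {
        key: {
            "stores": len(s["stores"]),
            "avg_people_per_sample": s["total"] / s["samples"],
            "max_people_in_one_store": s["max"],
        }
        for key, s in stats.items()
    }
```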

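# Consumer processing video at the edge
from confluent_kafka import Consumer

# decode_frame, count_people and publish_metric are application-specific
# helpers (not part of confluent_kafka) standing in for your vision stack.
consumer = Consumer({
    'bootstrap.servers': 'store-edge-kafka:9092',
    'group.id': 'vision-processor',
    'auto.offset.reset': 'latest'  # start from live frames, skip any backlog
})

consumer.subscribe(['camera_feeds'])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue

        # Process the video frame locally
        frame = decode_frame(msg.value())
        people_count = count_people(frame)

        # Publish only the anonymized metric; raw video never leaves the store
        publish_metric('people_count', people_count)
finally:
    consumer.close()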

Why it works: Privacy compliance is built in. Bandwidth costs are minimal. Store operations continue during internet outages.

Use Case 4: Oil and Gas Remote Monitoring

Drilling sites and pipelines are remote, connectivity is intermittent, and equipment failures are expensive. Edge Kafka enables reliable monitoring.

The setup: Ruggedized edge servers at drilling sites run Kafka. Sensors monitor pressure, temperature, flow rates, and equipment vibration.

Local intelligence:

Detect dangerous conditions (pressure spikes, gas leaks)
Predict equipment failures before they happen
Trigger automated shutdowns when necessary

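Syncing with headquarters: When satellite connectivity is available, buffered data syncs to cloud Kafka clusters. Because the edge broker retains records until its retention limit, headquarters receives gap-free data as long as each outage is shorter than the local retention window.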
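Since gap-free sync depends on retention outlasting the outage, it helps to size local storage explicitly. The sketch below is simple arithmetic under stated assumptions: the function name, the compression ratio and the 1.5x safety factor are illustrative, and the inputs should come from measurements at your own site.

```python
import math


def retention_needed(msgs_per_sec, avg_msg_bytes, outage_hours,
                     compression_ratio=1.0, safety_factor=1.5):
    """Estimate local disk needed to buffer a connectivity gap.

    compression_ratio is compressed/uncompressed size (0.25 means 4:1).
    safety_factor adds headroom for bursts and segment overhead.
    Returns (bytes_needed, minimum retention hours to configure).
    """
    raw_bytes = msgs_per_sec * avg_msg_bytes * outage_hours * 3600
    bytes_needed = math.ceil(raw_bytes * compression_ratio * safety_factor)
    return bytes_needed, math.ceil(outage_hours * safety_factor)
```

For example, 1,000 readings per second of 200 bytes each, compressed 4:1, need about 6.5 GB to ride out a 24-hour outage with 50% headroom, and retention of at least 36 hours. Size-based retention (log.retention.bytes, which applies per partition) should then be set so old segments are deleted before the disk fills.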

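// Kafka producer with retries for unreliable networks
import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "edge-kafka:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
props.put("acks", "all");
props.put("enable.idempotence", true);  // retries never create duplicates
props.put("retries", Integer.MAX_VALUE);
props.put("max.in.flight.requests.per.connection", 1);
// Keep retrying each record for up to 10 minutes before reporting failure
props.put("delivery.timeout.ms", 600000);

Producer<String, String> producer = new KafkaProducer<>(props);

// While the broker is unreachable, records wait in the producer's in-memory
// buffer (buffer.memory) and are retried until delivery.timeout.ms expires.
// Once acknowledged, they are durable on the edge broker's disk; forwarding
// to headquarters is a separate step (e.g. MirrorMaker 2).
// sensorId and sensorData are the application's key and payload.
producer.send(
    new ProducerRecord<>("pipeline_sensors", sensorId, sensorData),
    (metadata, exception) -> {
        if (exception != null) {
            System.err.println("Send failed: " + exception);
        }
    });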

Why it works: As long as local retention outlasts the longest connectivity gap, Kafka’s durable log means no data is lost. Local processing enables safety-critical decisions in real time.

Use Case 5: Healthcare IoT at the Edge

Hospitals use IoT devices for patient monitoring. Latency matters (cardiac events need immediate response), and patient data must stay secure.

The architecture: Each hospital floor or unit has edge Kafka infrastructure. Medical devices publish vitals to local topics.

Edge processing:

Real-time anomaly detection for critical vitals
Immediate alerts to nursing stations
Local data aggregation for monitoring dashboards
Encrypted storage to support HIPAA compliance

Controlled cloud sync: Only necessary data syncs to hospital data centers. Patient identifiers are tokenized before cloud transmission.
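One common way to tokenize identifiers is a keyed hash. The sketch below assumes HMAC-SHA256 with a secret key that never leaves the hospital network; the function names and field names are hypothetical, and some deployments use a token vault instead. Whether a given scheme satisfies your de-identification requirements is a compliance decision, not a coding one.

```python
import hashlib
import hmac


def tokenize_patient_id(patient_id: str, secret_key: bytes) -> str:
    """Replace a patient identifier with a keyed, deterministic token.

    The same ID always yields the same token, so cloud analytics can still
    group records by patient, but without secret_key the token cannot be
    recomputed or linked back to the original ID.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def prepare_for_cloud(vitals: dict, secret_key: bytes) -> dict:
    """Copy a vitals record, dropping direct identifiers (hypothetical field
    names) and adding the patient token in their place."""
    direct_identifiers = {"patient_id", "name", "mrn", "date_of_birth"}
    record = {k: v for k, v in vitals.items() if k not in direct_identifiers}
    record["patient_token"] = tokenize_patient_id(vitals["patient_id"], secret_key)
    return record
```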

Why it works: Sub-second alerting is possible. Patient data stays within the hospital network. Network failures don’t impact monitoring.

Configuration for Edge Deployments

Edge Kafka configurations differ from data center setups. Here’s what works in 2025:

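# Minimal resource configuration
num.network.threads=2
num.io.threads=4
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400

# Aggressive cleanup for limited storage
log.retention.hours=24
log.segment.bytes=536870912
# Only closed segments are deleted, so roll them often at low data rates
log.roll.hours=1
log.cleanup.policy=delete

# Compression to reduce storage and network
compression.type=zstd
# Per-codec compression levels require Kafka 3.8+
compression.zstd.level=9

# Single broker - no replication at edge
default.replication.factor=1
min.insync.replicas=1
# Internal topics default to 3 replicas and must fit on one node too
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

# KRaft mode - no ZooKeeper; this node is both broker and controller
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
controller.listener.names=CONTROLLER
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
advertised.listeners=PLAINTEXT://edge-kafka:9092
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/var/lib/kafka/data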

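A configuration like this can run in roughly 2-4GB of RAM and handle moderate throughput for edge scenarios. The log.dirs path and edge-kafka hostname are placeholders. In KRaft mode, the storage directory must be formatted once before the first start: generate a cluster ID with bin/kafka-storage.sh random-uuid, then run bin/kafka-storage.sh format -t <cluster-id> -c server.properties.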

Edge-to-Cloud Synchronization Patterns

Getting data from edge to cloud reliably is critical. Common patterns in 2025:

MirrorMaker 2: Replicates topics from edge Kafka to cloud Kafka. Because it runs on Kafka Connect and stores its replication progress as offsets, it resumes where it left off after network interruptions.

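# MirrorMaker 2 configuration for edge-to-cloud
clusters = edge, cloud
edge.bootstrap.servers = edge-kafka:9092
cloud.bootstrap.servers = cloud-kafka:9092

edge->cloud.enabled = true
edge->cloud.topics = sensors.*, alerts.*

# Translate consumer group offsets so cloud consumers can resume where
# edge consumers left off (Kafka 2.7+); this is not exactly-once delivery
sync.group.offsets.enabled = true
sync.topic.acls.enabled = false

# The offset-syncs topic lives on the source (edge) cluster by default,
# so its replication factor must fit the single edge broker
offset-syncs.topic.replication.factor = 1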
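Two details of this setup are easy to trip over: which topics the `edge->cloud.topics` patterns select, and what the copies are called in the cloud. The sketch below models both, assuming (as in MirrorMaker 2’s default topic filter) that a pattern must match the whole topic name, and that the default replication policy prefixes remote topics with the source cluster alias and a dot. The function names are hypothetical.

```python
import re

SOURCE_ALIAS = "edge"
TOPIC_PATTERNS = ["sensors.*", "alerts.*"]  # from edge->cloud.topics


def is_replicated(topic: str) -> bool:
    # Assumes each pattern must match the entire topic name
    return any(re.fullmatch(p, topic) for p in TOPIC_PATTERNS)


def remote_topic_name(topic: str, source_alias: str = SOURCE_ALIAS) -> str:
    # Default naming on the target cluster: "<source alias>.<topic>"
    return f"{source_alias}.{topic}"


def plan(topics):
    """Map each edge topic to its cloud name, or None if it is not mirrored."""
    return {t: remote_topic_name(t) if is_replicated(t) else None for t in topics}
```

Under these assumptions, sensors.factory becomes edge.sensors.factory in the cloud, while a topic such as factory_sensors is not mirrored at all, so the patterns need to reflect your actual topic names (for example `.*sensors.*`).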

Kafka Connect: Sink connectors can ship edge topics directly to cloud services, with retry logic and buffering built in, and Connect’s offset tracking lets a restarted connector resume where it stopped.

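Custom aggregation: Edge applications aggregate data before transmission. Turning raw sensor readings into hourly summaries can reduce bandwidth 100x or more: a sensor sampled once per second produces 3,600 readings an hour, which collapse into a single summary record.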
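A minimal sketch of that hourly summarization, assuming raw readings arrive as `(sensor_id, timestamp_ms, value)` tuples; the function name and output fields are illustrative. A production version would run incrementally (for example as a ksqlDB or Kafka Streams windowed aggregation) rather than over a list.

```python
from collections import defaultdict

WINDOW_MS = 60 * 60 * 1000  # hourly tumbling windows


def hourly_summaries(readings):
    """Collapse (sensor_id, timestamp_ms, value) readings into one summary
    record per sensor per hour: count, min, max and mean."""
    acc = defaultdict(lambda: {"count": 0, "sum": 0.0,
                               "min": float("inf"), "max": float("-inf")})
    for sensor_id, ts, value in readings:
        window_start = ts - ts % WINDOW_MS  # align to the start of the hour
        a = acc[(sensor_id, window_start)]
        a["count"] += 1
        a["sum"] += value
        a["min"] = min(a["min"], value)
        a["max"] = max(a["max"], value)
    return [
        {"sensor_id": s, "window_start": w, "count": a["count"],
         "min": a["min"], "max": a["max"], "mean": a["sum"] / a["count"]}
        for (s, w), a in sorted(acc.items())
    ]
```

Feeding it one reading per second for an hour yields a single record carrying count, min, max and mean, which is the 3,600-to-1 reduction described above.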

Challenges and Solutions

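Limited storage: Edge devices fill up quickly. Solution: Aggressive log retention policies and tiered storage (local SSD for hot data, S3 for cold data). Note that Kafka’s tiered storage (KIP-405) needs a third-party remote storage plugin for S3 and depends on connectivity to the object store, which edge links may not guarantee.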

Resource constraints: Full Kafka is too heavy. Solution: Use KRaft mode, reduce thread counts, and consider alternatives like MQTT for the smallest devices.

Network unreliability: Edge locations have spotty connectivity. Solution: Kafka’s durability plus MirrorMaker’s retry logic handle this well.

Security at the edge: Physical security is weaker. Solution: Encryption at rest and in transit, certificate-based authentication, and minimal data retention.

Management overhead: Many edge locations are hard to manage. Solution: Centralized management tools (Confluent Control Center, Kubernetes operators) for fleet management.

When Kafka Isn’t the Answer

Be honest about fit. Kafka at the edge works for specific scenarios:

Don’t use Kafka when:

Devices have < 1GB RAM (use MQTT brokers instead)
You don’t need stream processing (simple pub-sub suffices)
Data volume is tiny (the broker’s resource and operational overhead isn’t worth it)
You need sub-millisecond latency (Kafka adds overhead)

Do use Kafka when:

You need reliable buffering during network outages
Stream processing at the edge is valuable
You’re already using Kafka in the cloud (consistent architecture)
Data volumes justify the resource investment

The Economics

Infrastructure costs: Edge Kafka requires compute resources. A minimal setup costs $50-200/month per location for hardware/hosting.

Bandwidth savings: Processing locally can reduce bandwidth costs 80-95%. For high-volume scenarios, this pays for infrastructure quickly.

Latency benefits: Local processing reduces latency from seconds to milliseconds. Hard to quantify but valuable for real-time use cases.

Operational complexity: Managing distributed Kafka clusters isn’t free. Factor in monitoring, updates, and troubleshooting costs.

Break-even analysis: If you’re processing > 10GB/day per location with latency requirements < 1 second, edge Kafka typically makes economic sense.
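The break-even point itself is simple arithmetic. This sketch plugs in the kinds of figures quoted above (an edge cost per month and a bandwidth reduction fraction); the function names are hypothetical and `cost_per_gb` is an input you must supply, since transport prices vary enormously between wired egress, cellular and satellite links.

```python
def monthly_net_savings(gb_per_day: float, reduction: float, cost_per_gb: float,
                        edge_cost_per_month: float, days: int = 30) -> float:
    """Monthly bandwidth savings from edge processing minus the edge cost.

    reduction is the fraction of raw volume no longer sent upstream (0.9 = 90%).
    cost_per_gb is your transport price. All inputs are assumptions.
    """
    saved_gb = gb_per_day * days * reduction
    return saved_gb * cost_per_gb - edge_cost_per_month


def break_even_gb_per_day(reduction: float, cost_per_gb: float,
                          edge_cost_per_month: float, days: int = 30) -> float:
    """Raw daily volume at which bandwidth savings exactly cover the edge cost."""
    return edge_cost_per_month / (days * reduction * cost_per_gb)
```

For example, with a 90% reduction, a hypothetical $1/GB link and a $200/month site, break-even is 200 / 27, or about 7.4GB/day. Because the result is inversely proportional to `cost_per_gb`, the same site can pay off quickly on a metered link and be marginal on cheap wired egress, which is one reason the rule of thumb also weighs latency requirements.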

Tools and Ecosystem in 2025

Confluent Platform: Commercial Kafka distribution that Confluent also positions for edge deployments. Includes management tools and support.

Strimzi: Kubernetes operator for Kafka. Makes edge deployments on K3s (lightweight Kubernetes) straightforward.

ksqlDB: SQL-based stream processing without writing Java. A good fit for edge scenarios where development resources are limited.

Kafka Connect: Pre-built connectors for industrial protocols. Integration with PLCs, sensors, and legacy systems.

Telegraf: Collects metrics from edge infrastructure and publishes to Kafka for monitoring.

Getting Started

Start small. Pick one edge location and one use case:

Deploy minimal Kafka: Single broker with KRaft mode on an edge server
Connect one data source: Use Kafka Connect for a sensor or device
Add basic processing: Simple ksqlDB query for filtering or aggregation
Sync to cloud: Configure MirrorMaker 2 for cloud replication
Monitor and optimize: Watch resource usage, tune configurations

Once you prove value at one location, expand systematically. Don’t try to deploy to 100 locations on day one.

The Reality in 2025

Edge Kafka isn’t mainstream yet, but it’s growing. Industries with high data volumes, latency requirements, and unreliable connectivity are adopting it successfully.

The technology works. The challenge is operational—managing distributed Kafka clusters requires expertise. Organizations that invest in automation and centralized management see better outcomes than those treating each edge location as a snowflake.

If your edge use case fits Kafka’s strengths, it’s a powerful tool. If not, simpler alternatives like MQTT or direct cloud connections might serve you better. The key is matching technology to requirements, not following trends.

