Understanding subscribe() and assign() Methods of KafkaConsumer
Apache Kafka consumers read messages from topics using one of two methods: subscribe() or assign(). Knowing how the two differ in partition assignment helps developers build scalable consumer groups or, when required, implement precise and deterministic processing. This article compares the two methods and shows how each affects partition assignment, scalability, and message consumption in Kafka consumer applications.
1. Introduction
In Apache Kafka, the KafkaConsumer API provides two primary ways to consume messages from topics: subscribe() and assign(). Both methods allow a consumer to read records from Kafka topics, but they differ significantly in how partitions are assigned to the consumer. The subscribe() method relies on Kafka’s consumer group management to automatically assign partitions to consumers. On the other hand, the assign() method allows developers to manually assign specific partitions to a consumer. Understanding the difference between these two methods is important when designing Kafka-based applications, especially in scenarios involving scaling, partition management, or deterministic data processing.
1.1 Prerequisites and Kafka Setup Using Docker
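The examples require Docker with Docker Compose, and the Java programs need the kafka-clients library on the classpath. The following docker-compose.yml starts a single ZooKeeper node and a single Kafka broker.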
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    container_name: zookeeper
    ports:
      - 2181:2181
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    container_name: kafka
    ports:
      - 9092:9092
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
Start the Kafka environment using the following command:
docker-compose up -d
Once the containers are running, create the topic used in this article.
docker exec -it kafka kafka-topics \
  --create \
  --topic demo-topic \
  --bootstrap-server localhost:9092 \
  --partitions 2 \
  --replication-factor 1
You can verify that the topic was created successfully.
docker exec -it kafka kafka-topics \
  --list \
  --bootstrap-server localhost:9092
Finally, produce a few sample messages to the topic so the consumer examples in this article can read them.
docker exec -it kafka kafka-console-producer \
  --topic demo-topic \
  --bootstrap-server localhost:9092
When the producer console starts, type each message on its own line and press Enter; press Ctrl+C to exit when you are done. For example:
order created
order shipped
payment received
order delivered
Once Kafka is running and the demo-topic topic contains messages, you can execute the consumer examples presented in the following sections to understand the behavior of subscribe() and assign().
2. Automatic Partition Assignment with Kafka Consumer subscribe()
The subscribe() method makes a consumer part of a consumer group, identified by its group.id. Kafka then automatically assigns the partitions of the subscribed topics among the consumers in that group. Whenever a consumer joins or leaves the group, Kafka triggers a rebalance to redistribute the partitions among the remaining active consumers.
This approach is commonly used in production systems because it supports automatic scaling and fault tolerance. Developers do not need to worry about which consumer reads which partition.
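The redistribution that happens on a rebalance can be pictured with a small pure-Java model. The sketch below is a simplification, not Kafka code: the hypothetical RangeAssignmentSketch.assign() sorts the member ids, gives each member a contiguous block of one topic's partitions, and hands any leftover partitions to the first members, which is roughly what a range-style strategy does. The strategy Kafka actually uses is set by the consumer's partition.assignment.strategy property, and its default varies between client versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of range-style assignment for ONE topic.
// It is not Kafka's RangeAssignor class; it only mimics the idea.
final class RangeAssignmentSketch {

    // Maps each member id (in sorted order) to the partition numbers it would own.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> memberIds) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (memberIds.isEmpty()) {
            return result; // no members, nothing to assign
        }
        List<String> members = new ArrayList<>(memberIds);
        Collections.sort(members);
        int perMember = partitionCount / members.size();
        int extra = partitionCount % members.size();
        int next = 0;
        for (int i = 0; i < members.size(); i++) {
            // The first `extra` members each receive one additional partition.
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(members.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // demo-topic has 2 partitions; watch the split change as members join.
        System.out.println(assign(2, List.of("consumer-a")));
        // prints {consumer-a=[0, 1]}
        System.out.println(assign(2, List.of("consumer-a", "consumer-b")));
        // prints {consumer-a=[0], consumer-b=[1]}
        System.out.println(assign(2, List.of("consumer-a", "consumer-b", "consumer-c")));
        // prints {consumer-a=[0], consumer-b=[1], consumer-c=[]}
    }
}
```

With the two partitions of demo-topic, a third consumer in the same group receives nothing and stays idle, which is why the partition count caps how far a subscribe()-based group can scale out.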
// KafkaSubscribeExample.java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaSubscribeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Membership in a consumer group is what enables automatic partition assignment.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Start from the beginning when the group has no committed offset yet.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // The group decides which partitions of demo-topic this consumer reads.
        consumer.subscribe(Collections.singletonList("demo-topic"));

        while (true) {
            // Wait up to 100 ms for new records.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(
                        "Partition: " + record.partition() +
                        " Offset: " + record.offset() +
                        " Key: " + record.key() +
                        " Value: " + record.value()
                );
            }
        }
    }
}
2.1 Code Explanation
The KafkaSubscribeExample program reads messages using automatic partition assignment through the subscribe() method. It starts by building a Properties object with the consumer configuration: bootstrap.servers points to the broker at localhost:9092, group.id names the consumer group (demo-consumer-group), and key.deserializer and value.deserializer are both set to StringDeserializer so that the raw bytes of keys and values are converted into strings. auto.offset.reset is set to earliest, so when the group has no committed offset for a partition, the consumer starts from the beginning of that partition instead of reading only new messages.

A KafkaConsumer<String, String> is then created from these properties and subscribed to demo-topic with consumer.subscribe(Collections.singletonList("demo-topic")). This call only registers the subscription; the consumer joins demo-consumer-group and receives its partitions through group coordination during its first calls to poll().

The program then loops forever, calling consumer.poll(Duration.ofMillis(100)), which waits up to 100 ms for data and returns a possibly empty ConsumerRecords batch. For each ConsumerRecord it prints the partition number, offset, key, and value. Because partition management is delegated to Kafka, the same code scales out: starting more instances with the same group.id spreads the topic's partitions across them, up to one active consumer per partition.
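With subscribe(), a consumer owns no partitions until the group assigns them, which a real KafkaConsumer learns during poll(). This can be observed without a broker using MockConsumer, the in-memory test double that ships with kafka-clients (a 3.x client is assumed here; newer versions deprecate the OffsetResetStrategy constructor). Its rebalance() method is a test hook that plays the role of the group coordinator, the class name SubscribeMockSketch is hypothetical, and the two records are stand-ins for the demo messages.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.MockConsumer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;
import org.apache.kafka.common.TopicPartition;

// Sketch: simulates subscribe() plus a group assignment with MockConsumer (no broker).
final class SubscribeMockSketch {

    static final TopicPartition P0 = new TopicPartition("demo-topic", 0);
    static final TopicPartition P1 = new TopicPartition("demo-topic", 1);

    // Partitions owned immediately after subscribe(), before any assignment happens.
    static Set<TopicPartition> assignmentRightAfterSubscribe() {
        try (MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST)) {
            consumer.subscribe(List.of("demo-topic"));
            return consumer.assignment();
        }
    }

    // Number of records polled once the (simulated) group hands over both partitions.
    static int recordsAfterSimulatedRebalance() {
        try (MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST)) {
            consumer.subscribe(List.of("demo-topic"));
            // MockConsumer-only hook standing in for the group coordinator's decision.
            consumer.rebalance(List.of(P0, P1));
            // Gives the "earliest" reset strategy a beginning offset to start from.
            consumer.updateBeginningOffsets(Map.of(P0, 0L, P1, 0L));
            consumer.addRecord(new ConsumerRecord<>("demo-topic", 0, 0L, null, "order created"));
            consumer.addRecord(new ConsumerRecord<>("demo-topic", 1, 0L, null, "payment received"));

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("Partition: " + record.partition() + " Value: " + record.value());
            }
            return records.count();
        }
    }

    public static void main(String[] args) {
        System.out.println("Assigned after subscribe(): " + assignmentRightAfterSubscribe());
        System.out.println("Records polled: " + recordsAfterSimulatedRebalance());
    }
}
```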
2.2 Code Output
Partition: 0 Offset: 1 Key: null Value: order created
Partition: 0 Offset: 2 Key: null Value: order shipped
Partition: 1 Offset: 0 Key: null Value: payment received
The output shows the records consumed from demo-topic after Kafka assigned partitions through subscribe(). Each line is one record. Partition is the partition the record was read from, and Offset is its sequential position within that partition; offsets are counted per partition, which is why partition 1 starts again at 0. Key is null because the messages were produced without keys, and Value is the payload, such as order created, order shipped, or payment received. Records arrive from both partition 0 and partition 1 because this consumer is the only member of demo-consumer-group, so it is assigned both partitions of the topic. Which partition each message lands in is decided by the producer's partitioner, so the partition numbers and offsets you see may differ. Since the program polls in an infinite loop, new records appear whenever more messages are produced to the topic.
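To see which partitions a subscribe()-based consumer is given, and how that changes when the group rebalances, subscribe() also accepts a ConsumerRebalanceListener as a second argument. Its onPartitionsRevoked() and onPartitionsAssigned() callbacks are part of the kafka-clients API and are invoked from inside poll(); the AssignmentLogger class implementing them here is a hypothetical example, and Java 11 or later is assumed for Set.copyOf.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// Hypothetical listener that tracks and prints the partitions this consumer owns.
// Intended use in KafkaSubscribeExample:
//   consumer.subscribe(Collections.singletonList("demo-topic"), new AssignmentLogger());
final class AssignmentLogger implements ConsumerRebalanceListener {

    private final Set<TopicPartition> owned = new HashSet<>();

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Runs before partitions are taken away; a real app might commit offsets here.
        owned.removeAll(partitions);
        System.out.println("Revoked: " + partitions + ", now owning " + owned);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        owned.addAll(partitions);
        System.out.println("Assigned: " + partitions + ", now owning " + owned);
    }

    Set<TopicPartition> ownedPartitions() {
        return Set.copyOf(owned);
    }

    public static void main(String[] args) {
        // Drive the callbacks by hand, roughly as an incremental rebalance would:
        // this consumer first gets both partitions, then gives partition 1 to a new member.
        AssignmentLogger logger = new AssignmentLogger();
        TopicPartition p0 = new TopicPartition("demo-topic", 0);
        TopicPartition p1 = new TopicPartition("demo-topic", 1);
        logger.onPartitionsAssigned(List.of(p0, p1));
        logger.onPartitionsRevoked(List.of(p1));
    }
}
```

Updating the set with removeAll() and addAll() rather than replacing it keeps the bookkeeping correct under both the eager protocol, where all partitions are revoked and then reassigned, and the cooperative protocol, where only the partitions that move are passed to the callbacks.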
3. Manual Partition Assignment Using assign()
The assign() method allows developers to manually specify which partitions a consumer should read from. Unlike subscribe(), this method does not involve consumer group coordination or automatic rebalancing. This approach is useful when:
- Applications require deterministic partition processing
- Specific partitions must be processed by dedicated consumers
- Custom offset management is required
- Applications perform debugging or replay operations
// KafkaAssignExample.java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

public class KafkaAssignExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // No group.id: manual assignment does not need consumer group membership.
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Explicitly pick partition 0 of demo-topic; no rebalancing will change this.
        TopicPartition partition0 = new TopicPartition("demo-topic", 0);
        consumer.assign(Arrays.asList(partition0));

        while (true) {
            // Wait up to 100 ms for new records from the assigned partition.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(
                        "Partition: " + record.partition() +
                        " Offset: " + record.offset() +
                        " Key: " + record.key() +
                        " Value: " + record.value()
                );
            }
        }
    }
}
3.1 Code Explanation
The KafkaAssignExample program reads messages using manual partition assignment through the assign() method instead of relying on consumer group coordination. Its configuration matches the previous example except that no group.id is set: bootstrap.servers points to localhost:9092, the key and value deserializers are StringDeserializer, and auto.offset.reset is earliest. Because there is no group.id, the consumer cannot commit offsets or read committed ones, so it has no stored position and auto.offset.reset=earliest makes it start from the beginning of the partition every time it runs.

After creating the KafkaConsumer<String, String>, the program builds a TopicPartition for partition 0 of demo-topic and passes it to consumer.assign(Arrays.asList(partition0)). This bypasses group membership and rebalancing entirely: the consumer reads partition 0 and nothing else, no matter how many other consumers are running.

The polling loop is the same as in the subscribe() example: consumer.poll(Duration.ofMillis(100)) fetches records from the assigned partition, and the partition, offset, key, and value of each ConsumerRecord are printed. This gives developers full control over which partitions a consumer reads, which is useful for specialized processing, debugging, or custom consumption strategies.
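assign() is not limited to a single partition. If a consumer should read every partition of a topic without joining a group, one option, sketched below, is to look up the topic's partitions with the consumer's partitionsFor() method and pass them all to assign(). The helper name toTopicPartitions is hypothetical, and the PartitionInfo objects in main() are hand-built stand-ins for broker metadata. Keep in mind that such an assignment is fixed: partitions added to the topic later are not picked up unless the application calls assign() again.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

final class AssignAllPartitions {

    // Hypothetical helper: turns partition metadata into TopicPartitions for assign().
    static List<TopicPartition> toTopicPartitions(List<PartitionInfo> infos) {
        List<TopicPartition> result = new ArrayList<>();
        for (PartitionInfo info : infos) {
            result.add(new TopicPartition(info.topic(), info.partition()));
        }
        return result;
    }

    public static void main(String[] args) {
        // With a live broker, the metadata would come from the consumer itself:
        //   consumer.assign(toTopicPartitions(consumer.partitionsFor("demo-topic")));
        // Here: stand-in metadata for the two partitions of demo-topic (no leader/replicas).
        List<PartitionInfo> infos = List.of(
                new PartitionInfo("demo-topic", 0, null, new Node[0], new Node[0]),
                new PartitionInfo("demo-topic", 1, null, new Node[0], new Node[0]));
        System.out.println(toTopicPartitions(infos)); // prints [demo-topic-0, demo-topic-1]
    }
}
```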
3.2 Code Output
Partition: 0 Offset: 1 Key: null Value: order created
Partition: 0 Offset: 2 Key: null Value: order shipped
Partition: 0 Offset: 3 Key: null Value: order delivered
The output shows the records consumed when partition 0 is assigned manually with assign(). Partition is always 0 because the TopicPartition object named only partition 0; Offset is the record's sequential position within that partition; Key is null because the producer did not set keys; and Value holds the payload, such as order created, order shipped, and order delivered. The payment received message seen in the subscribe() output does not appear, because it was stored in partition 1, which this consumer was never assigned. Unlike the subscribe() example, where records can come from any partition the group hands to the consumer, this output reflects deterministic consumption of a single, explicitly chosen partition, read in offset order.
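Because an assigned partition is read in offset order and no group decides where reading starts, assign() pairs naturally with seek() for replaying part of a partition. The sketch below uses MockConsumer from kafka-clients (3.x assumed) so it runs without a broker; the hypothetical replayFrom() helper seeks partition 0 to a chosen offset before polling. The offsets and values echo the sample output above but are only test data.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.MockConsumer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;
import org.apache.kafka.common.TopicPartition;

final class AssignSeekReplaySketch {

    // Hypothetical helper: re-reads partition 0 of demo-topic starting at startOffset.
    static List<String> replayFrom(long startOffset) {
        TopicPartition partition0 = new TopicPartition("demo-topic", 0);
        try (MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST)) {
            consumer.assign(List.of(partition0));
            // Set the position explicitly; poll() then returns records from that offset on.
            consumer.seek(partition0, startOffset);
            consumer.addRecord(new ConsumerRecord<>("demo-topic", 0, 1L, null, "order created"));
            consumer.addRecord(new ConsumerRecord<>("demo-topic", 0, 2L, null, "order shipped"));
            consumer.addRecord(new ConsumerRecord<>("demo-topic", 0, 3L, null, "order delivered"));

            List<String> values = new ArrayList<>();
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                values.add(record.offset() + ":" + record.value());
            }
            return values;
        }
    }

    public static void main(String[] args) {
        System.out.println(replayFrom(2));
    }
}
```

Against a real broker, the same assign() and seek() calls would go before the poll loop of KafkaAssignExample.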
4. Conclusion
Both subscribe() and assign() are important ways to consume data with KafkaConsumer. subscribe() suits scalable applications in which several consumers share a consumer group: Kafka distributes partitions among them and rebalances automatically when consumers join or leave. assign() suits cases that need complete control over which partitions a consumer reads: partitions are chosen manually and consumer group coordination is bypassed entirely. In most real-world streaming applications, subscribe() is preferred because of its scalability and automatic load balancing, while assign() is valuable when precise partition control, debugging, replay, or specialized processing logic is required.
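A practical consequence of the two modes being distinct is that one consumer instance cannot use both at once: KafkaConsumer rejects assign() after subscribe() (and vice versa) with an IllegalStateException until unsubscribe() is called. The sketch below demonstrates this with MockConsumer (kafka-clients 3.x assumed), which enforces the same rule; the class and method names are hypothetical.

```java
import java.util.List;

import org.apache.kafka.clients.consumer.MockConsumer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;
import org.apache.kafka.common.TopicPartition;

// Sketch: subscribe() and assign() are mutually exclusive on one consumer instance.
final class SubscribeAssignExclusivity {

    // True if assign() is rejected while a topic subscription is active.
    static boolean assignRejectedAfterSubscribe() {
        try (MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST)) {
            consumer.subscribe(List.of("demo-topic"));
            try {
                consumer.assign(List.of(new TopicPartition("demo-topic", 0)));
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        }
    }

    // Number of assigned partitions after leaving subscription mode with unsubscribe().
    static int partitionsAssignedAfterUnsubscribe() {
        try (MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST)) {
            consumer.subscribe(List.of("demo-topic"));
            consumer.unsubscribe();
            consumer.assign(List.of(new TopicPartition("demo-topic", 0)));
            return consumer.assignment().size();
        }
    }

    public static void main(String[] args) {
        System.out.println("assign() rejected after subscribe(): " + assignRejectedAfterSubscribe());
        System.out.println("Partitions after unsubscribe() + assign(): " + partitionsAssignedAfterUnsubscribe());
    }
}
```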