Keywords: Apache Kafka | Message Keys | Partitioning Strategy | Log Compaction | Message Ordering
Abstract: This article provides an in-depth analysis of the role and necessity of message keys in Apache Kafka. By examining partitioning strategies, message ordering guarantees, and log cleanup mechanisms, it clarifies when keys are essential and when keyless messages are appropriate. With code examples and configuration parameters, it offers practical guidance for optimizing Kafka application design.
In the Apache Kafka Producer API, whether to include a key when sending messages is a common design decision. This article delves into the technical mechanisms of message keys and explores their necessity in various scenarios.
Partitioning Strategy and Message Ordering
Kafka achieves horizontal scaling through partitioning, and message keys directly influence partition assignment. By default, Kafka uses the DefaultPartitioner, which determines the target partition based on the hash of the key:
org.apache.kafka.common.utils.Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
When the message key is null, the producer distributes messages across partitions without regard to content: historically via round-robin, and since Kafka 2.4 via the sticky partitioner, which fills a batch for one partition before switching to another. Either way, load stays balanced over time, but logically related messages may land on different partitions, so no per-entity ordering is possible.
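The key-to-partition mapping can be illustrated with a small self-contained sketch. To stay runnable without the Kafka client library, it substitutes Java's String.hashCode (masked to non-negative) for Kafka's murmur2, and assumes a hypothetical partition count of 3; the structural point is unchanged: identical keys always map to the same partition.

```java
import java.util.List;

public class PartitionSketch {
    // Simplified stand-in for Kafka's murmur2-based assignment:
    // mask the hash to non-negative, then take it modulo numPartitions.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3; // hypothetical topic with 3 partitions
        for (String key : List.of("customer-1", "customer-2", "customer-1")) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
        // The two "customer-1" records print the same partition number,
        // so per-key ordering is preserved within that partition.
    }
}
```

Because the mapping is a pure function of the key and the partition count, it also follows that changing the number of partitions on an existing topic changes where keys land, which is why keyed topics are usually sized up front.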
Kafka only guarantees message order within a single partition, not across partitions. Consider a financial transaction scenario:
// Example of keyless messages
null:{"customerId": 1, "changeInBankAccount": +200}
null:{"customerId": 2, "changeInBankAccount": +100}
null:{"customerId": 1, "changeInBankAccount": +200}
null:{"customerId": 1, "changeInBankAccount": -1337}
null:{"customerId": 1, "changeInBankAccount": +200}
Since keyless messages may be assigned to different partitions, a consumer might read all messages from partition 0 first (showing a balance of +600), then read the deduction message from partition 1 (-1337), leading to incorrect balance calculations. Using customerId as the key ensures all transactions for the same customer go to the same partition, preserving processing order.
Note that even with message keys, the producer setting max.in.flight.requests.per.connection (default 5) can affect ordering: when it is greater than 1 and retries are enabled, a failed and retried batch can be reordered behind a later one. With enable.idempotence=true (the default since Kafka 3.0), ordering is preserved even with up to 5 in-flight requests; otherwise, set max.in.flight.requests.per.connection to 1 for strict ordering.
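As a rough sketch, a strict-ordering producer configuration might look like the following. The broker address and serializer classes are illustrative, and this assumes a setup that does not rely on the idempotent producer, so in-flight requests are pinned to 1:

```java
import java.util.Properties;

public class StrictOrderingConfig {
    // Sketch of producer properties for strict per-partition ordering.
    static Properties strictOrderingProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("retries", "3");
        // At most one unacknowledged request per connection,
        // so a retried batch cannot overtake a later one.
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(strictOrderingProps()
                .getProperty("max.in.flight.requests.per.connection"));
    }
}
```

The trade-off is throughput: with only one request in flight per connection, the producer waits for each acknowledgment before sending the next batch.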
Log Cleanup and the Role of Keys
Kafka offers two log retention policies: time-based deletion (log.retention.hours) and log compaction. The former deletes entire log segments based on age and does not require keys; the latter relies on keys for deduplication.
Log compaction is enabled via cleanup.policy=compact and guarantees that at least the most recent value for each key is retained. This is crucial for state machine applications, such as user configuration updates:
// Log compaction example
user123:{"theme": "dark"}
user456:{"theme": "light"}
user123:{"theme": "blue"} // after compaction, only this record survives for user123
Relevant configuration parameters include:
- log.cleaner.enable: enables the log cleaner thread
- log.cleaner.delete.retention.ms: controls how long delete markers (tombstones) are retained
If message keys are null, log compaction cannot be used because the system cannot identify which messages belong to the same logical entity.
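The retain-latest-per-key behavior can be sketched with a plain map, which is conceptually what a compacted topic converges to. This is only an illustration of the end state, not Kafka's actual cleaner implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionSketch {
    // For each key, only the most recent value survives compaction.
    // A later write for an existing key overwrites the earlier value.
    static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            latest.put(record[0], record[1]);
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"user123", "{\"theme\": \"dark\"}"},
            {"user456", "{\"theme\": \"light\"}"},
            {"user123", "{\"theme\": \"blue\"}"},
        };
        // user123 keeps only its latest value; user456 keeps its only value.
        System.out.println(compact(log));
    }
}
```

The sketch also makes the null-key problem concrete: with no key, there is nothing to use as the map key, so "latest value per entity" is undefined.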
Practical Recommendations and Performance Considerations
Using message keys is strongly recommended in the following scenarios:
- Requiring ordering guarantees for messages with the same key (e.g., event sourcing, state machines)
- Enabling log compaction
- Implementing partition-based consumption patterns
Consider using null keys in these cases:
- Messages are entirely independent with no ordering requirements
- Only time-based log retention is used
- Avoiding hotspot partitions due to uneven key distribution
Developers can also implement custom Partitioner classes to override default partitioning logic:
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class CustomPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // Custom partitioning logic: return an index in
        // [0, cluster.partitionCountForTopic(topic))
        return 0;
    }

    @Override public void configure(Map<String, ?> configs) {}
    @Override public void close() {}
}
Message keys are typically much smaller than message values, so they are cheap to deserialize, and they can carry metadata that aids downstream routing. For auxiliary information that should not affect partitioning, Kafka also provides record headers.
Conclusion
Message keys are not mandatory in Kafka, but their design choices significantly impact application behavior. Keyless messages suit simple event streaming scenarios with good load balancing, while keyed messages provide foundational support for ordered processing, state management, and storage optimization. Developers should carefully decide whether to use keys and how to design key structures based on specific business needs, ordering requirements, and storage strategies.