    Topic 2

    Key Concepts of Kafka

    Understand the fundamental building blocks: Topics, Partitions, Producers, Consumers, and Brokers

    Kafka Components Overview

    Apache Kafka is built around a few core concepts that work together to provide a scalable, fault-tolerant messaging system.

[Diagram: Kafka consumer groups]

    Topics - The Heart of Kafka

    A Topic is a category or feed name to which messages are published. Topics are similar to tables in a database or folders in a filesystem. Each topic has a unique name within the Kafka cluster.

    Key Characteristics of Topics
    • Multi-subscriber: A topic can have zero, one, or many consumers that subscribe to the data written to it
    • Retention: Data in topics is retained for a configurable period (time-based or size-based)
    • Immutable: Once data is written to a topic, it cannot be changed (immutable append-only log)
    • Partitioned: Topics are split into partitions for parallelism and scalability

    Naming Convention

    Use descriptive names like order-events, user-activity, or payment-transactions. Avoid special characters; use hyphens or underscores.
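For example, a topic following this convention can be created with Kafka's bundled kafka-topics.sh tool (the topic name, partition count, and replication factor below are illustrative; the cluster address assumes a local broker):

```
kafka-topics.sh --create \
  --topic order-events \
  --partitions 3 \
  --replication-factor 2 \
  --bootstrap-server localhost:9092
```

The same flags also apply when scripting topic creation in deployment pipelines, which is generally preferable to relying on auto-created topics with default settings.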

    Partitions - The Unit of Parallelism

    Partitions are the unit of parallelism in Kafka. Each topic is divided into one or more partitions, and each partition is an ordered, immutable sequence of messages that is continually appended to.

    Topic: orders (3 partitions)
    Partition 0:  [msg0] → [msg3] → [msg6] → [msg9]  → ...
                  offset=0                   offset=3
    Partition 1:  [msg1] → [msg4] → [msg7] → [msg10] → ...
                  offset=0                   offset=3
    Partition 2:  [msg2] → [msg5] → [msg8] → [msg11] → ...
                  offset=0                   offset=3
    Offset

    Each message within a partition has a unique, sequential ID called the offset. Offsets are immutable and strictly increasing within a partition. Consumers track their read position using offsets.
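The append-only behaviour can be modelled in a few lines of plain Java. This is a toy model for intuition only, not Kafka's actual storage code: a partition behaves like a list you can only append to, where a message's offset is the index it was assigned at write time.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionLog {
    // Toy model of a single partition: an append-only list where a message's
    // offset is simply its index, assigned once and never changed.
    private final List<String> messages = new ArrayList<>();

    long append(String message) {
        messages.add(message);
        return messages.size() - 1; // the new message's offset
    }

    String read(long offset) {
        return messages.get((int) offset);
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        System.out.println(log.append("order-created")); // 0
        System.out.println(log.append("order-paid"));    // 1
        System.out.println(log.read(0));                 // order-created
    }
}
```

Real Kafka persists these records to disk and lets many consumers read the same offsets independently, but the offset-as-position idea is the same.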

    Ordering Guarantee

    Kafka guarantees message ordering only within a partition, not across partitions. Use message keys to ensure related messages go to the same partition.

    Partition Count is (Almost) Immutable

    You can increase the number of partitions, but you cannot decrease them. Also, increasing partitions may break key-based ordering. Plan your partition count carefully!
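A quick sketch of why adding partitions breaks key-based routing: the default routing is essentially hash(key) mod numPartitions, so the same key can land on a different partition once the divisor changes. Plain hashCode() stands in here for Kafka's actual murmur2 hash; the effect is the same.

```java
import java.util.List;

public class PartitionDemo {
    // Simplified stand-in for Kafka's default partitioner, which actually
    // uses murmur2 hashing; String.hashCode() is used only for illustration.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Compare where each key lands before and after growing 3 -> 4 partitions
        for (String key : List.of("a", "b", "c", "d")) {
            System.out.printf("key=%s  3 partitions -> %d,  4 partitions -> %d%n",
                    key, partitionFor(key, 3), partitionFor(key, 4));
        }
    }
}
```

Any key whose target partition changes after the resize will have its older messages on one partition and its newer messages on another, so per-key ordering across the resize is lost.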

    How Many Partitions Should You Have?
    • More partitions = More parallelism: Each partition can be consumed by only one consumer in a group
    • Rule of thumb: Number of partitions ≥ Number of consumers you plan to have
    • Don't over-partition: Each partition has overhead (file handles, memory, leader election time)

    Producers - Writing Data to Kafka

    Producers are applications that publish (write) messages to Kafka topics. They are responsible for choosing which partition within a topic to send the message to.

    Message Structure
    • Key (optional): Used for partition routing
    • Value: The actual message content
    • Timestamp: When the message was produced
    • Headers (optional): Metadata key-value pairs
    Partition Assignment
    • With Key: hash(key) % numPartitions
    • Without Key: Round-robin or sticky partitioning
    • Custom: Implement custom partitioner
    Producer Example with Key
    @Autowired
    private KafkaTemplate<String, Order> kafkaTemplate;

    public void sendOrder(Order order) {
        // Using orderId as key ensures all events for the same order
        // go to the same partition (maintaining order)
        String key = order.getOrderId();

        kafkaTemplate.send("orders", key, order)
            .whenComplete((result, ex) -> {
                if (ex == null) {
                    RecordMetadata metadata = result.getRecordMetadata();
                    log.info("Sent to partition {} with offset {}",
                            metadata.partition(), metadata.offset());
                }
            });
    }
    Producer Acknowledgments (acks)
    acks     | Description              | Durability | Latency
    ---------|--------------------------|------------|--------
    0        | Fire and forget          | Low        | Lowest
    1        | Leader acknowledgment    | Medium     | Medium
    all / -1 | All in-sync replicas ack | Highest    | Highest
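For the highest-durability row of the table, the producer settings can be sketched as below. The key names match Kafka's documented producer configuration properties; plain strings are used instead of the ProducerConfig constants so the sketch stands alone without the kafka-clients jar, and the retry/idempotence values are illustrative defaults, not the only reasonable choice.

```java
import java.util.Properties;

public class ProducerAcksConfig {
    // Builds producer settings for the acks=all (highest durability) option.
    static Properties durableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("acks", "all");                // wait for all in-sync replicas
        props.setProperty("retries", "3");               // retry transient broker errors
        props.setProperty("enable.idempotence", "true"); // avoid duplicates on retry
        return props;
    }

    public static void main(String[] args) {
        System.out.println(durableProducerProps("localhost:9092"));
    }
}
```

In a Spring Boot application the same settings are usually expressed under spring.kafka.producer.* in application.properties rather than built by hand.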

    Consumers & Consumer Groups

    Consumers read messages from topics. They are organized into Consumer Groups for load balancing and fault tolerance. This is one of Kafka's most powerful features!

    Consumer Group Rules
    • Each partition is consumed by exactly one consumer within a group
    • A consumer can consume from multiple partitions
    • If consumers > partitions, some consumers will be idle
    • Multiple consumer groups can read from the same topic independently
    Topic: orders (4 partitions: P0, P1, P2, P3)
    Consumer Group A (2 consumers):
    Consumer-1 → P0, P1
    Consumer-2 → P2, P3
    Consumer Group B (4 consumers):
    Consumer-1 → P0
    Consumer-2 → P1
    Consumer-3 → P2
    Consumer-4 → P3
    Consumer Group C (6 consumers):
    Consumer-1 → P0
    Consumer-2 → P1
    Consumer-3 → P2
    Consumer-4 → P3
    Consumer-5 → IDLE (no partition)
    Consumer-6 → IDLE (no partition)
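The assignments in the diagram can be reproduced with a few lines of plain Java. This is a simplified stand-in for Kafka's range assignor, which additionally handles multiple topics and live rebalances; here we only split one topic's partitions into contiguous chunks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignmentDemo {
    // Splits numPartitions contiguously across numConsumers; consumers beyond
    // the partition count receive an empty list (i.e., they sit idle).
    static Map<String, List<Integer>> assign(int numPartitions, int numConsumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        int perConsumer = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers; // first `extra` consumers get one more
        int next = 0;
        for (int i = 0; i < numConsumers; i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int j = 0; j < count; j++) partitions.add(next++);
            assignment.put("Consumer-" + (i + 1), partitions);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2)); // {Consumer-1=[0, 1], Consumer-2=[2, 3]}
        System.out.println(assign(4, 6)); // Consumer-5 and Consumer-6 get []
    }
}
```

Running it with (4 partitions, 2 consumers) and (4 partitions, 6 consumers) reproduces Groups A and C from the diagram, including the idle consumers.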
    Consumer with Group ID
    @KafkaListener(
        topics = "orders",
        groupId = "order-processors",
        concurrency = "3"  // 3 consumer threads
    )
    public void consume(@Payload Order order,
            @Header(KafkaHeaders.RECEIVED_PARTITION) int partition,
            @Header(KafkaHeaders.OFFSET) long offset,
            @Header(KafkaHeaders.RECEIVED_TIMESTAMP) long timestamp) {
        log.info("Received order {} from partition {} at offset {}",
                order.getOrderId(), partition, offset);
        processOrder(order);
    }
    Offset Management
    Auto Commit

    Kafka automatically commits offsets periodically. Simple but may cause duplicates or data loss on failures.

    Manual Commit

    The application controls when to commit, which gives at-least-once delivery. Combined with idempotent processing or Kafka transactions, this can approach exactly-once semantics.
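In Spring Boot, switching a listener to manual commits is mostly configuration; a minimal sketch using Spring Boot's spring.kafka.* property namespace:

```
# Disable Kafka's periodic auto-commit
spring.kafka.consumer.enable-auto-commit=false
# Make the listener container wait for an explicit acknowledgment
spring.kafka.listener.ack-mode=manual
```

The listener method then takes an additional Acknowledgment parameter and calls acknowledge() only after processing succeeds, so a crash before that point causes the record to be redelivered.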

    Brokers & Clusters

    A Broker is a Kafka server that stores data and serves clients. Multiple brokers form a Cluster. Each broker is identified by a unique numeric ID.

    Broker Duties
    • Receive messages from producers
    • Store messages on disk
    • Serve consumer fetch requests
    • Replicate data to followers
    Partition Leader
    • Each partition has one leader
    • All reads/writes go to the leader
    • Leader replicates to followers
    • Automatic failover if the leader dies
    Replication
    • Data replicated across brokers
    • Replication factor is configurable
    • ISR: In-Sync Replicas
    • Ensures fault tolerance
    Kafka Cluster (3 Brokers, Replication Factor = 3)
    Topic: orders (3 Partitions)
    Broker 0:
    - Partition 0 (LEADER)
    - Partition 1 (Follower)
    - Partition 2 (Follower)
    Broker 1:
    - Partition 0 (Follower)
    - Partition 1 (LEADER)
    - Partition 2 (Follower)
    Broker 2:
    - Partition 0 (Follower)
    - Partition 1 (Follower)
    - Partition 2 (LEADER)
    If Broker 1 goes down:
    → Partition 1 leadership moves to Broker 0 or Broker 2

    Bootstrap Servers

    Clients only need to connect to one broker initially (bootstrap server). That broker provides metadata about all brokers in the cluster.
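In a Spring Boot application this is a single property; listing two or three brokers (hostnames below are placeholders) lets the client still bootstrap if one of them happens to be down:

```
spring.kafka.bootstrap-servers=broker1:9092,broker2:9092,broker3:9092
```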

    ZooKeeper and KRaft (Kafka Raft)

    Kafka requires a coordination service to manage cluster metadata. Historically this was Apache ZooKeeper, but Kafka 3.x introduced KRaft (Kafka Raft) as a built-in replacement.

    ZooKeeper (Legacy)
    • Stores cluster metadata
    • Tracks broker health
    • Elects partition leaders
    • Requires a separate deployment
    • Removed entirely in Kafka 4.0
    KRaft (Modern)
    • Built into Kafka itself
    • Simpler deployment
    • Faster failover
    • Better scalability
    • Default in Kafka 3.3+
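A KRaft-mode broker is configured in server.properties rather than by pointing at a ZooKeeper ensemble. A minimal sketch for a single combined broker-plus-controller node (ports and node ID are illustrative):

```
# server.properties — minimal single-node KRaft setup (illustrative values)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
```

Production clusters typically run three or more controller nodes so the Raft quorum itself is fault tolerant.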
