    Topic 1

    What is Apache Kafka?

    Apache Kafka is a distributed event streaming platform capable of handling trillions of events per day.

    Introduction to Apache Kafka

    Apache Kafka is an open-source distributed event streaming platform originally developed at LinkedIn in 2010 and later donated to the Apache Software Foundation in 2011. It is designed for high-throughput, fault-tolerant, and scalable real-time data streaming.

    The name "Kafka" was chosen because the system is optimized for writing, and the creator (Jay Kreps) liked the author Franz Kafka. Today, Kafka is used by over 80% of Fortune 500 companies for mission-critical applications.

    Distributed

    Runs as a cluster across multiple servers, data centers, or cloud regions

    Durable

    Persists streams of records safely with configurable retention

    Scalable

    Handles millions of events/second with horizontal scaling

    Kafka History & Evolution

    2010

    Developed at LinkedIn to handle their massive data pipeline needs (activity tracking, metrics, logs)

    2011

    Open-sourced and donated to Apache Software Foundation

    2014

    Confluent founded by Kafka creators to commercialize and advance Kafka

    2016

    Kafka Streams API released, enabling stream processing within Kafka itself

    2022+

    KRaft (Kafka Raft) removes the ZooKeeper dependency, simplifying deployment

    The Three Core Capabilities

    Kafka combines three key capabilities that are usually handled by separate systems:

    1

    Publish & Subscribe (Messaging)

    Like a message queue, but with multiple subscribers: producers publish messages to topics, and multiple consumer groups can read them independently.

    2

    Store (Durable Storage)

    Unlike traditional message queues, Kafka persists messages durably. Data can be retained for days, weeks, or forever, and consumers can replay messages from any point.

    3

    Process (Stream Processing)

    Process streams of data in real-time with Kafka Streams API or ksqlDB. Transform, aggregate, join, and analyze data as it flows through.
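The first two capabilities can be illustrated with a toy in-memory model (plain Java; this is an illustrative sketch, not the Kafka API): an append-only log that retains every record, where each consumer group tracks its own read offset, so groups read independently and any group can rewind and replay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a single topic partition: an append-only log with
// per-consumer-group offsets. Illustrative only -- not the Kafka API.
public class ToyLog {
    private final List<String> log = new ArrayList<>();           // retained records (never deleted on read)
    private final Map<String, Integer> offsets = new HashMap<>(); // one offset per consumer group

    // Capability 1: publish -- append a record to the log
    void publish(String record) { log.add(record); }

    // Capabilities 1 + 2: each group reads at its own offset; reading
    // advances only that group's offset and leaves the record in place
    String poll(String group) {
        int offset = offsets.getOrDefault(group, 0);
        if (offset >= log.size()) return null; // nothing new for this group
        offsets.put(group, offset + 1);
        return log.get(offset);
    }

    // Capability 2: rewind a group to any retained offset and replay
    void seek(String group, int offset) { offsets.put(group, offset); }

    public static void main(String[] args) {
        ToyLog topic = new ToyLog();
        topic.publish("order-created");
        topic.publish("order-shipped");

        // Two groups consume the same records independently
        System.out.println(topic.poll("billing"));    // order-created
        System.out.println(topic.poll("analytics"));  // order-created
        System.out.println(topic.poll("billing"));    // order-shipped

        // Replay: rewind the billing group to the beginning
        topic.seek("billing", 0);
        System.out.println(topic.poll("billing"));    // order-created
    }
}
```

Note how reading never deletes data: that single design choice is what makes independent consumer groups and replay possible, and it is the core difference from a traditional queue.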

    Real-World Use Cases

    Messaging

    Replace traditional message brokers (RabbitMQ, ActiveMQ) for high-throughput, low-latency messaging between microservices.

    Activity Tracking

    Track user activity such as page views, clicks, and searches in topics for real-time analytics and recommendations.

    Log Aggregation

    Collect logs from multiple services into a central location for monitoring, alerting, and analysis (ELK stack integration).

    Stream Processing

    Real-time processing pipelines for data transformation, enrichment, and aggregation as data flows through.

    Event Sourcing

    Store an immutable sequence of events as the source of truth. Rebuild application state by replaying events.

    Commit Log

    Serve as an external commit log for distributed systems, supporting database change data capture (CDC) and cross-datacenter replication.
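The event-sourcing use case above can be sketched in a few lines of plain Java (the account events here are hypothetical, chosen only for illustration): current state is never stored directly; it is rebuilt by folding over the immutable event history.

```java
import java.util.List;

// Minimal event-sourcing sketch: state is derived by replaying events.
// Event types and amounts are made up for illustration.
public class AccountReplay {
    record Event(String type, long amount) {}

    // Rebuild the account balance from the full event history
    static long replay(List<Event> events) {
        long balance = 0;
        for (Event e : events) {
            switch (e.type()) {
                case "DEPOSITED" -> balance += e.amount();
                case "WITHDRAWN" -> balance -= e.amount();
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        List<Event> history = List.of(
            new Event("DEPOSITED", 100),
            new Event("DEPOSITED", 50),
            new Event("WITHDRAWN", 30));
        System.out.println(replay(history)); // 120
    }
}
```

Because Kafka retains the events durably, any service can reconstruct this state at any time by replaying the topic from offset zero.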

    Who Uses Kafka?

    Kafka powers mission-critical systems at the world's largest companies:

    LinkedIn

    7+ trillion messages/day

    Netflix

    Real-time streaming analytics

    Uber

    Trillions of events/day

    Airbnb

    Event-driven architecture

    Spotify

    Log aggregation & analytics

    Twitter

    Real-time data pipelines

    Goldman Sachs

    Financial transactions

    PayPal

    Fraud detection

    Kafka vs Traditional Message Queues

    Understanding how Kafka differs from traditional message queues like RabbitMQ or ActiveMQ:

    Feature            | Kafka                                | Traditional MQ
    Message Retention  | Configurable (hours, days, forever)  | Until consumed (deleted after delivery)
    Throughput         | Millions of messages/sec             | Thousands of messages/sec
    Message Replay     | Yes, by offset                       | No
    Consumer Model     | Pull-based (consumer controls pace)  | Push-based (broker controls pace)
    Ordering           | Per partition                        | Per queue
    Multiple Consumers | Multiple groups read independently   | Competing consumers (one receives each message)
    Best For           | High-volume event streams            | Task queues, RPC
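The "Ordering: per partition" row follows from how records are routed: Kafka's default partitioner sends all records with the same key to the same partition, so ordering is guaranteed only among records that share a partition. A simplified plain-Java sketch of that routing rule (Kafka actually uses a murmur2 hash, not Java's hashCode):

```java
// Simplified sketch of keyed partition routing: same key -> same partition.
// Not Kafka's real partitioner (which uses murmur2 on the serialized key).
public class PartitionRouting {
    static int partitionFor(String key, int numPartitions) {
        // Math.abs guards against negative hash codes
        // (the Integer.MIN_VALUE edge case is ignored in this sketch)
        return Math.abs(key.hashCode()) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3;
        // All events keyed by "user-42" land in one partition, so their
        // relative order is preserved; other keys may route elsewhere.
        System.out.println(partitionFor("user-42", partitions));
        System.out.println(partitionFor("user-42", partitions)); // same as above
        System.out.println(partitionFor("user-7", partitions));  // may differ
    }
}
```

This is why choosing a good message key (e.g. a user or order ID) matters: it determines which records are ordered relative to each other.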
