Core Concepts
Apache Kafka is a distributed streaming platform originally developed at LinkedIn and later donated to the Apache Software Foundation. It is used primarily to build real-time data pipelines and streaming applications.
Key Features
- High Throughput: a single cluster can handle millions of messages per second
- Low Latency: end-to-end delivery latency is typically in the milliseconds
- Scalability: clusters scale horizontally by adding Brokers
- Persistence: messages are persisted to disk, so data can be replayed
- Fault Tolerance: partition replicas guard against data loss
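Persistence and replay both follow from Kafka's central data structure: an append-only log in which every message keeps a stable offset. The sketch below is illustrative only, not Kafka's implementation; the class name and API are hypothetical.

```python
# Minimal sketch of an append-only log with offset-based replay.
# (Illustrative assumption: this models the idea, not Kafka's on-disk format.)
class AppendOnlyLog:
    def __init__(self):
        self._records = []

    def append(self, message):
        # Each message gets the next sequential offset.
        offset = len(self._records)
        self._records.append(message)
        return offset

    def read_from(self, offset):
        # Replay: return every record at or after the given offset.
        return self._records[offset:]

log = AppendOnlyLog()
for msg in ["a", "b", "c"]:
    log.append(msg)

print(log.read_from(1))  # replays "b" and "c": ['b', 'c']
```

Because records are never mutated in place, a consumer that crashes (or a brand-new consumer) can simply re-read from an earlier offset.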
Core Components
- Producer: client that publishes messages to the Kafka cluster
- Broker: Kafka server node that stores messages and serves client requests
- Topic: named category to which messages are published
- Partition: ordered subdivision of a Topic that enables parallel processing
- Consumer: client that reads messages from Topics
- Consumer Group: set of Consumers that share a Topic's Partitions, balancing the load across members
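The load balancing a Consumer Group provides comes from assigning each Partition to exactly one Consumer in the group. The sketch below uses a simple round-robin scheme with hypothetical names; Kafka's real assignors (range, round-robin, sticky) are more involved.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch of consumer-group balancing: each partition is
    owned by exactly one consumer, so the group shares the load.
    (Assumption: simplified stand-in for Kafka's partition assignors.)"""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        consumer = consumers[i % len(consumers)]
        assignment[consumer].append(p)
    return assignment

print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}
```

Note the implication: a group can usefully contain at most as many active Consumers as the Topic has Partitions; extra Consumers sit idle.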
Working Principle
Kafka follows a publish-subscribe model: Producers send messages to specific Topics, and Consumers subscribe to those Topics and consume their messages. Each Topic can be split into multiple Partitions distributed across different Brokers, enabling parallel processing. When a message carries a key, the Producer hashes the key to pick a Partition, so all messages with the same key land on the same Partition in order.
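The key-to-Partition routing above can be sketched as hash-then-modulo. Kafka's default partitioner uses murmur2; the md5-based stand-in below is an assumption chosen only because it is deterministic and available in the Python standard library.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Sketch of key-based partitioning: hash the key, then take it modulo
    the partition count. (Assumption: md5 stands in for Kafka's murmur2.)"""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, preserving per-key order.
print(partition_for(b"user-42", 4))
print(partition_for(b"user-42", 4))  # same value as above
```

This is why per-key ordering holds only within a Partition, and why changing the Partition count reshuffles which keys land where.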
Use Cases
- Log collection systems
- Real-time data analytics
- Stream processing
- Message queuing
- Event sourcing
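For the event-sourcing case, the retained log is the source of truth: current state is never stored directly but derived by replaying all events. The account example below is hypothetical, purely to show the replay idea.

```python
def replay(events):
    """Event-sourcing sketch (hypothetical bank account): derive the current
    balance by folding over the full event stream, which a retained Kafka
    topic makes possible."""
    balance = 0
    for kind, amount in events:
        if kind == "deposit":
            balance += amount
        elif kind == "withdraw":
            balance -= amount
    return balance

events = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]
print(replay(events))  # 75
```

Because the events themselves are immutable, fixing a bug in the fold logic just means replaying the log through the corrected function.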
Kafka's design makes it a strong choice for processing large-scale real-time data streams, and it is widely used across internet services, finance, IoT, and other domains.