Understanding Kafka Architecture with Zookeeper: A Comprehensive Guide
Introduction:- Hi everyone, my name is Gaurav, and today we are going to take a deep dive into Apache Kafka, one of the most popular message streaming platforms. In today's data-driven world, managing and processing large volumes of data in real time is essential for building scalable and fault-tolerant systems. Apache Kafka, a distributed streaming platform, has emerged as a popular choice for handling high-throughput data streams. In the classic Kafka architecture covered here, ZooKeeper, a distributed coordination service, sits at the core and helps manage and maintain the Kafka cluster. In this blog, we will cover the fundamentals of Kafka architecture, explore the role of ZooKeeper, and walk through a practical example of setting up a single-node Kafka cluster on both Windows and macOS operating systems.
Kafka Architecture Overview:-
Kafka's distributed architecture is designed to handle high-throughput, fault-tolerant data streams. Let's take a closer look at the key components of Kafka:-
1. Producers: Producers are entities responsible for publishing data to Kafka topics. They can be applications, devices, or any source system generating data. Producers write messages to Kafka topics, each of which behaves like an append-only log in which the data is organized and stored.
2. Topics: Topics are logical categories or feeds to which producers publish messages. Each topic consists of one or more partitions that allow for parallel processing. Kafka retains the messages in topics for a configurable period, enabling both real-time and batch consumption.
3. Brokers: Kafka brokers form the core of the Kafka cluster. Brokers receive, store, and serve the published messages. They collaborate to form a distributed system that provides high availability and fault tolerance. Each broker in the cluster is assigned a unique identifier and is responsible for one or more partitions.
4. Consumers: Consumers are applications or systems that subscribe to specific topics to consume the messages published by producers. They can process the messages in real time or store them for later analysis. Consumers read messages from specific partitions, allowing for parallel consumption.
5. ZooKeeper: ZooKeeper is a distributed coordination service that Kafka relies upon. It provides a centralized repository for maintaining metadata, coordinating cluster activities, and detecting changes in the Kafka cluster. ZooKeeper acts as a distributed configuration store and plays a crucial role in leader election and cluster management.
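To make the relationship between producers, topics, partitions, and consumers more concrete, here is a minimal in-memory sketch in Python. It is only a toy model and not the Kafka client API: the `Topic` class and its methods are hypothetical names made up for this illustration, and the CRC32 hash is an assumption standing in for the murmur2 hash that Kafka's default partitioner applies to keyed messages.

```python
import zlib


class Topic:
    """A toy, in-memory stand-in for a Kafka topic (not the real client API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list; a message's index is its offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Producer side: pick a partition from the key and append the message."""
        # Assumption: CRC32 of the key modulo the partition count. Kafka's
        # default partitioner uses a different hash, but the idea is the same.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        """Consumer side: read everything in a partition from an offset onward."""
        return self.partitions[partition][offset:]


topic = Topic("test-topic", num_partitions=3)
for i in range(6):
    topic.append(key=f"user-{i % 2}", value=f"event-{i}")
```

Because the partition is derived from the key, all messages for one key land in the same partition and keep their relative order, while different partitions can be read in parallel by different consumers.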
The Role of ZooKeeper in Kafka:-
ZooKeeper plays a vital role in ensuring the reliability and stability of the Kafka cluster. Let's explore the specific tasks performed by ZooKeeper:-
1. Cluster Coordination: ZooKeeper maintains a hierarchical namespace of nodes called znodes, which store metadata about Kafka brokers, topics, and partitions. Older Kafka consumers also kept their offsets in ZooKeeper, whereas newer clients store them in an internal Kafka topic (__consumer_offsets). This information enables coordination and synchronization among different components of the Kafka cluster.
2. Leader Election: ZooKeeper facilitates leader election in the cluster. Brokers use it to elect a controller, and the controller in turn chooses a leader for each partition of a Kafka topic. By default, the leader handles all read and write requests for that partition, while the followers replicate the data. This ensures that only one broker acts as the leader for a partition at any given time.
3. Configuration Management: Kafka's configuration parameters, such as the number of partitions, replication factor, and retention policies, are stored and managed by ZooKeeper. ZooKeeper allows dynamic updates to the cluster's configuration, and changes are detected and communicated to the Kafka brokers.
4. Health Monitoring: ZooKeeper tracks the health of Kafka brokers through their sessions. Each broker registers an ephemeral znode and keeps its session alive with periodic heartbeats; if the heartbeats stop, the session expires and the znode is removed. When a broker goes offline or becomes available again, the cluster is notified, triggering appropriate actions such as leader reassignment.
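The interplay between health monitoring and leader election can be sketched with a small simulation. This is a simplified model under stated assumptions, not how Kafka is implemented: there is no real ZooKeeper involved, the `ToyCluster` class is a hypothetical name, and the rule "pick the first live replica" stands in for the controller's actual choice among in-sync replicas.

```python
class ToyCluster:
    """Toy model of broker liveness and partition leadership (no real ZooKeeper)."""

    def __init__(self, replicas):
        # replicas: partition id -> ordered list of broker ids holding a copy
        self.replicas = replicas
        self.live_brokers = set()
        self.leaders = {}

    def register(self, broker_id):
        """A broker comes online (in ZooKeeper terms: its znode appears)."""
        self.live_brokers.add(broker_id)
        self._elect()

    def expire(self, broker_id):
        """A broker's session ends (in ZooKeeper terms: its znode disappears)."""
        self.live_brokers.discard(broker_id)
        self._elect()

    def _elect(self):
        """Keep a live leader if there is one; otherwise take the first live replica."""
        for partition, brokers in self.replicas.items():
            if self.leaders.get(partition) in self.live_brokers:
                continue
            live = [b for b in brokers if b in self.live_brokers]
            self.leaders[partition] = live[0] if live else None


cluster = ToyCluster({0: [1, 2], 1: [2, 3]})
for broker in (1, 2, 3):
    cluster.register(broker)
leaders_before = dict(cluster.leaders)  # {0: 1, 1: 2}
cluster.expire(2)
leaders_after = dict(cluster.leaders)   # {0: 1, 1: 3}
```

When broker 2 disappears, partition 1 loses its leader and leadership moves to broker 3, while partition 0 is unaffected because its leader (broker 1) is still alive.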
Setting up a Single-Node Kafka Cluster on Windows:-
To gain a practical understanding of Kafka's architecture with ZooKeeper, let's walk through the steps of setting up a single-node Kafka cluster on a Windows operating system:-
Step 1: Install Kafka and ZooKeeper
Download a ZooKeeper-compatible release of Apache Kafka (a 3.x version, since ZooKeeper mode was removed in Kafka 4.0) from the official website (https://kafka.apache.org/downloads) and extract the files to a desired location.
Download Apache ZooKeeper from the official website (https://zookeeper.apache.org/releases.html) and extract the files to a separate directory.
Step 2: Start ZooKeeper
{1} Open a command prompt and navigate to the ZooKeeper directory.
{2} Rename the "zoo_sample.cfg" file in the "conf" directory to "zoo.cfg".
{3} Open the "zoo.cfg" file in a text editor and update the "dataDir" property to specify the path where ZooKeeper will store its data.
Run the following command to start ZooKeeper:-
.\bin\zkServer.cmd
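Once the server window is running, a quick way to confirm that ZooKeeper is accepting connections is to try opening a TCP connection to its client port. The following is a minimal sketch using only the Python standard library; the `port_is_open` helper is a hypothetical name, and port 2181 is assumed because it is the default `clientPort` in the sample configuration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 2181 is the default clientPort; adjust it if you changed the configuration.
    print("ZooKeeper reachable:", port_is_open("localhost", 2181))
```

The same check can be pointed at port 9092 later to see whether the Kafka broker is listening. It only shows that something accepts connections on the port, not that the service is healthy.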
Step 3: Start Kafka Broker
{1} Open a new command prompt and navigate to the Kafka directory.
{2} Open the "server.properties" file located in the "config" directory using a text editor.
{3} Update the "zookeeper.connect" property to specify the ZooKeeper connection string. For example:
zookeeper.connect=localhost:2181
Run the following command to start the Kafka broker:
.\bin\windows\kafka-server-start.bat .\config\server.properties
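If the broker fails to start, one thing worth checking is which ZooKeeper address the properties file actually points at. Here is a small sketch of a key=value parser in Python; `read_properties` is a hypothetical helper written for this post, and the sample text is an illustrative excerpt rather than the full file shipped with Kafka.

```python
def read_properties(text):
    """Parse simple 'key=value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


sample = """
# excerpt in the style of config/server.properties
broker.id=0
zookeeper.connect=localhost:2181
"""
props = read_properties(sample)
print(props["zookeeper.connect"])  # localhost:2181
```

To inspect your own setup, you could pass the contents of the real file instead of `sample`. The parser deliberately ignores the less common features of the Java properties format, such as line continuations.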
Step 4: Create a Topic
Open a new command prompt and navigate to the Kafka directory.
Run the following command to create a new topic named "test-topic" with a single partition and replication factor of 1:
.\bin\windows\kafka-topics.bat --create --topic test-topic --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092
Step 5: Produce and Consume Messages
Open two separate command prompts and navigate to the Kafka directory in each.
In one command prompt, run the following command to start a producer and publish messages to "test-topic":-
.\bin\windows\kafka-console-producer.bat --topic test-topic --bootstrap-server localhost:9092
In the other command prompt, run the following command to start a consumer and consume messages from "test-topic":-
.\bin\windows\kafka-console-consumer.bat --topic test-topic --from-beginning --bootstrap-server localhost:9092