Rafael Natali

How to install Kafka using Docker

Updated: Aug 25, 2023

Install and configure a Docker environment to run Apache Kafka locally


Docker Environment Topology with Apache Zookeeper and Apache Kafka - from author

Introducing the Apache Kafka ecosystem

Apache Kafka¹ is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Apache Zookeeper² is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. It is an essential part of Kafka.

This tutorial shows how to run Kafka in a distributed architecture with a 3-node Broker³ cluster. ZooKeeper is also configured in replicated mode⁴ (called an ensemble) to take advantage of the distributed architecture.

The code used to create this tutorial is available in this repository, so you can follow along.

Containerising Apache Kafka

Prerequisites

To run this environment, you'll need Docker and Kafka's CLI tools installed.

  1. This tutorial was tested using Docker Desktop⁵ for macOS, Engine version 20.10.2.

  2. The CLI tools can be downloaded here as a tar or zip archive. Once the download is complete and the files are extracted, add the tools' bin directory to your PATH environment variable.

  3. If using bash, edit ~/.bash_profile and add the following: PATH=$PATH:~/bin/confluent-6.0.1/bin (this assumes the files were extracted to ~/bin; any other location works as long as PATH points to the matching bin directory).

  4. If using zsh, edit ~/.zshrc instead.

  5. Update the local hosts file. Run the update-hosts.sh script to add the container names to your local /etc/hosts file so that they resolve from your machine.
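
The PATH and hosts steps above can be sketched in shell. This is a minimal sketch and not the repository's update-hosts.sh: the function names add_to_path and hosts_entries are hypothetical, and mapping every container name to 127.0.0.1 is my assumption, based on the container ports being published on the local machine.

```shell
# Append a directory to PATH only if it is not already there.
# Example for ~/.bash_profile or ~/.zshrc:
#   add_to_path "$HOME/bin/confluent-6.0.1/bin"
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

# Print one /etc/hosts line per container name used in this tutorial.
# Assumption: every name resolves to the loopback address.
# Review the output first, then append it with:
#   hosts_entries | sudo tee -a /etc/hosts
hosts_entries() {
  for name in zk-1 zk-2 zk-3 kafka-1 kafka-2 kafka-3; do
    printf '127.0.0.1 %s\n' "$name"
  done
}
```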

Docker Image

I chose the Bitnami⁶ images for their ready-to-use approach, their well-known reliability, and because they run as non-root containers⁷.

Consult the official Bitnami documentation for more information.

Docker Compose

A docker-compose file⁸ is provided to create the local development environment.

To spin up the environment, execute:

docker-compose -f ./build/docker-compose.yml up -d

Zookeeper Configuration

Each container exposes only its own client port:

zk-1:12181
zk-2:22181
zk-3:32181

Zookeeper Environment Variables

  • ALLOW_ANONYMOUS_LOGIN: Accept connections from unauthenticated users.

  • ZOO_SERVER_ID: ID of the server in the ensemble.

  • ZOO_SERVERS: Comma, space, or semicolon separated list of servers.

  • ZOO_PORT_NUMBER: ZooKeeper client port.

  • ZOO_TICK_TIME: Basic time unit in milliseconds used by ZooKeeper for heartbeats.

  • ZOO_INIT_LIMIT: Limits how long the ZooKeeper servers in the quorum have to connect to a leader.

  • ZOO_SYNC_LIMIT: How far out of date a server can be from a leader.

  • ZOO_AUTOPURGE_PURGEINTERVAL: Time interval, in hours, at which the purge task is triggered.

  • ZOO_AUTOPURGE_SNAPRETAINCOUNT: Number of most recent snapshots, and their corresponding transaction logs, to keep in dataDir and dataLogDir.

  • ZOO_MAX_CLIENT_CNXNS: Limits the number of concurrent connections that a single client may make to a single member of the ZooKeeper ensemble.

  • ZOO_HEAP_SIZE: Size in MB for the Java heap options (Xmx and Xms).
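
To make the variables above more concrete, here is a minimal sketch that prints a possible environment for each ZooKeeper node. It is not taken from the repository's docker-compose file: the function name zk_env is hypothetical, the zk-N hostnames and the 0.0.0.0 entry for the local server are assumptions, and the tick, init, and sync values are only illustrative.

```shell
# Print an environment for ZooKeeper node $1 (1, 2 or 3), one VAR=value per line.
# Assumptions: hostnames zk-1..zk-3, client port <id>2181 (12181, 22181, 32181),
# and the local server written as 0.0.0.0 in ZOO_SERVERS.
# Possible use: zk_env 1 > zk-1.env, then pass it with docker run --env-file zk-1.env
zk_env() {
  id=$1
  servers=""
  for n in 1 2 3; do
    if [ "$n" = "$id" ]; then host="0.0.0.0"; else host="zk-$n"; fi
    servers="${servers:+$servers,}$host:2888:3888"
  done
  printf '%s\n' \
    "ALLOW_ANONYMOUS_LOGIN=yes" \
    "ZOO_SERVER_ID=$id" \
    "ZOO_SERVERS=$servers" \
    "ZOO_PORT_NUMBER=${id}2181" \
    "ZOO_TICK_TIME=2000" \
    "ZOO_INIT_LIMIT=10" \
    "ZOO_SYNC_LIMIT=5"
}
```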

Kafka Configuration

The containers only expose the following ports:

  • INTERNAL - port 9092 for intra-cluster communication

  • EXTERNAL - for connection from local computer:

kafka-1: 19093
kafka-2: 29093
kafka-3: 39093
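
Clients running on your local computer connect through the EXTERNAL ports listed above. A minimal sketch of building the bootstrap server list from them, where the function name bootstrap_servers is hypothetical and the kafka-N hostnames are assumed to resolve through your hosts file:

```shell
# Build the bootstrap server list for clients on the local machine.
# External ports follow the pattern listed above: <broker number>9093.
bootstrap_servers() {
  list=""
  for n in 1 2 3; do
    list="${list:+$list,}kafka-$n:${n}9093"
  done
  printf '%s\n' "$list"
}

# Example (needs the cluster running and the CLI tools on PATH):
#   kafka-topics --bootstrap-server "$(bootstrap_servers)" --list
```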

Kafka Broker Environment Variables

  • KAFKA_CFG_ZOOKEEPER_CONNECT: Comma separated host:port pairs, each corresponding to a Zookeeper Server.

  • ALLOW_PLAINTEXT_LISTENER: Allows the use of the PLAINTEXT listener.

  • KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: Map between listener names and security protocols.

  • KAFKA_CFG_LISTENERS: Comma-separated list of URIs we will listen on and the listener names.

  • KAFKA_CFG_ADVERTISED_LISTENERS: Listeners to publish to ZooKeeper for clients to use, if different than the listeners config property.

  • KAFKA_INTER_BROKER_LISTENER_NAME: Name of listener used for communication between brokers.

  • KAFKA_CFG_NUM_PARTITIONS: The default number of log partitions per topic.

  • KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: Allow automatic topic creation on the broker when subscribing to or assigning a topic.

  • KAFKA_CFG_DEFAULT_REPLICATION_FACTOR: The default replication factor for automatically created topics.

  • KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR: The replication factor for the offsets topic.

  • KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: The replication factor for the transaction topic.

  • KAFKA_HEAP_OPTS: Kafka's Java Heap size.

  • KAFKA_CFG_BROKER_ID: Kafka's broker custom id.
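
The listener variables are the easiest ones to get wrong, so here is a minimal sketch of how they could fit together for each broker. It is not copied from the repository's docker-compose file: the function name kafka_env is hypothetical, and the kafka-N and zk-N hostnames, the listener names, and the replication factors of 3 are assumptions chosen to match the ports described in this tutorial.

```shell
# Print a possible environment for Kafka broker $1 (1, 2 or 3), one VAR=value per line.
# INTERNAL (9092) is used between brokers; EXTERNAL (<id>9093) is advertised
# to clients on the local machine. Both use PLAINTEXT in this sketch.
kafka_env() {
  id=$1
  ext_port="${id}9093"
  printf '%s\n' \
    "KAFKA_CFG_ZOOKEEPER_CONNECT=zk-1:12181,zk-2:22181,zk-3:32181" \
    "ALLOW_PLAINTEXT_LISTENER=yes" \
    "KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT" \
    "KAFKA_CFG_LISTENERS=INTERNAL://:9092,EXTERNAL://:$ext_port" \
    "KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-$id:9092,EXTERNAL://kafka-$id:$ext_port" \
    "KAFKA_INTER_BROKER_LISTENER_NAME=INTERNAL" \
    "KAFKA_CFG_DEFAULT_REPLICATION_FACTOR=3" \
    "KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR=3" \
    "KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=3"
}
```

The advertised EXTERNAL address uses the container name rather than localhost, which is why the hosts file entry from the prerequisites matters: the client has to resolve whatever address the broker advertises.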

Validate Environment

  • Confirm that the containers are up and running:

docker-compose -f build/docker-compose.yml ps

It should return output similar to:

Name                    Command               State                                Ports                              
-----------------------------------------------------------------
build_kafka-ui_1   /kafdrop.sh                      Up      0.0.0.0:8080->8080/tcp                                          
kafka-1            /opt/bitnami/scripts/kafka ...   Up      0.0.0.0:19093->19093/tcp, 9092/tcp                              
kafka-2            /opt/bitnami/scripts/kafka ...   Up      0.0.0.0:29093->29093/tcp, 9092/tcp                              
kafka-3            /opt/bitnami/scripts/kafka ...   Up      0.0.0.0:39093->39093/tcp, 9092/tcp                              
zk-1               /opt/bitnami/scripts/zooke ...   Up      0.0.0.0:12181->12181/tcp, 2181/tcp, 2888/tcp, 3888/tcp, 8080/tcp
zk-2               /opt/bitnami/scripts/zooke ...   Up      2181/tcp, 0.0.0.0:22181->22181/tcp, 2888/tcp, 3888/tcp, 8080/tcp
zk-3               /opt/bitnami/scripts/zooke ...   Up      2181/tcp, 2888/tcp, 0.0.0.0:32181->32181/tcp, 3888/tcp, 8080/tcp

  • Verify that communication can be established with ZooKeeper:

zookeeper-shell zk-1:12181 get /zookeeper/config

It should return output similar to:

Connecting to zk-1:12181

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
server.1=0.0.0.0:2888:3888:participant;0.0.0.0:12181
server.2=zookeeper2:2888:3888:participant;0.0.0.0:12181
server.3=zookeeper3:2888:3888:participant;0.0.0.0:12181
version=0

  • List the Kafka brokers registered with ZooKeeper by executing:

zookeeper-shell zk-1:12181 ls /brokers/ids

It should return output similar to:

Connecting to zk-1:12181

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[1001, 1002, 1003]
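
The checks above can also be scripted. This is a minimal sketch with hypothetical helper names (count_up, expect_up, broker_count); it assumes the output formats shown above, which may differ between Docker Compose and ZooKeeper versions.

```shell
# Count the containers reported as "Up" in docker-compose ps output (read from stdin).
count_up() {
  grep -cE '[[:space:]]Up([[:space:]]|$)' || true
}

# Succeed only if exactly $1 containers are up.
# Example: docker-compose -f build/docker-compose.yml ps | expect_up 7
expect_up() {
  up=$(count_up)
  [ "$up" -eq "$1" ]
}

# Count the broker ids in zookeeper-shell output such as "[1001, 1002, 1003]".
# Example: zookeeper-shell zk-1:12181 ls /brokers/ids | broker_count
broker_count() {
  grep -E '^\[.*\]$' | tail -n 1 | sed 's/[][ ]//g' | tr ',' '\n' | grep -c . || true
}
```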


Kafdrop

Kafdrop⁹ is a web UI for viewing Kafka topics and browsing consumer groups. The tool displays information such as brokers, topics, partitions, consumers, and lets you view messages.

To use Kafdrop, open http://localhost:8080 in your browser. The application lets you:

  1. View brokers, all topics and settings

  2. View all data on topics

  3. View messages sent

  4. View consumer groups

  5. Create topics (with few configurations)
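
Topics can also be created from the command line with the CLI tools installed in the prerequisites. A minimal sketch, where the function name create_topic_cmd and the topic name are hypothetical, and broker kafka-1 on its EXTERNAL port is assumed to be reachable from your machine:

```shell
# Print the kafka-topics command that creates a topic replicated on all three brokers.
# Printing rather than executing lets you review the command before running it.
# $1 = topic name, $2 = number of partitions (defaults to 3).
create_topic_cmd() {
  topic=$1
  partitions=${2:-3}
  printf 'kafka-topics --bootstrap-server %s --create --topic %s --partitions %s --replication-factor 3\n' \
    "kafka-1:19093" "$topic" "$partitions"
}

# Example: review the command, then run it by piping it to sh:
#   create_topic_cmd demo-topic
#   create_topic_cmd demo-topic | sh
```

Once created, the topic should show up in the Kafdrop topic list.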

Conclusion

This tutorial presented a Docker environment for running a 3-node Apache Kafka cluster locally. With this local environment up and running, you can begin your adventures in the world of distributed systems.

I highly recommend the Apache Kafka Quickstart as your next stop to get more hands-on experience.

References









