How to Install Apache Kafka on Ubuntu 20.04

Ubuntu 20.04 is a robust operating system that serves as the basis for many complex infrastructures, such as the smooth transmission and fast, efficient processing of data streams. In this post, you will learn how to install Apache Kafka on Ubuntu 20.04.

Apache Kafka is an open-source, cross-platform application developed by the Apache Software Foundation and specialized in stream processing. It allows you to publish, store, process, and subscribe to streams of records in real time. It is designed to handle data streams from various sources and distribute them to various consumers.

Apache Kafka is an alternative to traditional enterprise messaging systems. It started as an internal system that LinkedIn developed to handle 1.4 billion messages per day.

The platform gained popularity after large companies such as Netflix and Microsoft adopted it in their architectures. Kafka is written in Java and Scala, so Java has to be present on the system to run it.

Install Apache Kafka on Ubuntu 20.04

Apache Kafka is built with Java, so we have to install Java before proceeding with any other steps.

So, open a terminal or connect to your server via SSH and update Ubuntu:

sudo apt update

sudo apt upgrade

Now install Java on Ubuntu.

sudo apt install default-jdk default-jre

The next step is to add a new user to the system so that Kafka can be managed by it.

sudo adduser kafka

The user you created has to be added to the sudo group so that it has sufficient permissions to run the program.

sudo adduser kafka sudo

Now that the kafka user is created and ready, you can log in as it using the su command:

su -l kafka

Downloading and installing Apache Kafka

Create a new folder to download the program into. I will call it kafka, but you can choose another name.

mkdir kafka

Now enter it and, with the help of the wget command, download Kafka 2.7.0, the latest stable version at the time of writing:

cd kafka
wget https://downloads.apache.org/kafka/2.7.0/kafka_2.13-2.7.0.tgz

Sample Output:

--2021-04-15 23:13:07--  https://downloads.apache.org/kafka/2.7.0/kafka_2.13-2.7.0.tgz
Resolving downloads.apache.org (downloads.apache.org)... 2a01:4f8:10a:201a::2, 88.99.95.219
Connecting to downloads.apache.org (downloads.apache.org)|2a01:4f8:10a:201a::2|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68583422 (65M) [application/x-gzip]
Saving to: ‘kafka_2.13-2.7.0.tgz’

kafka_2.13-2.7.0.tgz                       100%[=====================================================================================>]  65.41M  3.08MB/s    in 20s     

2021-04-15 23:13:27 (3.21 MB/s) - ‘kafka_2.13-2.7.0.tgz’ saved [68583422/68583422]

After that, extract it using the tar command. The --strip-components=1 option removes the archive's top-level folder so the files land directly in ~/kafka.

tar -xvzf kafka_2.13-2.7.0.tgz --strip-components=1

We now have the binaries on the system, but we still have to do some configuration before we can use them.

Configuring Apache Kafka before using it

Apache Kafka controls whether topics can be deleted through the delete.topic.enable directive. In this context, a topic is a category, group, or feed name to which messages can be published. The setting defaults to true in recent releases, but it is a good idea to set it explicitly.

To do this, open the server.properties file inside the config folder

nano config/server.properties

And locate the delete.topic.enable directive and set it to true; if it is not present, add it at the end of the file.

delete.topic.enable = true
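If you prefer to make this change from the command line instead of an editor, a small helper along these lines can do it. This is only a sketch: the set_prop function name is my own invention, and the path in the example assumes Kafka was extracted under /home/kafka/kafka as in this guide.

```shell
# Hypothetical helper: make sure a key=value pair is set in a properties file.
set_prop() {
  file="$1"; key="$2"; value="$3"
  if grep -q "^${key}=" "$file"; then
    # The key already exists: replace its current value in place
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # The key is missing: append it at the end of the file
    echo "${key}=${value}" >> "$file"
  fi
}

# Example (path as used throughout this guide):
# set_prop /home/kafka/kafka/config/server.properties delete.topic.enable true
```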

In this same file, you can change the folder where Apache Kafka stores the log data it generates, via the log.dirs directive:

log.dirs=/home/kafka/logs

In this case, the logs folder will live inside the kafka user's home directory.
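Kafka will normally create this directory on first start, but it does not hurt to create it up front. A one-liner, assuming you are logged in as the kafka user (so $HOME expands to /home/kafka):

```shell
# Create the directory referenced by the log.dirs setting above
# ($HOME is /home/kafka when running as the kafka user)
mkdir -p "$HOME/logs"
```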

We also have to create units to manage Kafka as a system service. This will make it easier to start it, stop it, and check its status.

However, we have to start with Zookeeper, which is the service Kafka uses to manage cluster configuration and state.

To do this, create a new file for Zookeeper in the directory where the services are hosted.

sudo nano /etc/systemd/system/zookeeper.service

And add the following:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save the changes and close the editor.

Now do the same for Kafka:

sudo nano /etc/systemd/system/kafka.service

And add the following:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Again, save the changes and close the editor.

To apply the changes, just refresh the system daemon list.

sudo systemctl daemon-reload

And enable and start the Zookeeper and Kafka services:

sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl enable kafka
sudo systemctl start kafka

This will complete the installation.
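To confirm everything works, you can check both services and push a message through the broker. The following is a sketch rather than a required step: test-topic is just an example name, and the paths assume the layout used throughout this guide.

```shell
# Check that both services are active
sudo systemctl status zookeeper --no-pager
sudo systemctl status kafka --no-pager

# Create an example topic called "test-topic" on the local broker
/home/kafka/kafka/bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --replication-factor 1 --partitions 1 --topic test-topic

# Publish a message to it...
echo "Hello, Kafka" | /home/kafka/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic test-topic

# ...and read it back (press Ctrl+C to stop the consumer)
/home/kafka/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic test-topic --from-beginning
```

If the consumer prints the message back, the broker is up and working.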

Conclusion

Apache Kafka is a professional open-source solution for large companies that need effective data streaming, and the fact that it is open source makes its power and manageability easy to verify for yourself.

So, share this post and leave us a comment.

Kafka Website
