Updated: Aug 22
Apache Kafka is a middleware solution for enterprise applications. It was originally developed at LinkedIn by a team that included Neha Narkhede and Jun Rao. It was initially designed as a monitoring and tracking system and later became one of the leading Apache projects.
Why Use Kafka?
Disk based persistence
Kafka Use Cases
1. Enterprise messaging system
Kafka implements messaging around topics. One or more consumers can consume messages from a topic and commit their progress as the application requires.
It is suitable for both online and offline message consumers.
2. Message Store with playback capability
Kafka retains messages on the topic. The retention period can be configured to a specified duration.
Each message is persisted to disk and can be replicated across brokers, so the topic acts as a distributed store.
Kafka is designed to perform consistently regardless of the stored data volume, from about 50 KB to 50 TB.
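To make retention and playback concrete, here is a minimal Python sketch of a log that drops messages older than a configured period. It is a toy model, not Kafka code: the RetainedLog class and its method names are made up for illustration. In a real broker, retention is controlled by settings such as log.retention.hours, and old data is removed in whole log segments rather than message by message.

```python
from collections import deque


class RetainedLog:
    """Toy append-only log that drops messages older than a retention period."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.entries = deque()  # (timestamp, message) pairs, oldest first

    def append(self, timestamp, message):
        self.entries.append((timestamp, message))

    def enforce_retention(self, now):
        # Discard entries from the old end once they exceed the retention period.
        while self.entries and now - self.entries[0][0] > self.retention_seconds:
            self.entries.popleft()

    def replay(self):
        # Playback: everything still retained, in the original order.
        return [message for _, message in self.entries]


log = RetainedLog(retention_seconds=60)
log.append(0, "a")
log.append(50, "b")
log.append(90, "c")
log.enforce_retention(now=100)
print(log.replay())  # ['b', 'c']
```

Message "a" is 100 seconds old at the time of the check and is dropped, while "b" and "c" are still inside the 60-second window and can be replayed.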
3. Stream processing
Kafka can process messages in real time, either in batches or one message at a time. It also supports aggregating messages over a specified time window.
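The idea of aggregating over a time window can be sketched in a few lines of Python. This is only an illustration of fixed, non-overlapping windows, not the Kafka Streams API: the count_per_window function and the sample events are assumptions made for the example.

```python
from collections import defaultdict


def count_per_window(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp in seconds, event type); sample data for illustration only
events = [(1, "login"), (4, "click"), (9, "login"), (12, "login"), (17, "click")]
result = count_per_window(events, window_seconds=10)
print(result)
# {(0, 'login'): 2, (0, 'click'): 1, (10, 'login'): 1, (10, 'click'): 1}
```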
Download and Install Kafka
Kafka requires a JRE and ZooKeeper. Download and install the components below.
ZooKeeper : http://zookeeper.apache.org/releases.html
Installation (on Windows)
1. JDK Setup
Set JAVA_HOME under the system environment variables, reached via Control Panel -> System -> Advanced system settings -> Environment Variables.
Find the PATH variable in the “System Variables” section of the “Environment Variables” dialog box you just opened.
Edit the PATH variable and append “;%JAVA_HOME%\bin”.
To confirm the Java installation, open cmd and type “java -version”. You should see the version of Java you just installed.
2. Zookeeper Installation:
Go to the config directory under your ZooKeeper home directory (e.g. C:\zookeeper-3.4.10\conf).
Rename the file "zoo_sample.cfg" to "zoo.cfg".
Open zoo.cfg in any text editor, find the line dataDir=/tmp/zookeeper and change it to dataDir=C:\zookeeper-3.4.10\data.
Add an entry to the system environment variables, as we did for Java.
In System Variables, add ZOOKEEPER_HOME = C:\zookeeper-3.4.10
Edit the system variable named "PATH" and append ;%ZOOKEEPER_HOME%\bin;
You can change the default ZooKeeper port (2181) in the zoo.cfg file.
Start ZooKeeper by opening a new command prompt and typing zkserver.
3. Kafka Setup:
Go to your Kafka config directory. For me it is C:\kafka_2.10-0.10.2.0\config.
Edit the file "server.properties": find the line "log.dirs=/tmp/kafka-logs" and change it to "log.dirs=C:\kafka_2.10-0.10.2.0\kafka-logs".
If ZooKeeper is running on another machine or cluster, change "zookeeper.connect=localhost:2181" to your own IP and port.
Go to the Kafka installation folder and run the following command from a command prompt:
.\bin\windows\kafka-server-start.bat .\config\server.properties
Kafka will run on its default port, 9092, and connect to ZooKeeper on its default port, 2181.
Now create a topic named “test.topic” with a replication factor of 1, since only one Kafka server is running (standalone setup).
If you have a cluster with more than one Kafka server, you can increase the replication factor accordingly, which improves data availability and makes the system fault-tolerant.
Open a new command prompt in C:\kafka_2.10-0.10.2.0\bin\windows, type the following command and press Enter.
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test.topic
Creating a Producer and Consumer
· Open a new command prompt in C:\kafka_2.10-0.10.2.0\bin\windows.
· To start a producer, type the following command:
kafka-console-producer.bat --broker-list localhost:9092 --topic test.topic
· Open another command prompt in the same location, C:\kafka_2.10-0.10.2.0\bin\windows.
· Now start a consumer by typing the following command:
kafka-console-consumer.bat --zookeeper localhost:2181 --topic test.topic
You will now have two command windows open.
Type anything in the producer command prompt and press Enter. The message should appear in the consumer command prompt.
Some Other Useful Kafka Commands
List topics:
kafka-topics.bat --list --zookeeper localhost:2181
Describe a topic:
kafka-topics.bat --describe --zookeeper localhost:2181 --topic [Topic Name]
Read messages from the beginning:
kafka-console-consumer.bat --zookeeper localhost:2181 --topic [Topic Name] --from-beginning
Delete a topic:
kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181
Kafka Architecture
A Kafka system has the following main components, which are coordinated by ZooKeeper.
Topic
A topic can be thought of as a folder in a file system.
Producers publish messages to a topic.
Each message is appended to the topic.
Each message is written at a particular position called its offset, so a message's position in the log is identified by its offset number.
Partition
For each topic, the Kafka cluster maintains a partitioned log.
Each partition is hosted on a single server and can be replicated across a configurable number of servers for fault tolerance.
Each partition has one server that acts as the "leader" and zero or more servers that act as "followers".
Kafka guarantees message ordering within a partition, but not across partitions.
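The relationship between topics, partitions and offsets can be sketched with a small in-memory model. This is not how Kafka stores data on disk; the PartitionedTopic class is a hypothetical name used only to show that offsets are assigned per partition and that order is preserved inside each partition.

```python
class PartitionedTopic:
    """Toy model of a topic: one append-only list per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        # The offset is simply the position of the message in its partition.
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition, offset):
        # Return every message in the partition from the given offset onward.
        return self.partitions[partition][offset:]


topic = PartitionedTopic("test.topic", num_partitions=2)
first = topic.append(0, "m1")   # offset 0 in partition 0
second = topic.append(0, "m2")  # offset 1 in partition 0
other = topic.append(1, "m3")   # offset 0 in partition 1
print(first, second, other)     # 0 1 0
print(topic.read(0, 0))         # ['m1', 'm2']
```

Offsets restart at 0 in every partition, which is why ordering is only meaningful within a single partition.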
Broker
The broker is the core component of the Kafka messaging system.
It hosts the topic log and, in coordination with ZooKeeper, maintains the leader and followers for each partition.
A Kafka cluster consists of one or more brokers.
Brokers maintain the replication of partitions across the cluster.
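As a rough illustration of leaders and followers, the sketch below spreads partition replicas over a list of brokers in round-robin fashion. It is a deliberate simplification and not Kafka's actual placement algorithm; the assign_replicas function and the broker names are assumptions for the example.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Place each partition's replicas on distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        # The first replica acts as leader, the rest as followers.
        assignment[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


assignment = assign_replicas(3, 2, ["broker-0", "broker-1", "broker-2"])
for partition, roles in assignment.items():
    print(partition, roles)
# 0 {'leader': 'broker-0', 'followers': ['broker-1']}
# 1 {'leader': 'broker-1', 'followers': ['broker-2']}
# 2 {'leader': 'broker-2', 'followers': ['broker-0']}
```

With a single broker, only a replication factor of 1 is possible, which matches the standalone setup used earlier.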
Producer
A producer publishes messages to one or more topics.
Each message is appended to one of the topic's partitions.
The producer is one of the clients of the Kafka cluster.
Kafka maintains message ordering within a partition, but not across partitions.
Consumer
A consumer subscribes to the messages of a topic.
One or more consumers can subscribe to a topic, each reading from different partitions. Together they form a consumer group.
Two consumers in the same consumer group cannot read from the same partition.
Each consumer maintains its offset for every partition it reads.
A consumer can replay messages by seeking back to an offset it has already read in a partition of a topic.
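The consumer rules above can be sketched as follows. This is an in-memory illustration, not the Kafka consumer API: assign_partitions and the Consumer class are hypothetical names, and the round-robin strategy is just one simple way to give each partition to exactly one group member.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


class Consumer:
    """Toy consumer that tracks its own offset in a single partition log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        # Return unread messages and advance the offset past them.
        messages = self.log[self.offset:]
        self.offset = len(self.log)
        return messages

    def seek(self, offset):
        # Move back (or forward) so that earlier messages can be read again.
        self.offset = offset


group = assign_partitions([0, 1, 2], ["c1", "c2"])
print(group)  # {'c1': [0, 2], 'c2': [1]}

consumer = Consumer(["m0", "m1", "m2"])
first_read = consumer.poll()  # ['m0', 'm1', 'm2']
consumer.seek(1)
replayed = consumer.poll()    # ['m1', 'm2']
print(first_read, replayed)
```

No partition appears under two consumers of the group, and seeking back to offset 1 replays everything from that position.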
Message
A Kafka message consists of an array of bytes, plus an optional piece of metadata called the key.
A custom key can be used to control which partition a message is written to: the key is hashed to obtain the partition number, so messages with the same key go to the same partition.
Kafka can also write messages in batches, which reduces the network round trips needed per message. Batches can be compressed for transport over the network.
Batching increases throughput but also increases latency, so there is a tradeoff between the two.
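Key-based partitioning and batching can be combined in one short sketch. It is an approximation under stated assumptions: zlib.crc32 stands in for the hash function used by the real Kafka client, JSON plus zlib stands in for Kafka's own batch format and compression codecs, and all function names are made up for the example.

```python
import json
import zlib


def partition_for(key, num_partitions):
    # Hash the key bytes and map the result onto a partition number.
    # crc32 is a stand-in here, not the hash Kafka itself uses.
    return zlib.crc32(key) % num_partitions


def build_batches(messages, num_partitions):
    """Group message values by target partition, preserving arrival order."""
    batches = {}
    for key, value in messages:
        batches.setdefault(partition_for(key, num_partitions), []).append(value)
    return batches


def compress_batch(values):
    # One compressed payload per batch instead of one request per message.
    return zlib.compress(json.dumps(values).encode("utf-8"))


messages = [(b"user-1", "a"), (b"user-2", "b"), (b"user-1", "c")]
batches = build_batches(messages, num_partitions=3)
print(batches)

payload = compress_batch(batches[partition_for(b"user-1", 3)])
print(json.loads(zlib.decompress(payload)))
```

Both "user-1" messages land in the same batch in their original order. Waiting to fill a batch is what adds latency in exchange for fewer, larger network requests.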
Visit this link for an Apache Kafka producer example using Java.
If you have any questions, please mention them in the comments section below. Thank you.
[09/07/2019 5:49 PM CST - Reviewed by: PriSin]