Apache Kafka Overview (Windows)

Updated: Nov 18, 2022

Apache Kafka is a middleware solution for enterprise applications. It was started at LinkedIn by a team led by Neha Narkhede and Jun Rao. It was initially designed for monitoring and tracking systems, and later became one of the leading Apache projects.


Why Use Kafka?

  • Multiple producers

  • Multiple consumers

  • Disk-based persistence

  • Highly scalable

  • High performance

  • Offline messaging

  • Message replay


 

Kafka Use Cases


1. Enterprise messaging system

  • Kafka implements messaging around topics. One or more consumers can read messages from a topic and commit their progress as the application requires.

  • Suitable for both online and offline message consumers.


2. Message Store with playback capability

  • Kafka retains messages on the topic. The retention period can be configured for a specified duration.

  • Each message is persisted to disk on the brokers and can be replicated across them.

  • Supports storage sizes from 50 KB to 50 TB.
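To make retention and playback concrete, here is a minimal in-memory sketch. It is not Kafka's implementation; the class `RetainedLog` and its method names are hypothetical, and timestamps are passed in by hand so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """Toy append-only log with time-based retention (illustrative, not Kafka code)."""
    retention_seconds: float
    _records: list = field(default_factory=list)  # (offset, timestamp, value)
    _next_offset: int = 0

    def append(self, value, now):
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def expire(self, now):
        # Drop records older than the retention window; offsets are never reused.
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]

    def read_from(self, offset):
        # Replay every retained record at or after the requested offset.
        return [(o, v) for (o, _, v) in self._records if o >= offset]


log = RetainedLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)
log.append("c", now=90)
log.expire(now=100)          # cutoff is 40, so "a" and "b" are dropped
replayed = log.read_from(0)  # only "c" (offset 2) is still retained
```

A reader can ask for any offset, but only messages still inside the retention window come back.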


3. Stream processing

  • Kafka can process messages in real time, either in batches or one message at a time. It can also aggregate messages over a specified time window.
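As a rough illustration of aggregating over a time window, the sketch below counts events per key in fixed, non-overlapping windows. It is plain Python rather than a Kafka API; the function name `tumbling_window_counts` and the sample events are made up for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp falls into exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(1, "login"), (4, "login"), (7, "click"), (12, "login")]
result = tumbling_window_counts(events, window_seconds=10)
```

With a 10-second window, the first three events land in the window starting at 0 and the last one in the window starting at 10.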



 

Download and Install Kafka


Kafka requires a JRE and ZooKeeper. Download and install both components before setting up Kafka.


Installation (on Windows)


1. JDK Setup

  1. Set JAVA_HOME under the system environment variables: Control Panel -> System -> Advanced system settings -> Environment Variables.

  2. In the "Environment Variables" dialog box you just opened, find the PATH variable in the "System variables" section.

  3. Edit the PATH variable and append ";%JAVA_HOME%\bin".

  4. To confirm the Java installation, open cmd and type "java -version". You should see the version of Java you just installed.


2. Zookeeper Installation:

  1. Go to your ZooKeeper config directory, which is under the ZooKeeper home directory (e.g. C:\zookeeper-3.4.10\conf).

  2. Rename the file "zoo_sample.cfg" to "zoo.cfg".

  3. Open zoo.cfg in any text editor and change dataDir=/tmp/zookeeper to :\zookeeper-3.4.10\data, a data folder under the ZooKeeper home directory.

  4. Add an entry to the system environment variables, as you did for Java.

  5. In System Variables, add ZOOKEEPER_HOME = C:\zookeeper-3.4.10

  6. Edit the system variable named "PATH" and append ;%ZOOKEEPER_HOME%\bin;

  7. You can change the default ZooKeeper port in the zoo.cfg file (default port 2181).

  8. Run ZooKeeper by opening a new cmd window and typing zkserver.


3. Kafka Setup:

  1. Go to your Kafka config directory. For me it's C:\kafka_2.10-0.10.2.0\config.

  2. Edit the file "server.properties": find the line "log.dirs=/tmp/kafka-logs" and change it to "log.dirs=C:\kafka_2.10-0.10.2.0\kafka-logs". If Kafka does not pick up the directory, write the path with forward slashes (C:/kafka_2.10-0.10.2.0/kafka-logs), since backslashes are escape characters in .properties files.

  3. If ZooKeeper is running on another machine or cluster, change "zookeeper.connect=localhost:2181" to your custom IP and port.

  4. Go to the Kafka installation folder and run the following command from a command line: .\bin\windows\kafka-server-start.bat .\config\server.properties

  5. Kafka will run on its default port 9092 and connect to ZooKeeper's default port, 2181.
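Once both processes are started, a quick way to check that something is listening on the default ports is a small TCP probe. This is only a sketch: the helper `is_port_open` is a hypothetical name, and a successful connection shows that a process accepts connections on the port, not that Kafka or ZooKeeper is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default ports from the setup steps above; change them if you edited the configs.
    for name, port in (("ZooKeeper", 2181), ("Kafka", 9092)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```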


 

Testing Kafka

Creating Topics

  • Create a topic named "test.topic" with replication factor 1, since only one Kafka server is running (standalone setup).

  • If you have a cluster with more than one Kafka server, you can increase the replication factor accordingly, which improves data availability and makes the system fault tolerant.

  • Open a new command prompt in your Kafka installation's bin\windows folder (e.g. C:\kafka_2.11-0.9.0.0\bin\windows), type the following command, and press Enter.

kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test.topic


Creating a Producer

  • Open a new command prompt in your Kafka installation's bin\windows folder (e.g. C:\kafka_2.11-0.9.0.0\bin\windows).

  • To start a producer, type the following command:

kafka-console-producer.bat --broker-list localhost:9092 --topic test.topic


Start Consumer

  • Open another command prompt in the same location (e.g. C:\kafka_2.11-0.9.0.0\bin\windows).

  • Start a consumer by typing the following command:

kafka-console-consumer.bat --zookeeper localhost:2181 --topic test.topic

  • You now have two command windows open: one for the producer and one for the consumer.

  • Type anything in the producer command prompt and press Enter. The message should appear in the consumer command prompt.


 

Some Other Useful Kafka Commands

  • List Topics:

kafka-topics.bat --list --zookeeper localhost:2181

  • Describe Topic:

kafka-topics.bat --describe --zookeeper localhost:2181 --topic [Topic Name]

  • Read messages from beginning:

kafka-console-consumer.bat --zookeeper localhost:2181 --topic [Topic Name] --from-beginning

  • Delete Topic:

kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181


 

Kafka Architecture


A Kafka system has the following main components, which are coordinated by ZooKeeper.

  1. Topic

  2. Broker

  3. Producers

  4. Consumers



1. Topic

  • Can be thought of as a folder in a file system.

  • Producers publish messages to a topic.

  • Each message is appended to the topic.

  • Each message is written at a particular position called an offset. The offset number identifies the position of the message within its partition.

  • For each topic, the Kafka cluster maintains a partitioned log.

  • Each partition is hosted on a single server and can be replicated across a configurable number of servers for fault tolerance.

  • Each partition has one server that acts as the "leader" and zero or more servers that act as "followers".

  • Kafka guarantees message ordering within a partition, but not across partitions.
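The relationship between topics, partitions, and offsets can be sketched with a toy model. The `Topic` class below is hypothetical and keeps everything in memory; it only illustrates that each partition is its own append-only log with its own offset sequence.

```python
class Topic:
    """Toy model of a topic as a set of append-only partition logs (illustrative only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offset of the message within its partition

    def read(self, partition, offset):
        return self.partitions[partition][offset]


topic = Topic("test.topic", num_partitions=2)
o1 = topic.append(0, "first")   # offset 0 in partition 0
o2 = topic.append(1, "second")  # offset 0 in partition 1
o3 = topic.append(0, "third")   # offset 1 in partition 0
```

Offsets restart at 0 in every partition, which is why ordering is only defined within a single partition.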



Figure: Topic Replication

2. Broker

  • The core component of the Kafka messaging system.

  • Hosts the topic logs and, in coordination with ZooKeeper, maintains the leader and followers for each partition.

  • A Kafka cluster consists of one or more brokers.

  • Maintains the replication of partitions across the cluster.
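The leader and follower roles can be pictured with a simplified placement function. This is a sketch only: `place_replicas` is a hypothetical helper that spreads replicas round-robin, whereas Kafka's real assignment logic is more involved.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Spread each partition's replicas over brokers; the first replica is the leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        placement[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


placement = place_replicas(3, ["broker-0", "broker-1", "broker-2"], replication_factor=2)
```

With three brokers and a replication factor of 2, every partition has one leader and one follower on a different broker, so losing a single broker still leaves a copy of each partition.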


3. Producers

  • Publish messages to one or more topics.

  • Each message is appended to one of the topic's partitions.

  • Producers are one kind of client of the Kafka cluster.

  • Kafka maintains message ordering within a partition, but not across partitions.


4. Consumers

  • Subscribe to messages from a topic.

  • One or more consumers can subscribe to a topic together as a consumer group, with each consumer reading from different partitions.

  • Two consumers in the same consumer group CANNOT read from the same partition.

  • Each consumer maintains its offset for every partition it reads.

  • A consumer can replay messages by seeking back to an already-read offset in a partition of the topic.
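The consumer group rule and offset-based replay can be sketched as follows. The round-robin helper `assign_partitions` is a hypothetical simplification (Kafka ships several assignment strategies), and the replay part uses a plain list as a stand-in for one partition's log.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return assignment


assignment = assign_partitions(partitions=[0, 1, 2, 3], consumers=["c1", "c2", "c3"])

# Replay: a consumer tracks its own position per partition, so moving the
# position back re-reads messages that were already consumed.
partition_log = ["m0", "m1", "m2"]
position = len(partition_log)        # caught up: nothing left to read
position = 1                         # rewind to offset 1
replayed = partition_log[position:]  # messages from offset 1 onward
```

No partition appears under two consumers, which mirrors the rule that consumers in one group never share a partition.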

5. Message

  • A Kafka message consists of an array of bytes, plus an optional piece of metadata called the key.

  • A custom key can be used to control which partition a message is written to: messages with the same key are written to the same partition (the key is hashed to get the partition number).

  • Kafka can also write messages in batches, which reduces the number of network round trips per message. Batches can be compressed for transport over the network.

  • Batching increases throughput but also increases latency, so there is a tradeoff between latency and throughput.
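Key-based partitioning and batching can both be sketched in a few lines. The hash below is CRC32 from Python's standard library, used here purely as a stand-in (Kafka's Java client uses a different hash function), and `partition_for` and `batches` are hypothetical helper names.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key) % num_partitions


def batches(messages, batch_size):
    """Group messages into lists of at most batch_size items, preserving order."""
    for start in range(0, len(messages), batch_size):
        yield messages[start:start + batch_size]


# The same key always maps to the same partition.
p1 = partition_for(b"customer-42", 3)
p2 = partition_for(b"customer-42", 3)

# Five messages sent in batches of two need three network sends instead of five.
sends = list(batches(["a", "b", "c", "d", "e"], batch_size=2))
```

A larger batch size means fewer sends, but the first message in a batch has to wait until the batch is sent, which is where the extra latency comes from.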


Visit this link for Apache Kafka Producer with Example using java


If you have any questions, please mention them in the comments section below. Thank you.



[09/07/2019 5:49 PM CST - Reviewed by: PriSin]
