Apache Kafka Overview (Windows)

Updated: Aug 22

Apache Kafka is a middleware solution for enterprise applications. It was created at LinkedIn by a team that included Neha Narkhede and Jun Rao. It was initially designed for monitoring and tracking systems, and it later became one of the leading projects of the Apache Software Foundation.


Why Use Kafka?

  • Multiple producers

  • Multiple consumers

  • Disk based persistence

  • Highly scalable

  • High performance

  • Offline messaging

  • Messaging replay



Kafka Use Cases


1. Enterprise messaging system

  • Kafka implements messaging around topics. One or more consumers can consume messages from a topic and commit their offsets as the application requires.

  • Suitable for both online and offline message consumers.


2. Message Store with playback capability

  • Kafka retains messages on the topic. The retention period can be configured for a specified duration.

  • Each message is persisted to disk on the broker and can be replicated to other brokers in the cluster.

  • Supports storage sizes from 50K to 50 TB.
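
The retention and playback idea can be sketched with a toy in-memory log. This is only an illustration under simplifying assumptions: the class `RetainedLog` and its methods are hypothetical names, and real Kafka applies retention to whole log segments on disk (controlled by broker settings such as log.retention.hours), not to individual records.

```python
from collections import deque


class RetainedLog:
    """Toy append-only log that drops records older than a retention period."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = deque()  # (offset, timestamp, value), oldest first
        self.next_offset = 0

    def append(self, value, now):
        """Store a record and return the offset it was written at."""
        offset = self.next_offset
        self.records.append((offset, now, value))
        self.next_offset += 1
        return offset

    def expire(self, now):
        """Remove records whose age exceeds the retention period."""
        while self.records and now - self.records[0][1] > self.retention_seconds:
            self.records.popleft()

    def read_from(self, offset):
        """Replay every retained record at or after the given offset."""
        return [(o, v) for (o, _, v) in self.records if o >= offset]


log = RetainedLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)
log.append("c", now=70)
log.expire(now=80)            # "a" is 80 seconds old and is dropped
replayed = log.read_from(0)   # offsets of surviving records do not change
print(replayed)
```

Note that expiring old records never renumbers the remaining ones, which is what lets a consumer keep using the offsets it has already stored.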


3. Stream processing

  • Kafka can process messages in real time, either in batches or one message at a time. It also supports aggregating messages over a specified time window.
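
As a rough illustration of aggregation over a time window, the sketch below counts events per fixed, non-overlapping window in plain Python. It does not use any Kafka API; the function name `count_per_window` and the sample events are assumptions made for this example.

```python
from collections import defaultdict


def count_per_window(events, window_seconds):
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp in seconds, event key)
events = [(1, "login"), (4, "login"), (9, "click"), (12, "login"), (19, "click")]
result = count_per_window(events, window_seconds=10)
print(result)
```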




Download and Install Kafka


Kafka requires a JRE and ZooKeeper. Download and install the components below.

  1. JRE : http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html

  2. ZooKeeper : http://zookeeper.apache.org/releases.html

  3. Kafka : http://kafka.apache.org/downloads.html


Installation (on Windows)


1. JDK Setup

  1. Set JAVA_HOME under the system environment variables via Control Panel -> System -> Advanced system settings -> Environment Variables.

  2. Find the PATH variable in the “System variables” section of the “Environment Variables” dialog box you just opened.

  3. Edit the PATH variable and append “;%JAVA_HOME%\bin”.

  4. To confirm the Java installation, open cmd and type “java -version”. You should see the version of Java you just installed.


2. Zookeeper Installation:

  1. Go to your ZooKeeper config directory, which is inside the ZooKeeper home directory (e.g. C:\zookeeper-3.4.10\conf).

  2. Rename the file "zoo_sample.cfg" to "zoo.cfg".

  3. Open zoo.cfg in any text editor, find the line dataDir=/tmp/zookeeper and change it to dataDir=C:\zookeeper-3.4.10\data.

  4. Add an entry to the system environment variables, as we did for Java.

  5. In System Variables, add ZOOKEEPER_HOME = C:\zookeeper-3.4.10

  6. Edit the system variable named "PATH" and append ;%ZOOKEEPER_HOME%\bin;

  7. You can change the default ZooKeeper port in the zoo.cfg file (default port 2181).

  8. Run ZooKeeper by opening a new cmd window and typing zkserver.
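
If you prefer to script the config edit instead of using a text editor, a minimal sketch follows. The helper names `set_property` and `to_forward_slashes` are hypothetical, and the sample text is an abbreviated stand-in for zoo.cfg. The sketch also assumes that, because the file uses the Java properties format where a backslash normally acts as an escape character, writing the Windows path with forward slashes is the safer choice.

```python
def set_property(text, key, value):
    """Return properties text with `key` set to `value`, replacing an existing line."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key=value pairs.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def to_forward_slashes(windows_path):
    """Rewrite a Windows path with forward slashes (assumed safer in properties files)."""
    return windows_path.replace("\\", "/")


# Abbreviated stand-in for the contents of zoo.cfg.
sample = "tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n"
updated = set_property(sample, "dataDir", to_forward_slashes(r"C:\zookeeper-3.4.10\data"))
print(updated)
```

The same helper could be applied to the log.dirs entry in Kafka's server.properties, since that file has the same key=value layout.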


3. Kafka Setup:

  1. Go to your Kafka config directory. For me it is C:\kafka_2.10-0.10.2.0\config.

  2. Edit the file "server.properties": find the line "log.dirs=/tmp/kafka-logs" and change it to "log.dirs=C:\kafka_2.10-0.10.2.0\kafka-logs".

  3. If ZooKeeper is running on another machine or cluster, change "zookeeper.connect=localhost:2181" to your custom IP and port.

  4. Go to the Kafka installation folder and run the following command from a command line: .\bin\windows\kafka-server-start.bat .\config\server.properties

  5. Kafka will run on its default port 9092 and connect to ZooKeeper's default port, 2181.
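
One quick way to confirm that both services are listening is to try a TCP connection to each port. The sketch below uses only the Python standard library; the function name `port_is_open` is an assumption for this example, and a successful connection only shows that something accepts connections on the port, not that it is a healthy ZooKeeper or Kafka instance.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default ports used in this article: ZooKeeper 2181, Kafka 9092.
    for name, port in (("ZooKeeper", 2181), ("Kafka", 9092)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```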



Testing Kafka

Creating Topics

  • Now create a topic named “test.topic” with a replication factor of 1, since only one Kafka server is running (standalone setup).

  • If you have a cluster with more than one Kafka server running, you can increase the replication factor accordingly, which improves data availability and makes the system fault tolerant.

  • Open a new command prompt in C:\kafka_2.10-0.10.2.0\bin\windows, type the following command and press Enter.

kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test.topic


Creating a Producer

  • Open a new command prompt in C:\kafka_2.10-0.10.2.0\bin\windows.

  • To start a producer, type the following command:

kafka-console-producer.bat --broker-list localhost:9092 --topic test.topic


Start Consumer

  • Again, open a new command prompt in the same location, C:\kafka_2.10-0.10.2.0\bin\windows.

  • Now start a consumer by typing the following command:

kafka-console-consumer.bat --zookeeper localhost:2181 --topic test.topic

  • You will now have two command windows open: one for the producer and one for the consumer.

  • Type anything in the producer command prompt and press Enter. You should see the message appear in the consumer command prompt.



Some Other Useful Kafka Commands

  • List Topics:

kafka-topics.bat --list --zookeeper localhost:2181

  • Describe Topic:

kafka-topics.bat --describe --zookeeper localhost:2181 --topic [Topic Name]

  • Read messages from beginning:

kafka-console-consumer.bat --zookeeper localhost:2181 --topic [Topic Name] --from-beginning

  • Delete Topic:

kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181



Kafka Architecture


A Kafka system has the following main components, which are coordinated by ZooKeeper.

  1. Topic

  2. Broker

  3. Producers

  4. Consumers



1. Topic

  • Can be thought of as a folder in a file system.

  • Producers publish messages to a topic.

  • Each message is appended to the topic.

  • Each message is written at a particular position called an offset, so the position of a message within its partition is identified by its offset number.

  • For each topic, the Kafka cluster maintains a partitioned log.

  • Each partition is hosted on a single server and can be replicated across a configurable number of servers for fault tolerance.

  • Each partition has one server that acts as the "leader" and zero or more servers that act as "followers".

  • Kafka guarantees message ordering within a partition, but not across partitions.
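
The relationship between a topic, its partitions and offsets can be sketched as a small in-memory model. The class `Topic` and its methods are hypothetical names for illustration only; a real broker stores each partition as files on disk.

```python
class Topic:
    """Toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message to one partition and return its offset there."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition, offset):
        """Look up a message by partition and offset."""
        return self.partitions[partition][offset]


topic = Topic("test.topic", num_partitions=2)
first = topic.append(0, "order-created")
second = topic.append(0, "order-paid")
other = topic.append(1, "user-signed-up")

# Offsets count up independently in each partition, so order is only
# defined among messages that share a partition.
print(first, second, other)
```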



[Figure: Topic Replication]

2. Broker

  • Core component of the Kafka messaging system.

  • Hosts the topic logs and maintains the leader and followers for each partition in coordination with ZooKeeper.

  • A Kafka cluster consists of one or more brokers.

  • Maintains the replication of partitions across the cluster.
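
To make the leader and follower idea concrete, here is a simplified sketch of how replicas might be spread over brokers and how a new leader could be picked when a broker fails. The functions `assign_replicas` and `elect_leader` are hypothetical, and the plain round-robin placement is an assumption for illustration; Kafka's actual assignment and election logic is more involved.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Place each partition's replicas on consecutive brokers, round-robin.

    The first broker in each list is treated as the leader, the rest as followers.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
    return assignment


def elect_leader(replicas, live_brokers):
    """Pick the first replica that is still alive, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


assignment = assign_replicas([0, 1, 2], num_partitions=3, replication_factor=2)
print(assignment)

# Broker 0 fails: partition 0 falls back to its follower on broker 1.
leader_after_failure = elect_leader(assignment[0], live_brokers={1, 2})
print(leader_after_failure)
```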


3. Producers

  • Publishes messages to one or more topics.

  • Each message is appended to a partition of its topic.

  • It is one of the users (clients) of the Kafka cluster.

  • Kafka maintains message ordering per partition, but not across partitions.


4. Consumers

  • Subscribes to messages from a topic.

  • One or more consumers can subscribe to a topic as a consumer group, with each consumer reading from different partitions.

  • Two consumers in the same consumer group cannot read messages from the same partition.

  • Each consumer maintains its offset for every partition it reads.

  • A consumer can replay messages by seeking back to an offset it has already read in a partition of a topic.
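
The consumer group rules above can be sketched as follows. The function `assign_partitions` and the class `PartitionReader` are hypothetical names, and the round-robin split is just one possible strategy chosen for this illustration; it is not a claim about which assignor a given Kafka client uses.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


class PartitionReader:
    """Toy consumer for one partition that tracks its own offset."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Return the next message and advance the offset, or None at the end."""
        if self.offset >= len(self.log):
            return None
        message = self.log[self.offset]
        self.offset += 1
        return message

    def seek(self, offset):
        """Move back (or forward) to an offset in order to re-read from there."""
        self.offset = offset


assignment = assign_partitions([0, 1, 2], ["c1", "c2"])
print(assignment)

reader = PartitionReader(["m0", "m1", "m2"])
first_pass = [reader.poll(), reader.poll()]
reader.seek(0)              # replay from the beginning of the partition
replayed = reader.poll()
print(first_pass, replayed)
```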

5. Message

  • A Kafka message consists of an array of bytes. In addition, it can carry an optional piece of metadata called the key.

  • A custom key can be used to control which partition a message is written to: messages with a particular key are written to a specific partition (the key is hashed to get the partition number).

  • Kafka can also write messages in batches, which reduces the number of network round trips per message. Batches can be compressed for transport over the network.

  • Batching increases throughput but also increases latency, so there is a trade-off between latency and throughput.
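
Key-based partitioning and batching can be illustrated with a short sketch. The names `partition_for` and `batches` are hypothetical, and CRC32 is used here only as a stand-in hash: Kafka's default partitioner uses a different hash function (murmur2), so the partition numbers below will not match what a real producer would choose.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key (bytes) to a partition by hashing it. CRC32 is a stand-in hash."""
    return zlib.crc32(key) % num_partitions


def batches(messages, batch_size):
    """Group messages into batches so each batch needs only one network send."""
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]


# The same key always hashes to the same partition.
p1 = partition_for(b"customer-42", 4)
p2 = partition_for(b"customer-42", 4)
print(p1 == p2)

# Five messages sent in batches of two need three sends instead of five,
# but a message may wait for its batch to fill before it goes out.
grouped = batches(["m1", "m2", "m3", "m4", "m5"], 2)
print(grouped)
```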


Visit this link for Apache Kafka Producer with Example using java


If you have any questions, please mention them in the comments section below. Thank you.


#KafkaOverview #ApacheKafkaWindows #KafkaZookeeperInstallation #KafkaUseCases #KafkaCommands


[09/07/2019 5:49 PM CST - Reviewed by: PriSin]


©2020 by Data Nebulae