
  • Kafka Consumer Advance (Java example)

Prerequisite: Kafka Overview, Kafka Producer & Consumer

Commits and Offsets in Kafka Consumer
Once the client commits an offset, Kafka marks the messages up to that offset as consumed for that consumer group, so they are not delivered to the client again on a restart or rebalance.

Properties used in the below example:

bootstrap.servers=localhost:9092
ProducerConfig.RETRIES_CONFIG=0
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
retries=0
group.id=group1
HQ_TOPIC_NAME=EK.TWEETS.TOPIC
CONSUMER_TIMEOUT=1000
worker.thread.count=5
counsumer.count=3
auto.offset.reset=earliest
enable.auto.commit=false

Configuration Level Setting
This can be done at the configuration level in the properties file.
enable.auto.commit=true - This is the default setting. The offset is committed automatically once the messages are consumed, without the client code taking any decision.
enable.auto.commit=false - The client code decides through the consumer API whether to retain the offset or commit it.

Consumer API Level Setting

Synchronous Commit
The offset is committed as soon as the consumer API confirms it; the latest offset of the polled messages is committed. The example below commits after processing all messages of the current poll. A synchronous commit blocks until the broker responds to the commit request.

Sample Code

public synchronized void subscribeMessage(String configPropsFile) throws Exception {
    try {
        if (consumer == null) {
            consumer = (KafkaConsumer) getKafkaConnection(configPropsFile);
            System.out.println("Kafka Connection created...on TOPIC : " + getTopicName());
        }
        consumer.subscribe(Collections.singletonList(getTopicName()));
        while (true) {
            ConsumerRecords records = consumer.poll(10000L);
            for (ConsumerRecord record : records) {
                System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                        record.topic(), record.partition(), record.offset(), record.key(), record.value());
            }
            consumer.commitSync();
        }
    } catch (Exception e) {
        e.printStackTrace();
        consumer.close();
    }
}

Asynchronous Commit
The consumer does not wait for a response from the broker; the commit request is sent and the consumer continues its processing, so throughput is higher compared to a synchronous commit. There is a chance of duplicate reads, which the application needs to handle on its own.

Sample Code

while (true) {
    ConsumerRecords records = consumer.poll(10000L);
    System.out.println("Number of messages polled by consumer " + records.count());
    for (ConsumerRecord record : records) {
        System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
    consumer.commitAsync(new OffsetCommitCallback() {
        public void onComplete(Map offsets, Exception exception) {
            if (exception != null) {
                System.out.printf("Commit failed for offsets %s %s\n", offsets, exception);
            } else {
                System.out.println("Messages are Committed Asynchronously...");
            }
        }
    });
}
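Asynchronous commits are not retried, so a failed commit in the loop above can leave offsets uncommitted. A common pattern, not shown in the original post and sketched here only as an illustration (it assumes the same consumer, getTopicName() helper and imports as the samples above), is to use commitAsync() inside the poll loop and a final blocking commitSync() during shutdown:

public void consumeWithAsyncAndFinalSyncCommit() {
    try {
        consumer.subscribe(Collections.singletonList(getTopicName()));
        while (true) {
            ConsumerRecords records = consumer.poll(10000L);
            for (ConsumerRecord record : records) {
                // process the record here
                System.out.printf("offset = %d, key = %s, value = %s%n",
                        record.offset(), record.key(), record.value());
            }
            // fast, non-blocking commit; failures are only reported to the callback
            consumer.commitAsync();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            // blocking commit of the last processed offsets before leaving the group
            consumer.commitSync();
        } finally {
            consumer.close();
        }
    }
}

Because commitSync() retries transient failures, the committed position is as accurate as possible before the consumer closes.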
Offset Level Commit
Sometimes the application may need to commit the offset on reading a particular offset.

Sample Code

Map currentOffsets = new HashMap();
while (true) {
    ConsumerRecords records = consumer.poll(1000L);
    for (ConsumerRecord record : records) {
        System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
        currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1, "no metadata"));
        if (record.offset() == 18098) {
            consumer.commitAsync(currentOffsets, null);
        }
    }
}

Retention of Messages
Kafka retains messages for the retention period defined in the configuration. Retention can be defined at the broker level or at the topic level, and it can be time based or size (byte) based. Retention defined at the topic level overrides the retention defined at the broker level.
retention.bytes - The maximum amount of messages, in bytes, to retain for this topic.
retention.ms - How long messages should be retained for this topic, in milliseconds.

1. Defining retention at topic level
Set retention for the topic named "test-topic" to 1 hour (3,600,000 ms):
# kafka-configs.sh --zookeeper localhost:2181/kafka-cluster --alter --entity-type topics --entity-name test-topic --add-config retention.ms=3600000

2. Defining retention at broker level
Define one of the below properties in server.properties:
# Configures retention time in milliseconds => log.retention.ms=1680000
# Configures retention time in minutes => log.retention.minutes=1680
# Configures retention time in hours => log.retention.hours=168

Fetching Messages From A Specific Offset
A consumer can go down before committing the messages it has processed, and those messages can then be lost or read again. Since the Kafka broker can retain messages for a long time, the consumer can point to a specific offset to fetch them: it can go back from the current offset to a particular offset, or start polling the messages from the beginning.

Sample Code

Map currentOffsets = new HashMap();

public synchronized void subscribeMessage(String configPropsFile) throws Exception {
    try {
        if (consumer == null) {
            consumer = (KafkaConsumer) getKafkaConnection(configPropsFile);
            System.out.println("Kafka Connection created...on TOPIC : " + getTopicName());
        }
        TopicPartition topicPartition = new TopicPartition(getTopicName(), 0);
        List topics = Arrays.asList(topicPartition);
        consumer.assign(topics);
        consumer.seekToEnd(topics);
        long current = consumer.position(topicPartition);
        consumer.seek(topicPartition, current - 10);
        System.out.println("Topic partitions are " + consumer.assignment());
        while (true) {
            ConsumerRecords records = consumer.poll(10000L);
            System.out.println("Number of records polled " + records.count());
            for (ConsumerRecord record : records) {
                System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                        record.topic(), record.partition(), record.offset(), record.key(), record.value());
                currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1, "no metadata"));
            }
            consumer.commitAsync(currentOffsets, null);
        }
    } catch (Exception e) {
        e.printStackTrace();
        consumer.close();
    }
}

Thank you. If you have any doubt, please feel free to post your questions in the comments section below.

[23/09/2019 04:38 PM CST - Reviewed by: PriSin]

  • Apache Spark Interview Questions

This post includes Big Data Spark interview questions and answers for experienced candidates and beginners. If you are a beginner, don't worry - the answers are explained in detail. These are very frequently asked Data Engineer interview questions which will help you crack a big data job interview.

What is Apache Spark?
According to the Spark documentation, Apache Spark is a fast and general-purpose in-memory cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
In simple terms, Spark is a distributed data processing engine which supports programming languages like Java, Scala, Python and R. At its core, the Spark engine has four built-in libraries: Spark SQL, MLlib (machine learning), Spark Streaming and GraphX.

What is Apache Spark used for?
Apache Spark is used for real-time data processing, implementing Extract, Transform, Load (ETL) processes, implementing machine learning algorithms and creating interactive dashboards for data analytics. Apache Spark is also used to process petabytes of data distributed over clusters with thousands of nodes.

How does Apache Spark work?
Spark uses a master-slave architecture to distribute data across worker nodes and process it in parallel. Just like MapReduce, Spark has a central coordinator called the driver, while the rest of the worker nodes run executors. The driver communicates with the executors to process the data.

Why is Spark faster than Hadoop MapReduce?
One of the drawbacks of Hadoop MapReduce is that it writes the full data set to HDFS after running each mapper and reducer job. This is very expensive because it consumes a lot of disk I/O and network I/O. In Spark, there are two kinds of operations: transformations and actions. Spark doesn't write or materialize the data until an action is called, which reduces disk I/O and network I/O. Another innovation is in-memory caching, where you can instruct Spark to hold the input data in memory so that the program doesn't have to read it again from disk, further reducing disk I/O.

Is Hadoop required for Spark?
No, the Hadoop file system is not required for Spark. However, for better performance, Spark can use HDFS with YARN if required.

Is Spark built on top of Hadoop?
No. Spark is totally independent of Hadoop.

What is the Spark API?
Apache Spark has basically three sets of APIs (Application Program Interfaces) - RDDs, Datasets and DataFrames - that allow developers to access the data and run various functions across four different languages: Java, Scala, Python and R.

What is a Spark RDD?
Resilient Distributed Datasets (RDDs) are an immutable collection of elements used as the fundamental data structure in Apache Spark. The data is logically partitioned across the nodes in your cluster and can be accessed and computed in parallel. RDD has been the primary Spark API since the foundation of Apache Spark.

Which are the methods to create an RDD in Spark?
There are mainly two methods to create an RDD (a short Java sketch follows below):
Parallelizing an existing collection - sc.parallelize()
Referencing an external dataset - sc.textFile()
Read: Spark context parallelize and reference external dataset example.
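Here is the short Java sketch referenced above. It is not part of the original post; the application name, the local master URL and the input.txt path are placeholder assumptions used only to show both creation methods.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddCreationExample {
    public static void main(String[] args) {
        // local[*] is assumed here so the sketch can run without a cluster
        SparkConf conf = new SparkConf().setAppName("RddCreationExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 1. Parallelizing an existing in-memory collection
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println("Sum = " + numbers.reduce((a, b) -> a + b));

        // 2. Referencing an external dataset (the path is a placeholder)
        JavaRDD<String> lines = sc.textFile("input.txt");
        System.out.println("Line count = " + lines.count());

        sc.close();
    }
}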
When would you use a Spark RDD?
RDDs are used for unstructured data like streams of media or text, when a schema and columnar format are not mandatory requirements - for example when you don't need to access data by column name or other tabular attributes. Secondly, RDDs are used when you want full control over the physical distribution of data.

What are SparkContext, SQLContext, SparkSession and SparkConf?
SparkContext tells the Spark driver application whether to access the cluster through a resource manager or to run locally in standalone mode. The resource manager can be YARN or Spark's own cluster manager.
SparkConf stores the configuration parameters that the Spark driver application passes to SparkContext. These parameters define properties of the driver application which Spark uses to allocate resources on the cluster, such as the number of executors and the memory and cores used by the executors running on the worker nodes.
SQLContext is a class which is used to implement Spark SQL. You need to create SparkConf and SparkContext first in order to create a SQLContext. It is basically used for structured data, when you want to apply a schema and run SQL.
All three - SparkContext, SparkConf and SQLContext - are encapsulated within SparkSession. In newer versions you can directly implement Spark SQL with SparkSession.

What is Spark checkpointing?
Spark checkpointing is a process that saves the actual intermediate RDD data to a reliable distributed file system; it saves an intermediate stage of an RDD lineage. You can do it by calling RDD.checkpoint() while developing the Spark driver application. You need to set up a checkpoint directory where Spark can store these intermediate RDDs by calling SparkContext.setCheckpointDir().

What is an action in Spark and what happens when it's executed?
An action triggers execution of the RDD lineage graph: it loads the original data from disk to create the intermediate RDDs, performs all the transformations and returns the final output to the Spark driver program or writes the data to the file system (based on the type of action). The Spark documentation lists the available actions.

What is Spark Streaming?
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark's machine learning and graph processing algorithms on data streams.
Reference: Apache Spark documentation

Other frequently asked questions:
What is the difference between RDDs, DataFrames and Datasets?
Why is a Spark RDD immutable?
Are Spark DataFrames immutable?
Are Spark DataFrames distributed?
What is a Spark stage?
How does Spark SQL work?
What is a Spark executor and how does it work?
How will you improve Apache Spark performance?
What is spark.sql.warehouse.dir?
What is the Spark shell? How would you open and close it?
How will you clear the screen on the Spark shell?
What is parallelize in Spark?
Does Spark SQL require Hive?
What is the Metastore in Hive?
What do repartition and coalesce do in Spark?
What is Spark mapPartitions?
What is the difference between map and flatMap in Spark?
What is Spark reduceByKey?
What is lazy evaluation in Spark?
What is an accumulator in Spark?
Can an RDD be shared between SparkContexts?
What happens if an RDD partition is lost due to worker node failure?
Which serialization libraries are supported in Spark?
What is the cluster manager in Spark?

Questions? Feel free to write in the comments section below. Thank you.

  • Apache Avro Schema Example (in Java)

Introduction
Avro provides data serialization based on schemas defined in JSON. It is a language-neutral data serialization system, which means data serialized in language A can be de-serialized and used in language B. Avro supports both dynamic and static types as per requirement, and it supports many languages like Java, C, C++, C#, Python and Ruby.

Benefits
Producers and consumers are decoupled from changes in the application.
Schemas help future-proof your data and make it more robust.
Widely used in streaming use cases, especially with Kafka.
Avro records are compact and fast for streaming.
Supports a schema registry when used with Kafka.

Steps to Serialize an Object
Create the JSON schema.
Compile the schema in the application.
Populate the schema with data.
Serialize the data using the Avro serializer.

Steps to Deserialize an Object
Use the Apache Avro API to read the serialized file.
Populate the schema from the file.
Use the object in the application.

Sample Example for Avro (in Java)

Step-1: Create a Java project and add the Avro dependencies to the build (a sample dependency sketch is shown at the end of this post).

Step-2: Create a schema file as below:

Customer_v0.avsc
{
  "namespace": "com.demo.avro",
  "type": "record",
  "name": "Customer",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" },
    { "name": "faxNumber", "type": [ "null", "string" ], "default": null }
  ]
}

Step-3: Compile the schema.

java -jar lib\avro-tools-1.8.1.jar compile schema schema\Customer_v0.avsc schema

Step-4: Put the generated Java file into the source directory of the project as shown in the project structure.

Step-5: Create Producer.java

package com.demo.producer;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

import com.demo.avro.Customer;

public class Producer {
    public static void main(String[] args) throws IOException {
        serailizeMessage();
    }

    public static void serailizeMessage() throws IOException {
        DatumWriter datumWriter = new SpecificDatumWriter(Customer.class);
        DataFileWriter dataFileWriter = new DataFileWriter(datumWriter);
        File file = new File("customer.avro");
        Customer customer = new Customer();
        dataFileWriter.create(customer.getSchema(), file);

        customer.setId(1001);
        customer.setName("Customer -1");
        customer.setFaxNumber("284747384343333".subSequence(0, 10));
        dataFileWriter.append(customer);

        customer = new Customer();
        customer.setId(1002);
        customer.setName("Customer -2");
        customer.setFaxNumber("45454747384343333".subSequence(0, 10));
        dataFileWriter.append(customer);

        dataFileWriter.close();
    }
}

Step-6: Create Consumer.java

package com.demo.consumer;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

import com.demo.avro.Customer;

public class Consumer {
    public static void main(String[] args) throws IOException {
        deSerailizeMessage();
    }

    public static void deSerailizeMessage() throws IOException {
        File file = new File("customer.avro");
        DatumReader datumReader = new SpecificDatumReader(Customer.class);
        DataFileReader dataFileReader = new DataFileReader(file, datumReader);
        Customer customer = null;
        while (dataFileReader.hasNext()) {
            customer = dataFileReader.next(customer);
            System.out.println(customer);
        }
    }
}

Step-7: Run Producer.java. It creates the customer.avro file and writes the customers in Avro format.

Step-8: Run Consumer.java. It reads the customer.avro file and gets the customer records.
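Step-1 mentions adding dependencies, but the dependency listing itself is not included on this page. As an assumption based on the avro-tools-1.8.1.jar used in Step-3, a minimal Maven dependency sketch would be:

<!-- assumed dependency; version chosen to match the avro-tools 1.8.1 jar used above -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.1</version>
</dependency>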
Thank you! If you have any question please mention it in the comments section below.

[12/09/2019 10:38 PM CST - Reviewed by: PriSin]

  • Kafka Producer and Consumer example (in Java)

In this Kafka pub-sub example you will learn:
Kafka producer components (producer API, serializer and partition strategy)
Kafka producer architecture
Kafka producer send methods (fire and forget, sync and async types)
Kafka producer config (connection properties) example
Kafka producer example
Kafka consumer example

Prerequisite - refer to my previous post on Apache Kafka Overview (Windows).

The Kafka producer is one of the clients of the Kafka broker. It publishes messages to a Kafka topic, and messages are serialized before being transferred over the network.

Kafka Producer Components

Producer API
Kafka provides a collection of producer APIs to publish messages to a topic. Messaging can be optimized by setting different parameters, and the developer can decide to publish a message on a particular partition by providing a custom key for the message.

Serializer
The serializer serializes the message so it can be passed over the network. A default or custom serializer can be set by the developer. Below are the String serializer settings:
value.serializer=org.apache.kafka.common.serialization.StringSerializer
key.serializer=org.apache.kafka.common.serialization.StringSerializer
The serializers are set for both the key and the value.

Partitioner
This component applies a hashing algorithm to the key, if one is provided, to find the partition for the message. If no key is provided by the developer, it uses a round-robin algorithm to assign a partition to the message. (A custom partitioner sketch is shown after the producer example below.)

Kafka Producer Send Methods

Fire and Forget
The producer does not care whether the message arrives at the destination or not.
ProducerRecord data = new ProducerRecord("topic", key, message);
producer.send(data);

Synchronous Send
The send() method returns a Future, and the developer can use get() on it to know the status of the message.
ProducerRecord data = new ProducerRecord("topic", key, message);
producer.send(data).get();

Asynchronous Send
Developers can use send() with a callback function which is called once the broker sends the response back to the producer.
TestCallback callback = new TestCallback();
ProducerRecord data = new ProducerRecord("topic", key, message);
producer.send(data, callback);

private static class TestCallback implements Callback {
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e != null) {
            System.out.println("Error while producing message to topic :" + recordMetadata);
            e.printStackTrace();
        } else {
            String message = String.format("sent message to topic:%s partition:%s offset:%s",
                    recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset());
            System.out.println(message);
        }
    }
}

Producer Configuration

bootstrap.servers=localhost:9092
acks=all
ProducerConfig.RETRIES_CONFIG=0
value.serializer=org.apache.kafka.common.serialization.StringSerializer
key.serializer=org.apache.kafka.common.serialization.StringSerializer
retries=2
batch.size=32768
linger.ms=5
buffer.memory=33554432
max.block.ms=60000

Kafka Producer Example

Step-1: Start Zookeeper.

Step-2: Start the Kafka broker and create a topic TEST.TOPIC.

Step-3: Create a Java project.

Step-4: Create a properties file - kconnection.properties

bootstrap.servers=localhost:9092
acks=all
ProducerConfig.RETRIES_CONFIG=0
value.serializer=org.apache.kafka.common.serialization.StringSerializer
key.serializer=org.apache.kafka.common.serialization.StringSerializer
retries=2
TOPIC_NAME=TEST.TOPIC
batch.size=32768
linger.ms=5
buffer.memory=33554432
max.block.ms=60000

Step-5: KafkaConnection.java

package com.demo.twitter.util;

import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

public class KafkaConnection {

    static Properties props = null;

    private static Properties loadConPropsFromClasspath() throws Exception {
        if (props == null) {
            InputStream stream = KafkaConnection.class.getResourceAsStream("kconnection.properties");
            props = new Properties();
            props.load(stream);
            stream.close();
            System.out.println("Configuration " + props);
        }
        return props;
    }

    public static Producer getKafkaConnection() throws Exception {
        Properties props = loadConPropsFromClasspath();
        Producer producer = new KafkaProducer(props);
        return producer;
    }

    public static String getTopicName() throws Exception {
        if (props != null) {
            return props.getProperty(IKafkaSourceConstant.TOPIC_NAME);
        } else {
            return null;
        }
    }
}

Step-6: KafkaProducerClient.java

package com.demo.client.producer;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import com.demo.twitter.util.KafkaConnection;

public class KafkaProducerClient {

    static KafkaProducer producer = null;

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        try {
            KafkaProducerClient pclient = new KafkaProducerClient();
            long i = 1;
            for (; i <= 10; i++) {
                KafkaProducerClient.sendMessage("" + i, "Hello This is test message ..Demo" + i);
            }
            System.out.println("Number of message sent " + (i - 1));
            pclient.closeProducer();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void sendMessage(String key, String message) throws Exception {
        try {
            if (producer == null) {
                producer = (KafkaProducer) KafkaConnection.getKafkaConnection();
                System.out.println("Kafka Connection created for topic.. demo" + KafkaConnection.getTopicName());
            }
            TestCallback callback = new TestCallback();
            long startTime = System.currentTimeMillis();
            ProducerRecord data = new ProducerRecord(KafkaConnection.getTopicName(), key, message);
            producer.send(data, callback);
            System.out.println("Total Time:---> " + Long.valueOf(System.currentTimeMillis() - startTime));
        } catch (Exception e) {
            e.printStackTrace();
            producer.close();
        }
    }

    public void closeProducer() {
        try {
            producer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static class TestCallback implements Callback {
        public void onCompletion(RecordMetadata recordMetadata, Exception e) {
            if (e != null) {
                System.out.println("Error while producing message to topic :" + recordMetadata);
                e.printStackTrace();
            } else {
                String message = String.format("sent message to topic:%s partition:%s offset:%s",
                        recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset());
                System.out.println(message);
            }
        }
    }
}
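The Partitioner section above explains how keys are hashed to pick a partition. As an illustration only (the class name and package are assumptions, not part of the original project), a custom partitioner can be plugged into the producer through the partitioner.class property:

package com.demo.twitter.util;

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Sketch of a custom partitioner: keyless records go to partition 0,
// keyed records are spread across all partitions by key hash.
public class SimpleKeyPartitioner implements Partitioner {

    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0;
        }
        // mask the sign bit so the result is always a valid partition index
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public void close() { }

    public void configure(Map<String, ?> configs) { }
}

To use it, add partitioner.class=com.demo.twitter.util.SimpleKeyPartitioner to the producer properties file.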
Apache Kafka Consumer Example

Continue in the same project.

Step-1: Create a properties file - kconsumer.properties with the below contents:

bootstrap.servers=localhost:9092
acks=all
ProducerConfig.RETRIES_CONFIG=0
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
retries=0
group.id=group1
TOPIC_NAME=TEST.TOPIC
CONSUMER_TIMEOUT=1000
worker.thread.count=5
counsumer.count=3

Step-2: Create KafkaConsumerClient.java

package com.demo.kafka.consumer;

import java.io.InputStream;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import com.demo.twitter.util.KafkaConnection;

public class KafkaConsumerClient {

    Properties props = null;
    KafkaConsumer consumer = null;

    public static void main(String[] args) {
        KafkaConsumerClient conClient = new KafkaConsumerClient();
        try {
            conClient.subscribeMessage("kconsumer.properties");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public synchronized void subscribeMessage(String configPropsFile) throws Exception {
        try {
            if (consumer == null) {
                consumer = (KafkaConsumer) getKafkaConnection(configPropsFile);
            }
            consumer.subscribe(Collections.singletonList(getTopicName()));
            while (true) {
                ConsumerRecords records = consumer.poll(1000L);
                for (ConsumerRecord record : records) {
                    System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
                }
                consumer.commitSync();
            }
        } catch (Exception e) {
            e.printStackTrace();
            consumer.close();
        }
    }

    public KafkaConsumer getKafkaConnection(String fileName) throws Exception {
        if (props == null) {
            props = loadConPropsFromClasspath(fileName);
            System.out.println(props);
        }
        KafkaConsumer consumer = new KafkaConsumer(props);
        return consumer;
    }

    private Properties loadConPropsFromClasspath(String fileName) throws Exception {
        if (props == null) {
            InputStream stream = KafkaConnection.class.getResourceAsStream(fileName);
            props = new Properties();
            props.load(stream);
            stream.close();
            System.out.println("Configuration " + props);
        }
        return props;
    }

    public String getTopicName() throws Exception {
        if (props != null) {
            return props.getProperty("TOPIC_NAME");
        } else {
            return null;
        }
    }
}

Thank you. If you have any question please write in the comments section below.

[09/09/2019 10:38 PM CST - Reviewed by: PriSin]

  • Apache Kafka Overview (Windows)

Apache Kafka is a middleware solution for enterprise applications. It was initiated at LinkedIn, led by Neha Narkhede and Jun Rao. Initially it was designed as a monitoring and tracking system; later it became one of the leading Apache projects.

Why Use Kafka?
Multiple producers
Multiple consumers
Disk-based persistence
Highly scalable
High performance
Offline messaging
Message replay

Kafka Use Cases

1. Enterprise messaging system
Kafka has a topic-based implementation for its messaging system. One or more consumers can consume the messages and commit them as per application need. It is suitable for both online and offline messaging consumer systems.

2. Message store with playback capability
Kafka provides message retention on the topic. Retention of messages can be configured for a specified duration. Each message is backed up within the distributed file system. It supports storage sizes from 50 KB to 50 TB.

3. Stream processing
Kafka is capable of processing messages in real time, in batch mode or message by message. It provides aggregation of message processing over a specified time window.

Download and Install Kafka
Kafka requires a JRE and Zookeeper. Download and install the below components.
JRE : http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
ZooKeeper : http://zookeeper.apache.org/releases.html
Kafka : http://kafka.apache.org/downloads.html

Installation (on Windows)

1. JDK Setup
Set JAVA_HOME under system environment variables from the path Control Panel -> System -> Advanced system settings -> Environment Variables.
Search for the PATH variable in the "System Variables" section of the "Environment Variables" dialog box you just opened.
Edit the PATH variable and append ";%JAVA_HOME%\bin".
To confirm the Java installation, open cmd and type "java -version"; you should be able to see the version of Java you just installed.

2. Zookeeper Installation
Go to your Zookeeper config directory under the Zookeeper home directory (i.e. C:\zookeeper-3.4.10\conf).
Rename the file "zoo_sample.cfg" to "zoo.cfg".
Open zoo.cfg in any text editor and edit dataDir=/tmp/zookeeper to C:\zookeeper-3.4.10\data.
Add an entry in System Environment Variables as we did for Java: ZOOKEEPER_HOME = C:\zookeeper-3.4.10.
Edit the system variable named "PATH" and append ;%ZOOKEEPER_HOME%\bin;
You can change the default Zookeeper port in the zoo.cfg file (default port 2181).
Run Zookeeper by opening a new cmd and typing zkserver.

3. Kafka Setup
Go to your Kafka config directory, for example C:\kafka_2.10-0.10.2.0\config.
Edit the file "server.properties" and change the line "log.dirs=/tmp/kafka-logs" to "log.dirs=C:\kafka_2.10-0.10.2.0\kafka-logs".
If your Zookeeper is running on some other machine or cluster, you can edit "zookeeper.connect=localhost:2181" to your custom IP and port.
Go to the Kafka installation folder and type the below command from a command line:
.\bin\windows\kafka-server-start.bat .\config\server.properties
Kafka will run on the default port 9092 and connect to Zookeeper's default port, which is 2181.

Testing Kafka

Creating Topics
Now create a topic with the name "test.topic" with replication factor 1, in case one Kafka server is running (standalone setup). If you have a cluster with more than one Kafka server running, you can increase the replication factor accordingly, which will increase data availability and act like a fault-tolerant system.
Open a new command prompt in the location C:\kafka_2.11-0.9.0.0\bin\windows, type the following command and hit enter:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test.topic

Creating a Producer
Open a new command prompt in the location C:\kafka_2.11-0.9.0.0\bin\windows.
To start a producer, type the following command:
kafka-console-producer.bat --broker-list localhost:9092 --topic test.topic

Start Consumer
Again open a new command prompt in the same location C:\kafka_2.11-0.9.0.0\bin\windows.
Now start a consumer by typing the following command:
kafka-console-consumer.bat --zookeeper localhost:2181 --topic test.topic

Now you will have two command windows. Type anything in the producer command prompt and press Enter, and you should be able to see the message in the consumer command prompt.

Some Other Useful Kafka Commands

List topics:
kafka-topics.bat --list --zookeeper localhost:2181

Describe a topic:
kafka-topics.bat --describe --zookeeper localhost:2181 --topic [Topic Name]

Read messages from the beginning:
kafka-console-consumer.bat --zookeeper localhost:2181 --topic [Topic Name] --from-beginning

Delete a topic:
kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181

Kafka Architecture
The Kafka system has the below main components, which are coordinated by Zookeeper:
Topic
Broker
Producers
Consumers

1. Topic
Can be considered like a folder in a file system.
Producers publish messages to a topic, and each message is appended to the topic.
Each message is published to the topic at a particular location named an offset, i.e. the position of a message is identified by its offset number.
For each topic, the Kafka cluster maintains a partitioned log.
Each partition is hosted on a single server and can be replicated across a configurable number of servers for fault tolerance.
Each partition has one server which acts as the "leader" and zero or more servers which act as "followers".
Kafka provides ordering of messages per partition, but not across partitions.

2. Broker
The core component of the Kafka messaging system.
Hosts the topic log and maintains the leaders and followers for the partitions, in coordination with Zookeeper.
A Kafka cluster consists of one or more brokers.
Maintains the replication of partitions across the cluster.

3. Producers
Publish messages to one or more topics; messages are appended to a topic.
A producer is one of the users of the Kafka cluster.
Kafka maintains the ordering of messages per partition, but not across partitions.

4. Consumers
Subscribers of the messages from a topic.
One or more consumers can subscribe to a topic across different partitions, forming a consumer group.
Two consumers of the same consumer group CANNOT subscribe to messages from the same partition.
Each consumer maintains the offset for the partitions it subscribes to.
A consumer can replay messages by pointing to an already-read offset of a partition of a topic.

5. Message
A Kafka message consists of an array of bytes and, in addition, optional metadata called a key.
A custom key can be generated to store messages in a controlled way onto partitions, e.g. messages having a particular key are written to a specific partition (the key is hashed to get the partition number).
Kafka can also write messages in batch mode, which reduces the network round trips needed for each message. Batches are compressed while being transported over the network. Batch mode increases throughput but adds latency, hence there is a tradeoff between latency and throughput.
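As a rough illustration of the batching and compression behaviour described above (the values below are examples, not recommendations from the original post), the relevant producer properties look like this:

# maximum size, in bytes, of a batch of records sent to one partition
batch.size=32768
# how long the producer waits for more records before sending a partially filled batch
linger.ms=5
# compress whole batches on the wire; valid values are none, gzip, snappy and lz4
compression.type=snappy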
Visit this link for Apache Kafka Producer with Example using Java.

If you have any question please mention it in the comments section below. Thank you.

#KafkaOverview #ApacheKafkaWindows #KafkaZookeeperInstallation #KafkaUseCases #KafkaCommands

[09/07/2019 5:49 PM CST - Reviewed by: PriSin]

  • $74 Billion in Lost Bitcoin - Modern Day Buried Treasure

In 2018, the digital forensics firm Chainalysis estimated that around 35% of all bitcoin is likely lost. This is approximately 7.4 million of the 21 million bitcoin that will ever exist, and this month (August 2019) bitcoin has been averaging a market value of $10K per coin. Extrapolating, this is over $74 billion worth of bitcoin.

Losing Bitcoin
To be more exact, the bitcoin aren't lost but frozen. Bitcoin are inaccessible to their owners without a private key, a 256-bit number, usually saved in a wallet file. It is these keys, stored on hard drives or flash drives, that people misplace, lose, throw away or erase.

One of the most infamous stories involves James Howell, a British man who owned upwards of 7,500 bitcoins. Howell accidentally threw out the hard drive containing his wallet file, thus losing all access to his bitcoin. At the time he lost it, the digital currency was valued at $7.5 million. The fortune is currently buried at his local landfill in Newport, South Wales, and despite his desire to search, his local city council officials won't allow it due to safety concerns. These 7,500 bitcoins are now estimated to be worth $75 million. Similarly, Campbell Simpson threw away a hard drive containing 1,400 bitcoin, currently valued at $14 million. Even Elon Musk seems to have lost his bitcoin.

To be clear, we aren't discussing how people lose bitcoin through careless or frivolous spending or outright theft. Those bitcoin have just been transacted and are under different ownership, but they are still "active" coins. The bitcoin we're discussing has become forsaken, abandoned and inaccessible, buried forever in an online vault. Forgotten PINs, key holders who have passed away, overwritten USB sticks: there are countless ways people have lost their bitcoin, but what they all have in common is lost passwords/keys.

Bitcoin Protection
Your bitcoin is protected with a 256-bit key, which has 2^256, or 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936, possible combinations. It's been estimated that even the fastest supercomputer in the world (Tianhe-2) would take millions of years to crack 256-bit encryption.

Bitcoin Recovery
The US Treasury Department replaces and redeems an estimated $30 million in mutilated, damaged, and burned currency every year. However, there is no one who can really help you if you lose your bitcoin password, since there's no way to reset it. Besides dumpster diving, some desperate investors have even resorted to hypnosis in an attempt to conjure up their keys. However, one company offers a glimmer of hope to those who lost their keys: Wallet Recovery Services, managed by Dave Bitcoin (an alias), claims a 30% success rate in hacking passwords by brute-force decryption and charges clients 20% of the amount in the wallet.

Good luck to all you investors and keep your wallets safe~

  • Frequently asked Informatica interview questions and answers 2019

What is Informatica and Enterprise Data Warehouse?
Informatica is an ETL tool which is used to extract, transform and load data. It plays a crucial role in building an Enterprise Data Warehouse. An EDW provides users a global view of historical data collected from various departments in an organization, like finance, sales, labor and marketing, based on which critical business decisions can be made.

What is repository service, integration service, reporting service, domain and node?
There are basically three types of services which you will find under application services:
Repository service - It is responsible for storing Informatica metadata (source definitions, target definitions, mappings, mapplets and transformations, in the form of tables) and providing access to these repositories for other services. It is also responsible for maintaining the connection between the client development tool and the Informatica repository, which keeps metadata up-to-date by updating, inserting and deleting the metadata.
Integration service - It is responsible for the execution of workflows and end-to-end data movement from source to target. It pulls the workflow information from the repository, starts the execution of tasks, gathers/combines data from different sources (for example a relational database and a flat file) and loads it into single or multiple targets.
Reporting service - It enables report generation.
Domain - It is the administrative part of Informatica which is used by admins to operate nodes and services. You can define multiple nodes in a domain, with a gateway node responsible for receiving requests and distributing them to worker nodes. Further, nodes are responsible for running services and other Informatica processes. There are basically two types of services in a domain - the service manager and application services. The service manager is responsible for logging and login operations, like authentication, authorization, and managing people and groups. Application services are responsible for managing integration services, repository services and reporting services.
Key properties of a domain are as follows:
Database properties - You can define the database instance name and port responsible for holding the domain. It consists of database type (like Oracle, SQL Server etc.), host, port, database name and user name.
General properties - You can define resilience timeout, restart period, restart attempts, dispatch mode etc. For example, if a service goes down, how many seconds application services wait before reconnecting to the respective service depends upon the resilience timeout, and how long Informatica keeps trying to restart those services depends upon the restart period and attempts. How tasks are distributed from the gateway to worker nodes depends upon the dispatch mode, like round robin.

What is the PowerCenter repository?
It is like a relational database which stores Informatica metadata in the form of tables (the underlying database could be Oracle, SQL Server or similar) and it is managed by the repository service.

What is the Informatica client tool?
The Informatica client tool is basically a developer tool installed on the client machine, and it consists of four parts:
Designer (source, target, mapping, mapplet and transformations designer)
Workflow manager (task, worklet and workflow designer)
Monitor
Repository manager

Basic terminology consists of:
Workflow, worklet, sessions & tasks: A workflow consists of one or more sessions, worklets and tasks (including timer, decision, command, event wait, mail, link, assignment, control etc.) connected in parallel or in sequence.
You can run these sessions by using the session manager or the pmcmd command. Further, you can write pre- and post-session commands.
Mappings, mapplets & transformations: Mappings are collections of sources, targets and transformations. Mapplets (designed in the mapplet designer) are like re-usable mappings which contain various transformations but no source/target. You can run a mapping in debug mode without creating a session. There are mapping parameters and variables as well: mapping parameters represent constant values that are defined before running a session, while mapping variables can change values during the session.

What could be the various states of an object in Informatica?
Valid - a fully syntactically correct object.
Invalid - where syntax and properties are invalid according to Informatica standards; Informatica marks those objects invalid. It could be a mapping, mapplet, task, session, workflow or any transformation.
Impacted - where an underlying object is invalid. For instance, in a mapping, suppose an underlying transformation has become invalid due to some change.

What happens when a user executes the workflow?
The user executes the workflow.
Informatica invokes the integration service to pull workflow details from the repository.
The integration service starts execution of the workflow after gathering the workflow metadata.
The integration service runs all child tasks.
It reads and combines data from the sources and loads it into the target.
After execution, it updates the status of the task as succeeded, failed, unknown or aborted.
The workflow and session logs are generated.

What is the difference between ABORT and STOP?
Abort will kill the process after 60 seconds even if data processing hasn't finished. The session will be forcefully terminated and the DTM (Data Transformation Manager) process will be killed.
Stop will end the process once DTM processing has finished, although it stops reading data from the source as soon as the stop command is received.

What are the various types of transformation available in Informatica?
There are basically two categories of transformation in Informatica:
Active - It can change the number of rows that pass through the transformation, for example the filter and sorter transformations. It can also change the row type, for example the update strategy transformation which can mark rows for update, insert, reject or delete. Further, it can also change the transaction boundary, for example the transaction control transformation which can allow commit and rollback for each row based on certain expression evaluation.
Passive - Maintains the same number of rows, with no change in row type or transaction boundary. The number of output rows always remains the same as the number of input rows.

Active transformations include:
Source Qualifier: Connected to a relational source or flat file, it converts source data types to Informatica native data types. It performs joins, filter, sort and distinct, and you can write custom SQL. You can have multiple source qualifiers in a single session with multiple targets; in this case you can decide the target load order as well.
Joiner: It can join two heterogeneous sources, unlike the source qualifier which needs a common source. It performs normal join, master outer join, detail outer join and full outer join.
Filter: It has a single condition - it drops records based on filter criteria like the SQL WHERE clause; single input and single output.
Router: It has input, output and default groups - it acts like a filter but you can apply multiple conditions, one for each group, like an SQL CASE statement; single input, multiple outputs; easier and more efficient to work with than the filter.
Aggregator: It performs calculations such as sums, averages etc. Unlike the expression transformation, calculations can be done over groups. It also provides extra cache files to store transformation values if required.
Rank
Sorter: Sorts the data. It has a distinct option which can filter duplicate rows.
Lookup: It has input, output, lookup and return ports. Explained in the next question.
Union: It works like UNION ALL in SQL.
Stored Procedure
Update Strategy: Treats source rows as "data driven" - DD_UPDATE, DD_DELETE, DD_INSERT and DD_REJECT.
Normalizer: Takes multiple columns and returns a few, based on normalization.

Passive transformations include:
Expression: It is used for calculations on a single row before writing to the target, basically non-aggregate calculations.
Sequence Generator: For generating primary keys and surrogate keys (NEXTVAL & CURRVAL).
Structure Parser

Explain the Lookup transformation in detail.
The Lookup transformation is used to look up a flat file, relational database table or view. It has basically four ports - input (I), output (O), lookup (L) and return (R). Lookup transformations can be connected or unconnected and can act as active or passive transformations.
Connected lookup: It can return multiple output values. It can have both dynamic and static cache. It can return more than one column value in the output port. It caches all lookup columns.
Unconnected lookup: It can take multiple input parameters like column1, column2, column3 .. for the lookup, but the output will be just one value. It has a static cache. It can return only one column value in the output port. It caches the lookup condition and the lookup output port in the return port.
The lookup cache can be cached or no cache. A cached lookup can be static or dynamic. A dynamic cache can change during the execution of the process, for example if the lookup data itself changes during the transaction (NewLookupRow). Further, these caches can be made persistent or non-persistent, i.e. it tells Informatica whether to keep the lookup cache data or delete it after completion of the session. Here are the types of cache:
Static - static in nature.
Dynamic - dynamic in nature.
Persistent - keep or delete the lookup cache.
Re-cache - makes sure the cache is refreshed if the underlying table data changes.
Shared cache - can be used by other mappings.

What are the various types of files created during an Informatica session run?
Session log file - depending upon the tracing level (none, terse, normal, verbose initialization, verbose data) it records SQL commands, reader/writer threads, errors, the load summary etc. Note: you can change the number of session log files saved (the default is zero) for historic runs.
Workflow log file - includes load statistics like the number of rows for source/target, table names etc.
Informatica server log - created in the home directory on the Unix box with all status and error messages.
Cache file - index or data cache files, for example the aggregate cache, lookup cache etc.
Output file - created if the session produces an output data file.
Bad/Reject file - contains rejected rows which were not written to the target.

Tracing levels:
None: Applicable only at session level. The Integration Service uses the tracing levels configured in the mapping.
Terse: Logs initialization information, error messages, and notification of rejected data in the session log file.
Normal: The Integration Service logs initialization and status information, errors encountered and rows skipped due to transformation row errors. It summarizes session results, but not at the level of individual rows.
Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. It also notes where the Integration Service truncates string data to fit the precision of a column and provides detailed transformation statistics. When you configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation.

What is SQL override in Informatica?
SQL override can be implemented at the Source Qualifier, Lookups and Target. It allows you to override the existing SQL in the mentioned transformations to:
Limit the incoming rows
Avoid unnecessary sorting of data (ORDER BY on multiple columns) to improve performance
Use parameters and variables
Add a WHERE clause

What is parallel processing in Informatica?
You can implement parallel processing in Informatica by partitioning sessions. Partitioning allows you to split a large amount of data into smaller sets which can be processed in parallel, using the hardware to its maximum efficiency. These are the types of partitioning in Informatica:
Database partitioning: The integration service checks if the source table has partitions for parallel reads.
Round-robin partitioning: The integration service evenly distributes the rows into partitions for parallel processing. It performs well when you don't want to group data.
Hash auto-keys partitioning: Distributes data among partitions based on a hash function. It uses all sorted ports to create the partition key. It performs well before sorter, rank and unsorted aggregator transformations.
Hash user-keys partitioning: Distributes data among partitions based on a hash function, but here the user can choose the ports which will act as the partition key.
Key range partitioning: The user can choose multiple ports which will act as a compound partition key.
Pass-through partitioning: Rows are passed to the next partition without redistribution, unlike what we saw in round robin.

What can be done to improve performance in Informatica?
Find the bottlenecks by running with different tracing levels (verbose data).
Tune the transformations, like lookups (caching), source qualifiers, aggregators (use sorted input), filtering out unnecessary data, degree of parallelism in the workflow etc.
Drop the indexes before the data load and re-index after the data load is complete (for example using a command task at session level).
Change session properties like commit interval, buffer block size, session-level partitioning (explained above), reduced logging detail, DTM buffer cache size and auto-memory attributes.
OS-level tuning like disk I/O and CPU consumption.

  • Data Driven Parenting: An Introduction (Entry 1)

As a first-time parent, I find myself wondering how each decision I'm making might end up "messing up" my kid. How each factor that I introduce could ripple down and somehow eventually lead to my child sitting on a meticulously upholstered psychiatrist's couch, talking about how all their problems stemmed from childhood and were particularly the fault of some defect or distortion in their relationship with their mother (aka me).

But I think back to my own childhood: left for long, unsupervised stretches of time in the car parked outside of a grocery store, freely flipping through mystery/horror/slasher movies with my friends, and eating Hot Pockets and Pop-Tarts for dinner. Am I messed up? I mean, probably a little, but aren't we all? There are libraries filled with parenting advice, oftentimes offering contrary opinions. Homo sapiens have persisted for an estimated 200,000 to 300,000 years; how badly could we be doing?

These are the concerns that I imagine drove Brown University economist Emily Oster to write Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (Penguin Press). I paged through it quickly at my local bookstore and it got me thinking: how much does parenting actually contribute to child outcomes? What should we really be doing? Can we actually mess up our kids?

Utilizing Oster's compiled research from Cribsheet as a foundation, I'll be exploring what the past and present research states, as well as what findings in animals have also suggested. What does the data on parenting say? Are we just becoming more anxious, more allergic, more obese, hopeless? doomed?? Does anyone really know what they're doing? Hopefully, we'll find out.

Join me later for Data Driven Parenting: ??? Entry 2.

  • Ultimate Minimalist Baby Registry: The Bare Minimum Baby

Whether you're expecting a child of your own or you're gifting to expectant friends/family, here is a list of absolute must-haves for new parents:

Minimalist Must-Haves

1. Sleeping receptacle
Whether parents plan on co-sleeping with the popular DockATot or opting for the luxury bassinet Snoo, babies need a place to sleep. If you live in an apartment or are just keeping it simple, opt for a Pack 'n Play (Pack & Play mattress sold separately). Unless your baby is in the 99th percentile in length, this can easily serve as their crib for up to 1 year. Ikea's crib is also a tasteful, safe, and affordable addition to any nursery (mattress sold separately). Depending on which route you choose, you'll need some bedding. I recommend 4 sets of sheets. If you do your laundry once a week, that should give you enough slack for nighttime accidents/leaks. You can buy waterproof sheets, but keep in mind that a lot of mattresses are also waterproof.

2. Travel system
If you're driving home from the hospital, you won't be released unless you have a car seat. Do not buy your car seat at the thrift store or even "lightly used". This is not the baby item you should be trying to save money on. These also have a lifespan, so don't use an old dusty one your aunt fishes out of her storage unit. There are a multitude of companies who manufacture safe car seats. We personally went with a Chicco travel system; the easy pop-out feature made travelling a breeze. If you opt out of a car seat (maybe you don't own a car and only use cramped public transit systems), then you can also "wear" your baby out in a carrier. Popular options include a sling, wrap, or front pack. There are a ton of different options for baby carriers depending on your own preference, so shop around and do your research.

3. Diaper bag
You can use pretty much any kind of bag as a diaper bag, so just put some thought into what you find most convenient (tote, backpack, messenger, etc). Of course, bags intended as diaper bags have very convenient pockets for organizing, so if you plan on using a daycare or just taking your kiddo from place to place, a diaper bag is an excellent investment.

4. Baby wipes
Here's where it gets complicated. Many babies have sensitive skin and you won't really know if your baby will react to a certain brand of wipe. So I recommend buying a couple of small packs of wipes before you go buying a Costco value box. However, if you're the type who likes to have apocalyptic preparations, then consider stocking up on sensitive skin wipes such as the popular Water Wipes.

5. Diapers
A note on diapers - unlike wipes, these have sizes which your child will rapidly outgrow. Your baby shower may well be flooded with value boxes of diapers. Some parents stock up on the Newborn size diapers only to find that their baby outgrew them in just a few days. I recommend buying/registering for a small pack of the newborn size and a single larger pack of the size 1 fit. I personally like the Honest diapers (cute prints), but if you're on a budget, generic brands at Target or even Aldi will work just as well.

6. Clothing
A note on clothing - this is probably one of the most popular baby shower gifts and you'll receive a lot of different outfits. Some baby clothing can be soooo adorable but also be an absolute nightmare to get on and off. I recommend having 7 easy, everyday outfits (with mittens for sharp-clawed newborns) that you use in rotation. The zippered sleep 'n play style is particularly easy.
Also, take your local climate into consideration before you stock up. Is your 6-month size fleece snowsuit going to be useful when your child is 6 months old in July? We were gifted so many cute outfits that our baby just didn't have time to wear.

7. Swaddling/burp cloths/breast-feeding covers
As a catch-all for this category, muslin blankets are just absolutely perfect. These are so large and useful; I recommend having 6, as you'll want one in the diaper bag, the bedroom, and the play area at all times. They also make excellent little play mats when you're outside or at a friend's house (aka you aren't sure about putting your baby down, the carpet looks questionable). The popular Aden + Anais brand has many cute design options and is very well made and durable.

8. Night light
You are going to be getting up at all times of the night, and if you don't want the abrupt shock of turning on your overhead lighting, you'll definitely need a night light. There are so many different designs and brands, so you really have a lot of freedom to choose what you like.

9. Bottles and pacifiers
A note on bottles and pacifiers - if you plan on breast-feeding, it may be difficult to convince your child to switch to one or the other. Consult your local hospital's breast-feeding consultant more on this matter. However, if you're on maternity leave, you'll eventually have to return to work, so you'll need a nice set of bottles. If you hate the idea of washing bottles, you can also opt for disposable pumping bags like the Kiinde ones that your baby can directly drink from. When your baby starts solids, you can also pack these pouches with baby food for on-the-go snacks. We use the Comotomo bottles (4); they tumble around a little in the fridge, but we love how easy they are to clean. We also use the BIBS pacifiers; they are truly a lifesaver, and other parents at our daycare have asked about them and switched.

10. Boppy
The Boppy newborn lounger is just a super easy place to set your baby down, comfy and convenient. The original Boppy is great for breast-feeding and for propping up baby while they learn to sit up. If you want just one, I have to say that I'd go with the newborn lounger. It was so great and she loved napping in it so much. And it's also pretty comfy to use as a pillow for adults if you want to steal it for a bit.

11. Changing pad
There's a good number of accidents that can occur while you're changing your baby, and a waterproof changing pad is a good investment. Also, our little one loves to sleep on her Ikea changing pad for some reason. She absolutely hated being in the Snuggle Me Organic, but she'd relax and doze off immediately on her changing pad. Kids are strange.

12. Baby bathtub
There's nothing more anxiety-inducing than giving your baby a bath for the first time, so do yourself a favor and get a simple baby bathtub.

13. Towels and washcloths
Getting 3-4 towels and 8-10 washcloths is more than enough since you won't be bathing baby every day. The towels and washcloths can also play double duty as burp cloths or to clean up messes, especially when they start on solid foods.

14. Toiletries
A baby wash, sunscreen, and baby ointment are the bare necessities. Due to allergies, you may not want to stock up on these before you figure out if your little one will have a reaction. Our hospital recommended Johnson's baby shampoo so we've used that, and we swear by Aquaphor Healing Ointment for diaper rash and general skin irritation.

15. Baby first aid and grooming kit
You can register for a kit or compile your own.
The first thing I would add is the number for Poison Control (800-222-1222). The second thing you'll need is a baby thermometer so you can check for fevers. You'll also need nail clippers/grinders, a hairbrush, and a snot removal device (like the snot-sucker). Consult your doctor about the use of baby pain management and fever reducers (acetaminophen etc).

Indulgent Nice-To-Haves
16. Changing table/station
17. Bathtub kneeler
18. Baby swing/bouncer/rocking chair
19. Baby white noise machine
20. Diaper bin
21. Baby monitor
22. Books
23. Toys

There are so many things that will depend on your baby's temperament, likes, and dislikes, so if at all possible - REGISTER FOR GIFT CARDS. Of course, this request depends on your baby shower guests' temperament, likes, and dislikes. But this way, you can react more flexibly instead of having crippling buyer's remorse for that one expensive baby swing that your child cries at the sight of. A cluttered house filled with baby gear can be stressful and anxiety-inducing, especially if the baby doesn't like to use any of it. Good luck and best wishes!

#minimalist #parenting #baby #babyregistry #babyregistrylist #babymusthaves #newborn #firsttimeparent #minimalism #minimalistparenting #nursery #apartment #decluttering

  • Facts you should know about Food Waste & "Too Good To Go" Application

Food waste is a major problem nowadays in developed regions like America and Europe. The majority of food waste comes from restaurants and other retail distributors, where the quality of the product is the first priority.

Some alarming facts about food wastage which you should know:
Roughly one-third of the food produced in the world is wasted every year (approximately 1.3 billion tons).
Every year, consumers in rich countries waste almost 222 million tons of food, whereas the entire net food production of Sub-Saharan Africa is approximately 230 million tons.
Per capita waste by consumers is between 95 and 115 kg per year in Europe and North America, whereas it is only 6-11 kg per year in Sub-Saharan Africa and South and Southeastern Asia.

Across Europe, there is a trend in the food sector to throw away remaining fresh food items to uphold food quality and hygiene standards. Such a practice undoubtedly brings high customer satisfaction and trust and is highly commendable, but it also causes food waste in larger quantities. Certainly, developed regions like Europe and the USA can afford such "arrogance", but in today's era of sustainable development goals (SDG) and carbon neutrality, is it not a white-collar crime to keep following such a practice in the name of quality maintenance and customer satisfaction? We live in a world where more than 1 billion people suffer from hunger and every 5 seconds a child dies of hunger or directly related causes. Certainly, the situation is different in developed nations, where the priority is more on the quality of food than on its full utilization, but that does not justify ignoring this reality.

Various steps have been taken to reduce food waste. One such step is the "Too Good To Go" mobile app, which works with various restaurants across Europe to prevent food waste by selling surplus food to interested consumers at a heavily discounted rate. It allows both consumers and restaurants to keep food items out of the bin and use them judiciously. Thus, it saves a large quantity of food from going to the bin and in turn generates a profit from it.

I have been traveling within Europe for the past two years and was always perturbed by this act of throwing good-quality fresh food into the waste bin instead of selling it at a lower price to ensure its full utilization. I was not aware of such a mobile application, which significantly reduces food waste and in turn makes the world a better place to live. Such apps should be promoted to ensure that not a single grain of food ever goes to waste again.
