

  • Data Driven Parenting: An Introduction (Entry 1)

    As a first-time parent, I find myself wondering how each decision I'm making might end up "messing up" my kid. How each factor I introduce could ripple down and somehow eventually lead to my child sitting on a meticulously upholstered psychiatrist's couch, talking about how all their problems stemmed from childhood and were particularly the fault of some defect or distortion in their relationship with their mother (aka me). But I think back to my own childhood: left unsupervised for long stretches in the car parked outside a grocery store, freely flipping through mystery/horror/slasher movies with my friends, and eating Hot Pockets and Pop-Tarts for dinner. Am I messed up? I mean, probably a little, but aren't we all? There are libraries filled with parenting advice, oftentimes offering contradictory opinions. Homo sapiens has persisted for an estimated 200,000 to 300,000 years; how badly could we be doing? These are the concerns that I imagine drove Brown University economist Emily Oster to write Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (Penguin Press). I paged through it quickly at my local bookstore and it got me thinking: how much does parenting actually contribute to child outcomes? What should we really be doing? Can we actually mess up our kids? Using the research Oster compiled in Cribsheet as a foundation, I'll explore what past and present research says, as well as what findings in animals suggest. What does the data on parenting say? Are we just becoming more anxious, more allergic, more obese, hopeless? doomed?? Does anyone really know what they're doing? Hopefully, we'll find out. Join me later for Data Driven Parenting: ??? Entry 2.

  • Frequently asked Informatica interview questions and answers 2019

    What is Informatica and an Enterprise Data Warehouse? Informatica is an ETL tool used to extract, transform and load data. It plays a crucial role in building an Enterprise Data Warehouse (EDW). An EDW gives users a global view of historical data drawn from various departments of an organization (finance, sales, labor, marketing, etc.), on which critical business decisions can be based.
    What are the repository service, integration service, reporting service, domain and node? There are basically three types of application services:
    Repository service - responsible for storing Informatica metadata (source definitions, target definitions, mappings, mapplets and transformations, stored in the form of tables) and providing other services access to these repositories. It also maintains the connection between the client development tool and the Informatica repository, keeping the metadata up to date by updating, inserting and deleting it.
    Integration service - responsible for executing workflows and for end-to-end data movement from source to target. It pulls the workflow information from the repository, starts executing tasks, gathers/combines data from different sources (for example, a relational database and a flat file) and loads it into one or more targets.
    Reporting service - enables report generation.
    Domain - the administrative part of Informatica, used by admins to operate nodes and services. You can define multiple nodes in a domain, with a gateway node responsible for receiving requests and distributing them to worker nodes; the nodes are responsible for running services and other Informatica processes. There are basically two types of services in a domain: the service manager and the application services. The service manager handles logging and login operations such as authentication, authorization, and managing users and groups. The application services comprise the integration, repository and reporting services. Key properties of a domain are:
    Database properties - the database instance and port that hold the domain: database type (Oracle, SQL Server, etc.), host, port, database name and user name.
    General properties - resilience timeout, restart period, restart attempts, dispatch mode, etc. For example, if a service goes down, how many seconds the application services wait before reconnecting depends on the resilience timeout, and how long Informatica keeps trying to restart the service depends on the restart period and restart attempts. How tasks are distributed from the gateway to the worker nodes depends on the dispatch mode (for example, round robin).
    What is the PowerCenter repository? It is like a relational database that stores Informatica metadata in the form of tables (the underlying database could be Oracle, SQL Server or similar) and is managed by the repository service.
    What is the Informatica client tool? The Informatica client tool is the developer tool installed on the client machine. It consists of four parts: Designer (source, target, mapping, mapplet and transformation designers), Workflow Manager (task, worklet and workflow designers), Monitor, and Repository Manager.
    Basic terminology:
    Workflow, worklet, sessions & tasks: a workflow consists of one or more sessions, worklets and tasks (timer, decision, command, event wait, mail, link, assignment, control, etc.) connected in parallel or in sequence.
    You can run these sessions using the Workflow Manager or the pmcmd command, and you can also write pre- and post-session commands.
    Mappings, mapplets & transformations: mappings are collections of sources, targets and transformations. Mapplets (designed in the Mapplet Designer) are like reusable mappings that contain various transformations but no source or target. You can run a mapping in debug mode without creating a session. There are also mapping parameters and variables: mapping parameters represent constant values defined before running a session, while mapping variables can change values during a session.
    What are the possible states of an object in Informatica?
    Valid - a fully, syntactically correct object.
    Invalid - the syntax or properties are invalid according to Informatica standards, and Informatica marks the object invalid. This could be a mapping, mapplet, task, session, workflow or any transformation.
    Impacted - an underlying object is invalid; for instance, a mapping whose underlying transformation has become invalid due to some change.
    What happens when a user executes a workflow? The user executes the workflow; Informatica invokes the integration service, which pulls the workflow details from the repository. After gathering the workflow metadata, the integration service starts executing the workflow, runs all child tasks, reads and combines data from the sources and loads it into the target. After execution it updates the status of each task as succeeded, failed, unknown or aborted, and the workflow and session logs are generated.
    What is the difference between ABORT and STOP? Abort kills the process after 60 seconds even if data processing hasn't finished; the session is forcefully terminated and the DTM (Data Transformation Manager) process is killed. Stop ends the process once DTM processing has finished, although it stops reading data from the source as soon as the stop command is received.
    What are the various types of transformation available in Informatica? There are basically two categories of transformation in Informatica.
    Active - can change the number of rows that pass through the transformation (for example the Filter and Sorter transformations), the row type (for example the Update Strategy transformation, which can mark a row for update, insert, reject or delete), or the transaction boundary (for example the Transaction Control transformation, which can commit or roll back based on the evaluation of an expression).
    Passive - maintains the same number of rows, with no change in row type or transaction boundary; the number of output rows always equals the number of input rows.
    Active transformations include:
    Source Qualifier: connected to a relational source or flat file; converts source data types to Informatica native data types. It performs joins, filters, sorts and distinct, and you can write custom SQL. You can have multiple source qualifiers in a single session with multiple targets, in which case you can also decide the target load order.
    Joiner: can join two heterogeneous sources, unlike the Source Qualifier, which needs a common source. It performs normal, master outer, detail outer and full outer joins.
    Filter: has a single condition and drops records based on the filter criteria, like a SQL WHERE clause; single input, single output.
    Router: has input, output and default groups. It acts like a filter, but you can apply a separate condition to each group, like a SQL CASE statement; single input, multiple outputs. It is easier and more efficient to work with than multiple Filter transformations.
    Aggregator: performs calculations such as sums and averages. Unlike the Expression transformation, it can do calculations on groups. It also uses cache files to store transformation values if required.
    Rank: returns the top or bottom ranked rows of a group.
    Sorter: sorts the data; it has a distinct option that can filter out duplicate rows.
    Lookup: has input, output, lookup and return ports; explained in the next question.
    Union: works like a SQL UNION ALL.
    Stored Procedure
    Update Strategy: treats source rows as "data driven" - DD_UPDATE, DD_DELETE, DD_INSERT and DD_REJECT.
    Normalizer: takes multiple columns and returns fewer, based on normalization.
    Passive transformations include:
    Expression: used for non-aggregate calculations on a single row before writing to the target.
    Sequence Generator: generates primary keys and surrogate keys (NEXTVAL & CURRVAL).
    Structure Parser
    Explain the Lookup transformation in detail. The Lookup transformation is used to look up a flat file, relational database table or view. It has four kinds of ports: input (I), output (O), lookup (L) and return (R). Lookups can be connected or unconnected and can act as active or passive transformations.
    Connected lookup: can return multiple output values, i.e. more than one column value through its output ports. It can have either a dynamic or a static cache, and it caches all lookup columns.
    Unconnected lookup: can take multiple input parameters (column1, column2, column3, ...) for the lookup, but returns just one value through the return port. It has a static cache and caches the lookup condition and the lookup output port in the return port.
    A lookup can be cached or uncached. A cached lookup can be static or dynamic; a dynamic cache can change during execution, for example if the lookup data itself changes during the transaction (NewLookupRow). These caches can also be made persistent or non-persistent, i.e. you tell Informatica whether to keep the lookup cache data or delete it after the session completes. The cache types are:
    Static - static in nature.
    Dynamic - dynamic in nature.
    Persistent - keep or delete the lookup cache after the session.
    Re-cache - makes sure the cache is refreshed if the underlying table data changes.
    Shared cache - can be used by other mappings.
    What are the various types of files created during an Informatica session run?
    Session log file - depending on the tracing level (none, normal, terse, verbose, verbose data), it records SQL commands, reader/writer threads, errors, the load summary, etc. Note: you can change the number of session log files saved for historic runs (the default is zero).
    Workflow log file - includes load statistics such as the number of rows per source/target, table names, etc.
    Informatica server log - created in the home directory on the Unix box, with all status and error messages.
    Cache files - index or data cache files, for example the aggregator or lookup caches.
    Output file - the output data file, if the session creates one.
    Bad/reject file - contains rejected rows that were not written to the target.
    Tracing levels:
    None: applicable only at session level; the Integration Service uses the tracing levels configured in the mapping.
    Terse: logs initialization information, error messages, and notification of rejected data in the session log file.
    Normal: the Integration Service logs initialization and status information, errors encountered and rows skipped due to transformation row errors. It summarizes session results, but not at the level of individual rows.
    Verbose initialization: in addition to normal tracing, the Integration Service logs additional initialization details, the names of the index and data files used, and detailed transformation statistics.
    Verbose data: in addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping, notes where it truncates string data to fit the precision of a column, and provides detailed transformation statistics. When you configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation.
    What is SQL override in Informatica? SQL override can be implemented in the Source Qualifier, Lookup and Target transformations. It allows you to override the generated SQL in those transformations in order to limit the incoming rows, avoid unnecessary sorting of data (ORDER BY on multiple columns) to improve performance, use parameters and variables, or add a WHERE clause (a short example is sketched at the end of this post).
    What is parallel processing in Informatica? You can implement parallel processing in Informatica by partitioning sessions. Partitioning allows you to split a large amount of data into smaller sets that can be processed in parallel, using the hardware to maximum efficiency. The types of partitioning in Informatica are:
    Database partitioning: the integration service checks whether the source table has partitions for parallel reads.
    Round-robin partitioning: the integration service distributes rows evenly across partitions for parallel processing; it performs well when you don't need to group data.
    Hash auto-keys partitioning: distributes data among partitions based on a hash function, using all sorted ports to create the partition key. It performs well before a Sorter, Rank or unsorted Aggregator.
    Hash user-keys partitioning: distributes data among partitions based on a hash function, but here the user chooses the ports that act as the partition key.
    Key range partitioning: the user chooses multiple ports that act as a compound partition key.
    Pass-through partitioning: rows are passed to the next partition without redistribution, unlike round robin.
    What can be done to improve performance in Informatica? Identify the bottlenecks by running with different tracing levels (verbose data). Tune the transformations: lookups (caching), source qualifiers, aggregators (use sorted input), filter unnecessary data early, and adjust the degree of parallelism in the workflow. Drop indexes before the data load and re-create them after the load completes (for example using a command task at session level). Change session properties such as the commit interval, buffer block size, session-level partitioning (explained above), logging detail, DTM buffer cache size and auto-memory attributes. Finally, tune at the OS level: disk I/O and CPU consumption.
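    To make the SQL override answer above more concrete, here is a minimal sketch of an overridden Source Qualifier query. The ORDERS table and the $$LAST_RUN_DATE mapping parameter are hypothetical names used only for illustration; they are not part of the original question set.
    -- Illustrative Source Qualifier override: limit the incoming rows, push the sort into the database,
    -- and use a mapping parameter in the WHERE clause (Oracle-style syntax assumed).
    SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT, ORDER_DATE
    FROM ORDERS
    WHERE ORDER_DATE > TO_DATE('$$LAST_RUN_DATE', 'YYYY-MM-DD')
    ORDER BY CUSTOMER_ID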

  • $74 Billion in Lost Bitcoin - Modern Day Buried Treasure

    In 2018, the digital forensics firm Chainalysis estimated that around 35% of all bitcoin is likely lost. That is approximately 7.4 million of the 21 million bitcoin that will ever exist, and this month (August 2019) bitcoin has been averaging a market value of about $10K per coin. Extrapolating, that is over $74 billion worth of bitcoin.
    Losing Bitcoin
    To be more exact, the bitcoin aren't lost but frozen. Bitcoin are inaccessible to their owners without a private key, a 256-bit number usually saved in a wallet file. It is these keys, stored on hard drives or flash drives, that people misplace, lose, throw away or erase. One of the most infamous stories involves James Howells, a British man who owned upwards of 7,500 bitcoin. Howells accidentally threw out the hard drive containing his wallet file, losing all access to his bitcoin. At the time he lost it, the digital currency was valued at $7.5 million. The fortune is currently buried in his local landfill in Newport, South Wales, and despite his desire to search, the local city council won't allow it due to safety concerns. Those 7,500 bitcoin are now estimated to be worth $75 million. Similarly, Campbell Simpson threw away a hard drive containing 1,400 bitcoin, currently worth about $14 million. Even Elon Musk seems to have lost his bitcoin. To be clear, we aren't discussing how people lose bitcoin through careless or frivolous spending or outright theft. Those bitcoin have simply been transacted and are under different ownership, but they are still "active" coins. The bitcoin we're discussing have become forsaken, abandoned and inaccessible, buried forever in an online vault. Forgotten PINs, key holders who have passed away, overwritten USB sticks; there are countless ways people have lost their bitcoin, but what they have in common is lost passwords/keys.
    Bitcoin Protection
    Your bitcoin is protected with a 256-bit key, which has 2^256 possible combinations: 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 of them. It's been estimated that even the fastest supercomputer in the world (Tianhe-2) would take millions of years to crack 256-bit encryption.
    Bitcoin Recovery
    The US Treasury Department replaces and redeems an estimated $30 million in mutilated, damaged, and burned currency every year. However, there is no one who can really help you if you lose your bitcoin password, since there's no way to reset it. Besides dumpster diving, some desperate investors have even resorted to hypnosis in an attempt to conjure up their keys. However, one company offers a glimmer of hope to those who lost their keys. Wallet Recovery Services, managed by Dave Bitcoin (an alias), claims a 30% success rate in hacking passwords by brute-force decryption and charges clients 20% of the amount in the wallet. Good luck to all you investors and keep your wallets safe~
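    As a back-of-the-envelope check on the key-space figure above, the 2^256 number can be reproduced in a few lines. This is purely an illustrative calculation, sketched in Java:
    import java.math.BigInteger;
    public class KeySpace {
        public static void main(String[] args) {
            // 2^256 possible private keys - the figure quoted above
            BigInteger keySpace = BigInteger.valueOf(2).pow(256);
            System.out.println(keySpace);                              // prints the full 78-digit number
            System.out.println(keySpace.toString().length() + " digits");
        }
    }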

  • Apache Kafka Overview (Windows)

    Apache Kafka is a middleware solution for enterprise applications. It was initiated at LinkedIn, led by Neha Narkhede and Jun Rao. Initially it was designed for monitoring and tracking systems; later it became one of the leading Apache projects.
    Why use Kafka? Multiple producers, multiple consumers, disk-based persistence, high scalability, high performance, offline messaging, and message replay.
    Kafka Use Cases
    1. Enterprise messaging system: Kafka has a topic-based implementation of a messaging system. One or more consumers can consume the messages and commit them as per application need. It is suitable for both online and offline messaging consumers.
    2. Message store with playback capability: Kafka provides message retention on the topic, and retention can be configured for a specified duration. Each message is backed by a distributed file system, and storage sizes from 50 KB to 50 TB are supported.
    3. Stream processing: Kafka can process messages in real time, in batch mode or message by message, and it provides aggregation of message processing over a specified time window.
    Download and Install Kafka
    Kafka requires a JRE and ZooKeeper. Download and install the components below.
    JRE : http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
    ZooKeeper : http://zookeeper.apache.org/releases.html
    Kafka : http://kafka.apache.org/downloads.html
    Installation (on Windows)
    1. JDK setup: Set JAVA_HOME under system environment variables from Control Panel -> System -> Advanced system settings -> Environment Variables. Search for the PATH variable in the "System Variables" section of the "Environment Variables" dialogue box you just opened. Edit the PATH variable and append ";%JAVA_HOME%\bin". To confirm the Java installation, open cmd and type "java -version"; you should see the version of Java you just installed.
    2. ZooKeeper installation: Go to your ZooKeeper config directory under the ZooKeeper home directory (e.g. C:\zookeeper-3.4.10\conf). Rename the file "zoo_sample.cfg" to "zoo.cfg". Open zoo.cfg in any text editor and change dataDir=/tmp/zookeeper to C:\zookeeper-3.4.10\data. Add an entry in System Environment Variables as we did for Java: add ZOOKEEPER_HOME = C:\zookeeper-3.4.10 to System Variables, then edit the system variable named "PATH" and append ;%ZOOKEEPER_HOME%\bin;. You can change the default ZooKeeper port in the zoo.cfg file (default port 2181). Run ZooKeeper by opening a new cmd and typing zkserver.
    3. Kafka setup: Go to your Kafka config directory, for example C:\kafka_2.10-0.10.2.0\config. Edit the file "server.properties" and change the line "log.dirs=/tmp/kafka-logs" to "log.dirs=C:\kafka_2.10-0.10.2.0\kafka-logs". If your ZooKeeper is running on some other machine or cluster, you can edit "zookeeper.connect=localhost:2181" to your custom IP and port. Go to the Kafka installation folder and type the command below from a command line: .\bin\windows\kafka-server-start.bat .\config\server.properties. Kafka will run on its default port 9092 and connect to ZooKeeper's default port, 2181.
    Testing Kafka
    Creating topics: Now create a topic with the name "test.topic" and replication factor 1, since only one Kafka server is running (standalone setup). If you have a cluster with more than one Kafka server running, you can increase the replication factor accordingly, which will increase data availability and act as a fault-tolerant system.
    Open a new command prompt in the location C:\kafka_2.11-0.9.0.0\bin\windows, type the following command and hit Enter:
    kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test.topic
    Creating a producer: Open a new command prompt in the location C:\kafka_2.11-0.9.0.0\bin\windows. To start a producer, type the following command:
    kafka-console-producer.bat --broker-list localhost:9092 --topic test.topic
    Starting a consumer: Again open a new command prompt in the same location, C:\kafka_2.11-0.9.0.0\bin\windows, and start a consumer by typing the following command:
    kafka-console-consumer.bat --zookeeper localhost:2181 --topic test.topic
    Now you will have two command windows. Type anything in the producer command prompt and press Enter, and you should see the message appear in the consumer command prompt.
    Some other useful Kafka commands:
    List topics: kafka-topics.bat --list --zookeeper localhost:2181
    Describe a topic: kafka-topics.bat --describe --zookeeper localhost:2181 --topic [Topic Name]
    Read messages from the beginning: kafka-console-consumer.bat --zookeeper localhost:2181 --topic [Topic Name] --from-beginning
    Delete a topic: kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181
    Kafka Architecture
    The Kafka system has the following main components, which are coordinated by ZooKeeper: Topic, Broker, Producers, Consumers.
    1. Topic: A topic can be thought of as a folder in a file system. Producers publish messages to a topic, and each message is appended to the topic at a particular position called the offset, i.e. the position of a message is identified by its offset number. For each topic, the Kafka cluster maintains a partitioned log. Each partition is hosted on a single server and can be replicated across a configurable number of servers for fault tolerance. Each partition has one server that acts as the "leader" and zero or more servers that act as "followers". Kafka provides ordering of messages per partition, but not across partitions.
    2. Broker: The core component of the Kafka messaging system. It hosts the topic log and maintains the leaders and followers for the partitions, in coordination with ZooKeeper. A Kafka cluster consists of one or more brokers, and the brokers maintain the replication of partitions across the cluster.
    3. Producers: Producers publish messages to one or more topics; each message is appended to one of the topic's partitions. Producers are one type of client of the Kafka cluster.
    4. Consumers: Consumers subscribe to messages from a topic. One or more consumers can read a topic from different partitions as a consumer group. Two consumers of the same consumer group CANNOT subscribe to messages from the same partition. Each consumer maintains the offset for the partitions it subscribes to, and a consumer can replay messages by seeking back to an already-read offset of a partition of a topic.
    5. Message: A Kafka message consists of an array of bytes plus an optional piece of metadata called the key. A custom key can be used to write messages to partitions in a controlled way: messages with a particular key are written to a specific partition (the key is hashed to get the partition number; see the short sketch at the end of this post). Kafka can also write messages in batch mode, which reduces the network round trips per message. Batches are compressed while being transported over the network.
    Batch mode increases throughput but adds latency for individual messages, hence there is a tradeoff between latency and throughput. Visit this link for Apache Kafka Producer with Example using Java. If you have any questions, please mention them in the comments section below. Thank you. #KafkaOverview #ApacheKafkaWindows #KafkaZookeeperInstallation #KafkaUseCases #KafkaCommands [09/07/2019 5:49 PM CST - Reviewed by: PriSin]
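    The Message section above notes that a keyed message is hashed to pick its partition. As a simplified illustration only (Kafka's real default partitioner hashes the serialized key bytes with murmur2, not String.hashCode), the idea looks roughly like this:
    // Simplified sketch of key-based partition selection - illustrative, not Kafka's actual implementation
    static int choosePartition(String key, int numPartitions) {
        // Hash the key, clear the sign bit, and take the modulo so that the same key
        // always lands on the same partition.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }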

  • Kafka Producer and Consumer example (in Java)

    In this Kafka pub-sub example you will learn about: Kafka producer components (producer API, serializer and partition strategy), Kafka producer architecture, Kafka producer send methods (fire and forget, sync and async), a Kafka producer config (connection properties) example, a Kafka producer example, and a Kafka consumer example. Prerequisite - refer to my previous post, Apache Kafka Overview (Windows). The Kafka producer is one of the clients of the Kafka broker; it publishes messages to a Kafka topic, and messages are serialized before being transferred over the network.
    Kafka Producer Components
    Producer API: Kafka provides a collection of producer APIs to publish messages to a topic. The messaging can be optimized by setting different parameters, and the developer can decide to publish a message to a particular partition by providing a custom key for the message.
    Serializer: The serializer serializes the message so it can be passed over the network. A default or custom serializer can be set by the developer. Below are the String serializer settings; the serializer is configured for both the key and the value.
    value.serializer=org.apache.kafka.common.serialization.StringSerializer
    key.serializer=org.apache.kafka.common.serialization.StringSerializer
    Partitioner: This component applies a hashing algorithm to find the partition for the message when a key is provided. If no key is provided by the developer, it uses a round-robin algorithm to assign the partition for the message.
    Kafka Producer Send Methods
    Fire and forget - the producer does not care whether the message arrives at its destination or not. ProducerRecord data = new ProducerRecord ("topic", key, message ); producer.send(data);
    Synchronous send - the send() method returns a Future, and the developer can call get() on it to know the status of the message. ProducerRecord data = new ProducerRecord ("topic", key, message ); producer.send(data).get();
    Asynchronous send - developers can call send() with a callback function, which is invoked once the broker sends its response back to the producer.
TestCallback callback = new TestCallback(); ProducerRecord data = new ProducerRecord ("topic", key, message ); producer.send(data, callback); private static class TestCallback implements Callback { public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { System.out.println("Error while producing message to topic :" + recordMetadata); e.printStackTrace(); } else { String message = String.format("sent message to topic:%s partition:%s offset:%s", recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset()); System.out.println(message); } } } Producer Configuration bootstrap.servers=localhost:9092 acks=all ProducerConfig.RETRIES_CONFIG=0 value.serializer=org.apache.kafka.common.serialization.StringSerializer key.serializer=org.apache.kafka.common.serialization.StringSerializer retries=2 batch.size=32768 linger.ms=5 buffer.memory=33554432 max.block.ms=60000 Kafka Producer Example Step-1: Start Zookeeper Step-2: Start Kafka broker and create a topic TEST.TOPIC Step-3: Create a Java project Step-4: Create a properties file - kconnection.properties bootstrap.servers=localhost:9092 acks=all ProducerConfig.RETRIES_CONFIG=0 value.serializer=org.apache.kafka.common.serialization.StringSerializer key.serializer=org.apache.kafka.common.serialization.StringSerializer retries=2 TOPIC_NAME=TEST.TOPIC batch.size=32768 linger.ms=5 buffer.memory=33554432 max.block.ms=60000 Step-5: KafkaConnection.java package com.demo.twitter.util; import java.io.InputStream; import java.util.HashMap; import java.util.Map; import java.util.Properties; import org.apache.kafka.clients.producer.KafkaProducer; import org.apache.kafka.clients.producer.Producer; public class KafkaConnection { static Properties props=null; private static Properties loadConPropsFromClasspath() throws Exception { if(props==null){ InputStream stream = KafkaConnection.class.getResourceAsStream("kconnection.properties"); props = new Properties(); props.load(stream); stream.close(); System.out.println("Configuration "+props); } return props; } public static Producer getKafkaConnection()throws Exception{ Properties props=loadConPropsFromClasspath(); Producer producer = new KafkaProducer(props); return producer; } public static String getTopicName() throws Exception{ if(props!=null){ return props.getProperty(IKafkaSourceConstant.TOPIC_NAME); }else{ return null; } } } Step-6: KafkaProducerClient.java package com.demo.client.producer; import org.apache.kafka.clients.producer.Callback; import org.apache.kafka.clients.producer.KafkaProducer; import org.apache.kafka.clients.producer.ProducerRecord; import org.apache.kafka.clients.producer.RecordMetadata; import com.demo.twitter.util.KafkaConnection; public class KafkaProducerClient { static KafkaProducer producer=null; /** * @param args * @throws Exception */ public static void main(String[] args) throws Exception { try{ KafkaProducerClient pclient=new KafkaProducerClient(); long i = 1; for (; i <=10 ; i++) { KafkaProducerClient.sendMessage(""+i, "Hello This is test message ..Demo"+i); } System.out.println("Number of message sent "+(i-1)); pclient.closeProducer(); }catch(Exception e){ e.printStackTrace(); } } public static void sendMessage(String key,String message)throws Exception{ try{ if(producer==null){ producer =(KafkaProducer) KafkaConnection.getKafkaConnection(); System.out.println("Kafka Connection created for topic.. 
demo"+KafkaConnection.getTopicName()); } TestCallback callback = new TestCallback(); long startTime=System.currentTimeMillis(); ProducerRecord data = new ProducerRecord(KafkaConnection.getTopicName(), key, message ); producer.send(data, callback); System.out.println("Total Time:---> "+Long.valueOf(System.currentTimeMillis()-startTime)); }catch(Exception e){ e.printStackTrace(); producer.close(); } } public void closeProducer(){ try{ producer.close(); }catch(Exception e){ e.printStackTrace(); } } private static class TestCallback implements Callback { public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { System.out.println("Error while producing message to topic :" + recordMetadata); e.printStackTrace(); } else { String message = String.format("sent message to topic:%s partition:%s offset:%s", recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset()); System.out.println(message); } } } } Apache Kafka Consumer Example Continue in the same project. Step-1: Create a properties file: kconsumer.properties with below contents bootstrap.servers=localhost:9092 acks=all ProducerConfig.RETRIES_CONFIG=0 value.deserializer=org.apache.kafka.common.serialization.StringDeserializer key.deserializer=org.apache.kafka.common.serialization.StringDeserializer retries=0 group.id=group1 TOPIC_NAME=TEST.TOPIC CONSUMER_TIMEOUT=1000 worker.thread.count=5 counsumer.count=3 Step-2: Create KafkaConsumerClient.java package com.demo.kafka.consumer; import java.io.InputStream; import java.util.Collections; import java.util.Properties; import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.clients.consumer.ConsumerRecords; import org.apache.kafka.clients.consumer.KafkaConsumer; import com.demo.twitter.util.KafkaConnection; public class KafkaConsumerClient { Properties props=null; KafkaConsumer consumer =null; public static void main(String[] args) { KafkaConsumerClient conClient=new KafkaConsumerClient(); try { conClient.subscribeMessage("kconsumer.properties"); } catch (Exception e) { e.printStackTrace(); } } public synchronized void subscribeMessage(String configPropsFile)throws Exception{ try{ //Common for below two approach if(consumer==null){ consumer =(KafkaConsumer) getKafkaConnection(configPropsFile); } consumer.subscribe(Collections.singletonList(getTopicName())); while (true) { ConsumerRecords records = consumer.poll(1000L); for (ConsumerRecord record : records) { System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); } consumer.commitSync(); } }catch(Exception e){ e.printStackTrace(); consumer.close(); } } public KafkaConsumer getKafkaConnection(String fileName)throws Exception{ if(props==null){ props=loadConPropsFromClasspath(fileName); System.out.println(props); } KafkaConsumer consumer = new KafkaConsumer(props); return consumer; } private Properties loadConPropsFromClasspath(String fileName) throws Exception { if(props==null){ InputStream stream = KafkaConnection.class.getResourceAsStream(fileName); props = new Properties(); props.load(stream); stream.close(); System.out.println("Configuration "+props); } return props; } public String getTopicName() throws Exception{ if(props!=null){ return props.getProperty("TOPIC_NAME"); }else{ return null; } } } Thank you. If you have any question please write in comments section below. [09/09/2019 10:38 PM CST - Reviewed by: PriSin]

  • Apache Avro Schema Example (in Java)

    Introduction Avro provides data serialization based on JSON Schema. It is language neutral data serialization system, means a language A can serialize and languages B can de-serialize and use it. Avro supports both dynamic and static types as per requirement. It supports many languages like Java,C, C++, C#, Python and Ruby. Benefits Producers and consumers are decoupled from their change in application. Schemas help future proof your data and make it more robust. Supports and used in all use cases in streaming specially in Kafka. Avro are compact and fast for streaming. Supports for schema registry in case of Kafka. Steps to Serialize Object Create JSON schema. Compile the schema in the application. Populate the schema with data. Serialize data using Avro serializer. Steps to Deserialize Object Use Apache Avro api to read the serialized file. Populate the schema from file. Use the object for application. Sample Example for Avro (in Java) Step-1: Create a Java project and add the dependencies as below. Step-2: Create a Schema file as below: Customer_v0.avsc { "namespace": "com.demo.avro", "type": "record", "name": "Customer", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "faxNumber", "type": [ "null", "string" ], "default": "null" } ] } Step-3: Compile the schema. java -jar lib\avro-tools-1.8.1.jar compile schema schema\Customer_v0.avsc schema Step-4: Put the java generated file to the source directory of the project as shown in project structure. Step-5: Create the Producer.java package com.demo.producer; import java.io.File; import java.io.IOException; import org.apache.avro.file.DataFileWriter; import org.apache.avro.io.DatumWriter; import org.apache.avro.specific.SpecificDatumWriter; import com.demo.avro.Customer; public class Producer { public static void main(String[] args)throws IOException { serailizeMessage(); } public static void serailizeMessage()throws IOException{ DatumWriter datumWriter = new SpecificDatumWriter(Customer.class); DataFileWriter dataFileWriter = new DataFileWriter(datumWriter); File file = new File("customer.avro"); Customer customer=new Customer(); dataFileWriter.create(customer.getSchema(), file); customer.setId(1001); customer.setName("Customer -1"); customer.setFaxNumber("284747384343333".subSequence(0, 10)); dataFileWriter.append(customer); customer=new Customer(); customer.setId(1002); customer.setName("Customer -2"); customer.setFaxNumber("45454747384343333".subSequence(0, 10)); dataFileWriter.append(customer); dataFileWriter.close(); } } Step-6: Create the Consumer.java package com.demo.consumer; import java.io.File; import java.io.IOException; import org.apache.avro.file.DataFileReader; import org.apache.avro.io.DatumReader; import org.apache.avro.specific.SpecificDatumReader; import com.demo.avro.Customer; public class Consumer { public static void main(String[] args)throws IOException { deSerailizeMessage(); } public static void deSerailizeMessage()throws IOException{ File file = new File("customer.avro"); DatumReader datumReader = new SpecificDatumReader(Customer.class); DataFileReader dataFileReader= new DataFileReader(file,datumReader); Customer customer=null; while(dataFileReader.hasNext()){ customer=dataFileReader.next(customer); System.out.println(customer); } } } Step-7: Run Producer.java It creates customer.avro file and puts the customer in Avro format. Step-8: Run Consumer.java It reads the customer.avro file and get the customer records. Thank you! 
If you have any question please mention in comments section below. [12/09/2019 10:38 PM CST - Reviewed by: PriSin]
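    As a follow-up to the example above, the same customer.avro file can also be read without the generated Customer class by using Avro's generic API. This is a minimal sketch under that assumption; it relies only on standard Avro classes and the customer.avro file produced by Producer.java:
    package com.demo.consumer;
    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumReader;
    public class GenericConsumer {
        public static void main(String[] args) throws IOException {
            // Read customer.avro using the schema embedded in the file itself,
            // so no generated Customer class is required.
            DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
            DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("customer.avro"), datumReader);
            GenericRecord customer = null;
            while (dataFileReader.hasNext()) {
                customer = dataFileReader.next(customer);
                System.out.println(customer.get("id") + " : " + customer.get("name"));
            }
            dataFileReader.close();
        }
    }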

  • Kafka Consumer Advance (Java example)

    Prerequisite: Kafka Overview, Kafka Producer & Consumer.
    Commits and Offsets in the Kafka Consumer
    Once the client commits an offset, Kafka records that position for the consumer group, so messages that have already been read are not delivered again on the client's next poll.
    Properties used in the examples below:
    bootstrap.servers=localhost:9092
    ProducerConfig.RETRIES_CONFIG=0
    value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
    key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
    retries=0
    group.id=group1
    HQ_TOPIC_NAME=EK.TWEETS.TOPIC
    CONSUMER_TIMEOUT=1000
    worker.thread.count=5
    counsumer.count=3
    auto.offset.reset=earliest
    enable.auto.commit=false
    Configuration-Level Setting
    This can be done at configuration level in the properties file.
    enable.auto.commit=false - the consumer API (i.e. the client code) takes the decision on whether and when to commit the offset.
    enable.auto.commit=true - this is the default setting; once messages are consumed, their offsets are committed automatically at regular intervals without the client code taking any decision.
    Consumer API-Level Setting
    Synchronous commit: the offset is committed as soon as the consumer API confirms it; the latest offset of the polled messages is committed. The example below commits after processing all messages of the current poll. A synchronous commit blocks until the broker responds to the commit request.
    Sample code: public synchronized void subscribeMessage(String configPropsFile)throws Exception{ try{ if(consumer==null){ consumer =(KafkaConsumer) getKafkaConnection(configPropsFile); System.out.println("Kafka Connection created...on TOPIC : "+getTopicName()); } consumer.subscribe(Collections.singletonList(getTopicName())); while (true) { ConsumerRecords records = consumer.poll(10000L); for (ConsumerRecord record : records) { System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); } consumer.commitSync(); } }catch(Exception e){ e.printStackTrace(); consumer.close(); } }
    Asynchronous commit: the consumer does not wait for a response from the broker; the commit just notifies the broker and the consumer continues its processing. Throughput is higher compared to synchronous commit, but there is a chance of duplicate reads, which the application needs to handle on its own.
    Sample code: while (true) { ConsumerRecords records = consumer.poll(10000L); System.out.println("Number of messages polled by consumer "+records.count()); for (ConsumerRecord record : records) { System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); } consumer.commitAsync(new OffsetCommitCallback() { public void onComplete(Map offsets, Exception exception) { if (exception != null){ System.out.printf("Commit failed for offsets {}", offsets, exception); }else{ System.out.println("Messages are Committed Asynchronously..."); } }}); }
    Offset-level commit: sometimes an application may need to commit the offset on reading a particular offset.
Sample Code Map currentOffsets =new HashMap(); while (true) { ConsumerRecords records = consumer.poll(1000L); for (ConsumerRecord record : records) { System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); currentOffsets.put(new TopicPartition(record.topic(), record.partition()), new OffsetAndMetadata(record.offset()+1, "no metadata")); if(record.offset()==18098){ consumer.commitAsync(currentOffsets, null); } } } Retention of Message Kafka retains the message till the retention period defined in the configuration. It can be defined at broker level or at topic level. Retention of message can be on time basis or byte basis for the topic. Retention defined on Topic level override the retention defined at broker level. retention.bytes - The amount of messages, in bytes, to retain for this topic. retention.ms - How long messages should be retained for this topic, in milliseconds. 1. Defining retention at topic level Retention for the topic named “test-topic” to 1 hour (3,600,000 ms): # kafka-configs.sh --zookeeper localhost:2181/kafka-cluster --alter --entity-type topics --entity-name test-topic --add-config retention.ms=3600000 2. Defining retention at broker level Define one of the below properties in server.properties # Configures retention time in milliseconds => log.retention.ms=1680000 # Configures retention time in minutes => log.retention.minutes=1680 # Configures retention time in hours => log.retention.hours=168 Fetching Message From A Specific Offset Consumer can go down before committing the message and subsequently there can be message loss. Since Kafka broker has capability to retain the message for long time. Consumer can point to specific offset to get the message. Consumer can go back from current offset to particular offset or can start polling the message from beginning. Sample Code Map currentOffsets =new HashMap(); public synchronized void subscribeMessage(String configPropsFile)throws Exception{ try{ if(consumer==null){ consumer =(KafkaConsumer) getKafkaConnection(configPropsFile); System.out.println("Kafka Connection created...on TOPIC : "+getTopicName()); } TopicPartition topicPartition = new TopicPartition(getTopicName(), 0); List topics = Arrays.asList(topicPartition); consumer.assign(topics); consumer.seekToEnd(topics); long current = consumer.position(topicPartition); consumer.seek(topicPartition, current-10); System.out.println("Topic partitions are "+consumer.assignment()); while (true) { ConsumerRecords records = consumer.poll(10000L); System.out.println("Number of record polled "+records.count()); for (ConsumerRecord record : records) { System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); currentOffsets.put(new TopicPartition(record.topic(), record.partition()), new OffsetAndMetadata(record.offset()+1, "no metadata")); } consumer.commitAsync(currentOffsets, null); } }catch(Exception e){ e.printStackTrace(); consumer.close(); } } Thank you. If you have any doubt please feel free to post your questions in comments section below. [23/09/2019 04:38 PM CST - Reviewed by: PriSin]
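    The section above notes that a consumer can also start polling from the very beginning of a partition. Here is a minimal sketch of that variant, assuming the same consumer, topic and helper methods as the sample code above:
    // Replay all retained messages for partition 0 of the topic (same setup as the samples above)
    TopicPartition topicPartition = new TopicPartition(getTopicName(), 0);
    List topics = Arrays.asList(topicPartition);
    consumer.assign(topics);
    consumer.seekToBeginning(topics); // jump back to the earliest retained offset
    while (true) {
        ConsumerRecords records = consumer.poll(10000L);
        for (ConsumerRecord record : records) {
            System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                    record.topic(), record.partition(), record.offset(), record.key(), record.value());
        }
    }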

  • Installing Spark on Windows (pyspark)

    Follow these steps to install Apache Spark (pyspark) on a Windows machine.
    Prerequisite
    Nowadays Python is used by many applications, so it is quite possible that Python is already available on your machine. To check, just run this command at your command prompt.
    C:\Users\rajar> python --version
    'python' is not recognized as an internal or external command, operable program or batch file.
    If Python is present on your computer, the command will output the Python version like this: Python x.x.x
    Also check that Java is properly installed: just run java -version and you should see the Java version running on your computer.
    Download & Install Python
    Go to the Python download page and download the latest version (don't download Python 2). Download the 64-bit or 32-bit installer depending on your system configuration. Double-click the downloaded executable file. Don't forget to check the box "Add Python 3.7 to PATH", then click Install Now. That's all; it will take a couple of minutes to complete the installation. Now test it: run the previous command again and you should see the Python version this time.
    C:\Users\rajar> python --version
    Python 3.7.4
    Run pyspark
    Now run the command pyspark and you should see the Spark version. If you have any questions about the installation process, please mention them in the comments section below and I will help you out. Thank you. Next: Just enough Scala for Spark

  • 6 Budget-Friendly Ways to Prepare for Your Pregnancy (checklist)

    Every pregnancy is different, and that is true even in the same person. Your first pregnancy might have been plagued with morning sickness, high blood pressure and lower back pain, while in your second pregnancy you hardly felt a thing. That can make pregnancy preparation tricky — not knowing what to expect can be hard on your mood and your finances. Many pregnant women enjoy feeling their new child growing and developing, but in those times of discomfort, it’s important to have a plan to manage physical and mental stress. Here are a few budget-friendly tips to help you with sound and solid pregnancy prep. 1. Before and after clothes When you think about buying maternity clothes do you just cringe at the cost, knowing you’ll only have to wear this size for a short period of time? There are actually ways to cut costs when it comes to pregnancy wear. First, consider buying a belly band so you can transform the pants you currently wear into pregnancy and postpartum pants. Second, look into comfortable nursing pajamas (you can find a pair for $33.99) that you can fit into now and after the baby comes. The more cozy and flowy they are, the more comfortable you’ll be during some of those long, late night nursing marathons. 2. Amazon’s “Subscribe and Save” You should have bought stock in antacids with the kind of heartburn you are experiencing. Now it’s 3 am and you can’t sleep and you are out of Tums. You can save time and money by subscribing to items you use a lot. Not only will these be automatically delivered to your home so you never have to experience late night heartburn unaided again, but the cost per item is often reduced when you subscribe. You can do this with other items, like foods you have been craving, shea butter to help reduce stretch marks or hemorrhoid cream for sore bottoms. 3. Putting together a nursery Putting together a warm and comfortable nursery is important for mother and baby. Since you and your newborn will spend a lot of time there, you want it to be as nurturing as possible. And while you might be tempted to go overboard with the decor, it’s important to focus only on the basics so you can stay within budget. Also, while you might be tempted to do everything yourself, don’t tackle any projects that you feel are out of your wheelhouse. Fortunately, in Minneapolis, you can hire a handyman for an assortment of small jobs for an average of $403 per project, depending on the size of the project. And although that might sound like a lot of money, you’ll rest assured knowing that the tasks were completed by a professional. 4. Children’s consignment stores While primarily an ideal spot to find good deals on gently used clothes, toys, furniture and bedding, you can also find steep discounts on used maternity and postpartum accessories. You can find breast pumps and parts, breastfeeding pillows and other nursing items. And the e-commerce boom has also helped increase access to quality used pre- and postpartum clothes. You can even rent high end used maternity and nursing clothes. Browse online and have them delivered right to your door. 5. Explore Coupons and Groupons The big box retailers love a pregnant woman — families are very profitable to stores that sell food, clothing, home goods and furniture. They will be looking to entice you into the store by offering coupons and discounts on maternity and baby items. Take advantage of these discounts! 
    And don't just look there; websites that offer discounts, like Groupon, also often have a section with items to help you plan and prepare for a baby. And don't forget about stores like Sam's Club and Costco. After you pay their membership fee, you get access to bulk and wholesale items at steep discounts. In fact, consider adding a membership to one of those stores to your baby registry. 6. Facebook groups for new moms Social media is a place where we can build community. Of course, anyone watching the news knows social media has a dark side, but there are also opportunities to find and make real connections. Look for mom groups in your area. There are often breastfeeding groups, buy-sale-trade groups, baby-wearing groups and other mom-themed groups in many cities. More important than being able to purchase used items, you can ask questions, get advice and provide — and receive — support. Pregnancy is going to be a time of discovery, even for those on their second child or beyond. Give yourself space to breathe easier by setting a budget and staying within that budget. And don't forget to lean on your community as much as you can for support.

  • How Families Can Keep Their Home Ready For Showings

    Getting your house ready to go up on the market is always a big job. Add kids to the picture, however, and that job gets much bigger. You want to make a great presentation, which means keeping up with the chaos that can go hand-in-hand with kids. Here's a look at how families can keep their home looking great until the right buyers come along: Have A Plan It's important to get everyone in the house on the same page when it comes to staging your home. This includes any children old enough to take on some tidying tasks. For the time your house is up for sale, upgrade everyone's chore charts to reflect a few items off your staging checklist. This way you're constantly keeping your home ready for buyers. This is super useful, since it allows you to host agents and house hunters at the drop of a hat. For practical purposes, keep the checklist handy and make sure everyone knows where it is. You can even share your list electronically with other adults in the household and kids who are old enough to use phones or tablets. If your agent lets you know they're swinging by soon, anyone old enough can ensure that surfaces are wiped, shades are open, and personal items are safely stashed away. Use The Best Tools Keeping your house tidy all the time is a big task, but it can be made substantially easier with the right tools. For example, a good set of microfiber cloths can make quickly wiping up surfaces a breeze. A stick vacuum is another tool you'll want at your disposal. Since these are more versatile and lighter than traditional vacuums, they make spot cleaning fast and easy. Go through all of your cleaning supplies and try to identify which are most useful for a quick, efficient clean up, and assemble a cleaning caddy so you can grab everything at once when you're on the run. Remove Personal Items One of the most important things your family can do when it comes to staging your home is taking down décor that makes it look too lived in. Per Creative Home Stagers, this includes family photos, bold color schemes, and especially stylistic wall art or furniture. These personal touches may make you feel at home, but they'll make potential buyers feel like they're in someone else's home. On one hand, this is true, but on the other, it can be a problem. Even if buyers aren't thinking of it consciously, they're trying to picture themselves in the space. Pictures of your family holiday party or your child's first steps will make it harder for them to imagine their life in the house. Plan Fun Outings – And A Backup – For Open Houses Although it may be tempting to try and scope out interested buyers, sellers should never be at an open house. In addition to being an even starker reminder that the home belongs to someone else, The Balance points out that your presence will put uncomfortable pressure on the buyers and make it harder for them to pay attention to the property. Instead, plan a fun outing with your family during the scheduled open houses. Head to a park, playground, or museum to pass a little time. If you have younger children, it might also be wise to find a friend or family member who will be willing to host you if your plans go south. You don't want to show up to an open house at all, much less with a screaming toddler. Keeping a house market-ready with kids can be a challenge, but don't be intimidated. Prep your home and family, and make arrangements for showings and open house events.
With a plan under your belt, there’s nothing stopping you from keeping your house buyer-ready until that magic day it’s sold! Photo Credit: Pexels
