
  • Kafka Producer and Consumer example (in Java)

    In this Kafka pub/sub example you will learn about: Kafka producer components (producer API, serializer and partitioning strategy); Kafka producer architecture; the Kafka producer send methods (fire-and-forget, synchronous and asynchronous); Kafka producer configuration (connection properties); a Kafka producer example; and a Kafka consumer example. Prerequisite: refer to my previous post, Apache Kafka Overview (Windows). The Kafka producer is one of the clients of the Kafka broker. It publishes messages to a Kafka topic. Messages are serialized before they are transferred over the network. Kafka Producer Components Producer API Kafka provides a collection of producer APIs for publishing messages to a topic. Messaging can be optimized by setting different parameters. A developer can direct messages to a particular partition by providing a custom key for the message. Serializer The serializer converts the message to bytes so that it can be passed over the network. A default or custom serializer can be set by the developer. Below are the settings for the String serializer. value.serializer=org.apache.kafka.common.serialization.StringSerializer key.serializer=org.apache.kafka.common.serialization.StringSerializer A serializer is set for both the key and the value. Partitioner This component applies a hashing algorithm to the key to find the partition for the message, if a key is provided. If the developer does not provide a key, it uses a round-robin algorithm to assign a partition to the message. Kafka Producer Send Methods Fire and Forget The producer does not check whether the message arrives at its destination. ProducerRecord data = new ProducerRecord ("topic", key, message ); producer.send(data); Synchronous Send The send() method returns a Future object, and the developer can call get() on it to wait for the result and learn the status of the message. 
ProducerRecord data = new ProducerRecord ("topic", key, message ); producer.send(data).get(); Asynchronous Send Developers can use the send() with a callback function which is called once broker send the response back to the producer. TestCallback callback = new TestCallback(); ProducerRecord data = new ProducerRecord ("topic", key, message ); producer.send(data, callback); private static class TestCallback implements Callback { public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { System.out.println("Error while producing message to topic :" + recordMetadata); e.printStackTrace(); } else { String message = String.format("sent message to topic:%s partition:%s offset:%s", recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset()); System.out.println(message); } } } Producer Configuration bootstrap.servers=localhost:9092 acks=all ProducerConfig.RETRIES_CONFIG=0 value.serializer=org.apache.kafka.common.serialization.StringSerializer key.serializer=org.apache.kafka.common.serialization.StringSerializer retries=2 batch.size=32768 linger.ms=5 buffer.memory=33554432 max.block.ms=60000 Kafka Producer Example Step-1: Start Zookeeper Step-2: Start Kafka broker and create a topic TEST.TOPIC Step-3: Create a Java project Step-4: Create a properties file - kconnection.properties bootstrap.servers=localhost:9092 acks=all ProducerConfig.RETRIES_CONFIG=0 value.serializer=org.apache.kafka.common.serialization.StringSerializer key.serializer=org.apache.kafka.common.serialization.StringSerializer retries=2 TOPIC_NAME=TEST.TOPIC batch.size=32768 linger.ms=5 buffer.memory=33554432 max.block.ms=60000 Step-5: KafkaConnection.java package com.demo.twitter.util; import java.io.InputStream; import java.util.HashMap; import java.util.Map; import java.util.Properties; import org.apache.kafka.clients.producer.KafkaProducer; import org.apache.kafka.clients.producer.Producer; public class KafkaConnection { static Properties props=null; private 
static Properties loadConPropsFromClasspath() throws Exception { if(props==null){ InputStream stream = KafkaConnection.class.getResourceAsStream("kconnection.properties"); props = new Properties(); props.load(stream); stream.close(); System.out.println("Configuration "+props); } return props; } public static Producer getKafkaConnection()throws Exception{ Properties props=loadConPropsFromClasspath(); Producer producer = new KafkaProducer(props); return producer; } public static String getTopicName() throws Exception{ if(props!=null){ return props.getProperty(IKafkaSourceConstant.TOPIC_NAME); }else{ return null; } } } Step-6: KafkaProducerClient.java package com.demo.client.producer; import org.apache.kafka.clients.producer.Callback; import org.apache.kafka.clients.producer.KafkaProducer; import org.apache.kafka.clients.producer.ProducerRecord; import org.apache.kafka.clients.producer.RecordMetadata; import com.demo.twitter.util.KafkaConnection; public class KafkaProducerClient { static KafkaProducer producer=null; /** * @param args * @throws Exception */ public static void main(String[] args) throws Exception { try{ KafkaProducerClient pclient=new KafkaProducerClient(); long i = 1; for (; i <=10 ; i++) { KafkaProducerClient.sendMessage(""+i, "Hello This is test message ..Demo"+i); } System.out.println("Number of message sent "+(i-1)); pclient.closeProducer(); }catch(Exception e){ e.printStackTrace(); } } public static void sendMessage(String key,String message)throws Exception{ try{ if(producer==null){ producer =(KafkaProducer) KafkaConnection.getKafkaConnection(); System.out.println("Kafka Connection created for topic.. 
demo"+KafkaConnection.getTopicName()); } TestCallback callback = new TestCallback(); long startTime=System.currentTimeMillis(); ProducerRecord data = new ProducerRecord(KafkaConnection.getTopicName(), key, message ); producer.send(data, callback); System.out.println("Total Time:---> "+Long.valueOf(System.currentTimeMillis()-startTime)); }catch(Exception e){ e.printStackTrace(); producer.close(); } } public void closeProducer(){ try{ producer.close(); }catch(Exception e){ e.printStackTrace(); } } private static class TestCallback implements Callback { public void onCompletion(RecordMetadata recordMetadata, Exception e) { if (e != null) { System.out.println("Error while producing message to topic :" + recordMetadata); e.printStackTrace(); } else { String message = String.format("sent message to topic:%s partition:%s offset:%s", recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset()); System.out.println(message); } } } } Apache Kafka Consumer Example Continue in the same project. 
Step-1: Create a properties file: kconsumer.properties with below contents bootstrap.servers=localhost:9092 acks=all ProducerConfig.RETRIES_CONFIG=0 value.deserializer=org.apache.kafka.common.serialization.StringDeserializer key.deserializer=org.apache.kafka.common.serialization.StringDeserializer retries=0 group.id=group1 TOPIC_NAME=TEST.TOPIC CONSUMER_TIMEOUT=1000 worker.thread.count=5 counsumer.count=3 Step-2: Create KafkaConsumerClient.java package com.demo.kafka.consumer; import java.io.InputStream; import java.util.Collections; import java.util.Properties; import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.clients.consumer.ConsumerRecords; import org.apache.kafka.clients.consumer.KafkaConsumer; import com.demo.twitter.util.KafkaConnection; public class KafkaConsumerClient { Properties props=null; KafkaConsumer consumer =null; public static void main(String[] args) { KafkaConsumerClient conClient=new KafkaConsumerClient(); try { conClient.subscribeMessage("kconsumer.properties"); } catch (Exception e) { e.printStackTrace(); } } public synchronized void subscribeMessage(String configPropsFile)throws Exception{ try{ //Common for below two approach if(consumer==null){ consumer =(KafkaConsumer) getKafkaConnection(configPropsFile); } consumer.subscribe(Collections.singletonList(getTopicName())); while (true) { ConsumerRecords records = consumer.poll(1000L); for (ConsumerRecord record : records) { System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n", record.topic(), record.partition(), record.offset(), record.key(), record.value()); } consumer.commitSync(); } }catch(Exception e){ e.printStackTrace(); consumer.close(); } } public KafkaConsumer getKafkaConnection(String fileName)throws Exception{ if(props==null){ props=loadConPropsFromClasspath(fileName); System.out.println(props); } KafkaConsumer consumer = new KafkaConsumer(props); return consumer; } private Properties 
loadConPropsFromClasspath(String fileName) throws Exception { if(props==null){ InputStream stream = KafkaConnection.class.getResourceAsStream(fileName); props = new Properties(); props.load(stream); stream.close(); System.out.println("Configuration "+props); } return props; } public String getTopicName() throws Exception{ if(props!=null){ return props.getProperty("TOPIC_NAME"); }else{ return null; } } } Thank you. If you have any question please write in comments section below. [09/09/2019 10:38 PM CST - Reviewed by: PriSin]
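The partitioner behaviour described in this post (hash the key when there is one, rotate through partitions when there is not) can be sketched in plain Java without a broker. This is only an illustration under stated assumptions: the class name PartitionSketch and its method are hypothetical, and Arrays.hashCode stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative partition chooser (hypothetical); not Kafka's actual partitioner. */
final class PartitionSketch {
    // Counter used to spread keyless messages across partitions in turn.
    private final AtomicInteger nextKeyless = new AtomicInteger(0);
    private final int partitionCount;

    PartitionSketch(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** Returns the partition index for a message key; null means "no key". */
    int partitionFor(String key) {
        if (key == null) {
            // No key: round-robin over the partitions.
            return Math.floorMod(nextKeyless.getAndIncrement(), partitionCount);
        }
        // Key present: hash the serialized key bytes, then map onto a partition.
        // Assumption: Arrays.hashCode is a stand-in for Kafka's murmur2 hash.
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(bytes), partitionCount);
    }

    public static void main(String[] args) {
        PartitionSketch sketch = new PartitionSketch(3);
        for (String key : new String[] {"1", "2", "1", null, null, null, null}) {
            System.out.println(key + " -> partition " + sketch.partitionFor(key));
        }
    }
}
```

The point of the sketch is that every message with the same key lands on the same partition, which is what preserves per-key ordering, while keyless messages are spread evenly.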

  • Apache Kafka Overview (Windows)

    Apache Kafka is middleware solution for enterprise application. It was initiated by LinkedIn lead by Neha Narkhede and Jun Rao. Initially it was designed for monitoring and tracking system, later on it became part of one of the leading project of Apache. Why Use Kafka? Multiple producers Multiple consumers Disk based persistence Highly scalable High performance Offline messaging Messaging replay Kafka Use Cases 1. Enterprise messaging system Kafka has topic based implementation for message system. One or more consumers can consume the message and commit as per application need. Suitable for both online and offline messaging consumer system. 2. Message Store with playback capability Kafka provides the message retention on the topic. Retention of the message can be configured for the specified duration. Each message is backed up with distributed file system. Supports the storage size for 50K to 50 TB. 3. Stream processing Kafka is capable enough to process the message in real time in batch mode or in message wise. it provides the aggregation of message processing for specified time window. Download and Install Kafka Kafka requires below JRE and Zookeeper. Download and Install the below components. JRE : http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html ZooKeeper : http://zookeeper.apache.org/releases.html Kafka : http://kafka.apache.org/downloads.html Installation (on Windows) 1. JDK Setup Set the JAVA_HOME under system environment variables from the path Control Panel -> System -> Advanced system settings -> Environment Variables. Search for a PATH variable in the “System Variable” section in “Environment Variables” dialogue box you just opened. Edit the PATH variable and append “;%JAVA_HOME%\bin” To confirm the Java installation just open cmd and type “java –version”, you should be able to see version of the java you just installed 2. Zookeeper Installation: Goto your Zookeeper config directory. 
It would be zookeeper home directory (i.e: c:\zookeeper-3.4.10\conf) Rename file "zoo_sample.cfg" to "zoo.cfg". Open zoo.cfg in any text editor and Find & edit dataDir=/tmp/zookeeper to :\zookeeper-3.4.10\data. Add entry in System Environment Variables as we did for Java. Add in System Variables ZOOKEEPER_HOME = C:\zookeeper-3.4.10 Edit System Variable named "PATH" and append ;%ZOOKEEPER_HOME%\bin; You can change the default Zookeeper port in zoo.cfg file (Default port 2181). Run Zookeeper by opening a new cmd and type zkserver. 3. Kafka Setup: Go to your Kafka config directory. For me its C:\kafka_2.10-\config. Edit file "server.properties" and Find & edit line "log.dirs=/tmp/kafka-logs" to "log.dir= C:\kafka_2.10-\kafka-logs". If your Zookeeper is running on some other machine or cluster you can edit " zookeeper.connect=localhost:2181" to your custom IP and port. Goto kafka installation folder and type below command from a command line. \bin\windows\kafka-server-start.bat .\config\server.properties. Your Kafka will run on default port 9092 & connect to zookeeper’s default port which is 2181. Testing Kafka Creating Topics Now create a topic with name “test.topic” with replication factor 1, in case one Kafka server is running(standalone setup). If you have a cluster with more than 1 Kafka server running, you can increase the replication-factor accordingly which will increase the data availability and act like a fault-tolerant system. Open a new command prompt in the location C:\kafka_2.11-\bin\windows and type following command and hit enter. kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test.topic Creating a Producer · Open a new command prompt in the location C:\kafka_2.11-\bin\windows. 
· To start a producer type the following command: kafka-console-producer.bat --broker-list localhost:9092 --topic test.topic Start Consumer · Again open a new command prompt in the same location as C:\kafka_2.11-\bin\windows · Now start a consumer by typing the following command: kafka-console-consumer.bat --zookeeper localhost:2181 --topic test.topic Now you will have two command window Type anything in the producer command prompt and press Enter, and you should be able to see the message in the other consumer command prompt Some Other Useful Kafka Commands List Topics: kafka-topics.bat --list --zookeeper localhost:2181 Describe Topic kafka-topics.bat --describe --zookeeper localhost:2181 --topic [Topic Name] Read messages from beginning: kafka-console-consumer.bat --zookeeper localhost:2181 --topic [Topic Name] --from-beginning Delete Topic kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181 Kafka Architecture Kafka system has below main component, which are co-ordinated by Zookeeper. Topic Broker Producers Consumers 1. Topic Can be considered like a folder in a file system Producers published the message to a topic Message is appended to the topic. Each message is published to the topic at a particular location named as offset. Means the position of message is identified by the offset number. For each topic, the Kafka cluster maintains a partitioned log. Each partition are hosted on a single server and can be replicated across a configurable number of servers for fault tolerance. Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". Kafka provides ordering of message per partition but not across the partition. 2. Broker Core component of Kafka messaging system. Hosts the topic log and maintain the leader and follower for the partitions with coordination with Zookeeper. Kafka cluster consists of one or more broker. 
Maintains the replication of partitions across the cluster. 3. Producers Publish messages to one or more topics. Messages are appended to a partition of the topic. A producer is one of the users of the Kafka cluster. Kafka maintains the ordering of messages within a partition but not across partitions. 4. Consumers Subscribe to the messages of a topic. One or more consumers can subscribe to a topic and read from different partitions; together they are called a consumer group. Two consumers in the same consumer group CANNOT read messages from the same partition. Each consumer maintains its offset for each partition it reads. A consumer can replay messages by seeking back to an already-read offset in a partition of a topic. 5. Message A Kafka message consists of an array of bytes plus an optional piece of metadata called the key. A custom key can be generated to control which partition a message is stored in: messages with a particular key are written to a specific partition (the key is hashed to get the partition number). Kafka can also write messages in batches, which reduces the number of network round trips per message. Batches can be compressed for transport over the network. Batching increases throughput but also increases latency, because a message may wait for its batch to fill, hence there is a tradeoff between latency and throughput. Visit this link for Apache Kafka Producer with Example using java. If you have any questions, please mention them in the comments section below. Thank you. #KafkaOverview #ApacheKafkaWindows #KafkaZookeeperInstallation #KafkaUseCases #KafkaCommands [09/07/2019 5:49 PM CST - Reviewed by: PriSin]
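The idea of offsets and replay from the architecture section can be sketched as a tiny in-memory log. This is a teaching sketch only, assuming a single partition held in a list; PartitionLogSketch, append and readFrom are hypothetical names and not part of the Kafka API.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in (hypothetical) for one partition's append-only log. */
final class PartitionLogSketch {
    private final List<String> messages = new ArrayList<>();

    /** Appends a message and returns the offset it was stored at. */
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns every message at or after the given offset, in append order. */
    List<String> readFrom(long offset) {
        if (offset < 0 || offset > messages.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return new ArrayList<>(messages.subList((int) offset, messages.size()));
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        log.append("first");
        log.append("second");
        long third = log.append("third");
        System.out.println("third message stored at offset " + third);
        // Replay: a consumer that goes back to offset 0 sees everything again.
        System.out.println("replay from 0: " + log.readFrom(0));
        System.out.println("resume from 2: " + log.readFrom(2));
    }
}
```

Because the log is never rewritten, reading is just a matter of choosing a starting offset, which is why a consumer can resume where it left off or replay from an earlier position.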

  • $74 Billion in Lost Bitcoin - Modern Day Buried Treasure

    In 2018, the digital forensics firm Chainalysis estimated that around 35% of all bitcoin is likely lost. That is approximately 7.4 million of the 21 million bitcoin that will ever exist. This month (August 2019), bitcoin has averaged a market value of about $10K per coin, so, extrapolating, the lost coins are worth about $74 billion. Losing Bitcoin To be more exact, the bitcoin aren't lost but frozen. Bitcoin are inaccessible to their owners without a private key, a 256-bit number, usually saved in a wallet file. It is these keys, stored on hard drives or flash drives, that people misplace, lose, throw away or erase. One of the most infamous stories involves James Howells, a British man who owned upwards of 7,500 bitcoins. Howells accidentally threw out the hard drive containing his wallet file, thus losing all access to his bitcoin. At the time he lost it, the digital currency was valued at $7.5 million. The fortune is currently buried at his local landfill in Newport, South Wales. And, despite his desire to search, his local city council officials won't allow him to, due to safety concerns. These 7,500 bitcoins are now estimated to be worth $75 million. Similarly, Campbell Simpson threw away a hard drive containing 1,400 bitcoin, currently valued at $14 million. Even Elon Musk seems to have lost his bitcoin. To be clear, we aren't discussing how people lose bitcoin through careless or frivolous spending or outright theft. Those bitcoin have just been transacted and are under different ownership, but they are still "active" coins. The bitcoin we're discussing have become forsaken, abandoned and inaccessible, buried forever in an online vault. Forgotten PINs, key holders who have passed away, overwritten USB sticks; there are countless ways people have lost their bitcoin, but what they have in common is lost passwords or keys. 
Bitcoin Protection Your bitcoin is protected with a 256-bit key which has 2^256 possible combinations or 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible combinations. It's been estimated that even the fastest supercomputer in the world (Tianhe-2) would take millions of years to crack 256-bit encryption. Bitcoin Recovery The US Treasury Department replaces and redeems an estimated $30 million in mutilated, damaged, and burned currency every year. However, there is no one that can really help you if you lose your bitcoin password since there's no way to reset the password. Besides dumpster diving, some desperate investors have even resorted to hypnosis in an attempt to conjure up their keys. However, one company offers a glimmer of hope to those who lost their keys. Wallet Recovery Services, managed by Dave Bitcoin (an alias), claims a 30% success rate in hacking passwords by brute force decryption and charges his clients 20% of the amount in the wallet. Good luck to all you investors and keep your wallets safe~
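The arithmetic behind the headline figures can be checked with a few lines of Java. This is a rough sketch using the article's own inputs (35% lost, 21 million total supply, $10K per coin); the class and method names are made up for illustration. Note that 35% of 21 million is 7.35 million coins, which the article rounds to 7.4 million.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Back-of-the-envelope checks (hypothetical helper) for the figures quoted above. */
final class LostBitcoinMath {
    // Maximum number of bitcoin that will ever exist, as stated in the article.
    static final BigDecimal TOTAL_SUPPLY = new BigDecimal("21000000");

    /** Coins lost for a given fraction of the total supply, e.g. "0.35". */
    static BigDecimal lostCoins(String fraction) {
        return TOTAL_SUPPLY.multiply(new BigDecimal(fraction));
    }

    /** Dollar value of a number of coins at a given price per coin. */
    static BigDecimal valueUsd(BigDecimal coins, String pricePerCoin) {
        return coins.multiply(new BigDecimal(pricePerCoin));
    }

    /** Number of distinct keys of the given bit length, i.e. 2^bits. */
    static BigInteger keySpace(int bits) {
        return BigInteger.valueOf(2).pow(bits);
    }

    public static void main(String[] args) {
        BigDecimal lost = lostCoins("0.35");
        System.out.println("lost coins: " + lost.toPlainString());
        System.out.println("value in USD: " + valueUsd(lost, "10000").toPlainString());
        System.out.println("256-bit key space: " + keySpace(256));
    }
}
```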

  • Frequently asked Informatica interview questions and answers 2019

    What is Informatica and Enterprise Data Warehouse? Informatica is an ETL tool which is used to extract, transform and load data. It plays crucial role in building Enterprise Data Warehouse. EDW provides user a global view on historical data stored from various department in an organization like finance, sales, labor, marketing etc based on which critical business decisions could be made. What is repository service, integration service, reporting service, domain and node? There are basically three types of services which you will find under application services: Repository service - It is responsible for storing Informatica metadata (source definition, target definition, mappings, mapplets and transformations in the form of tables) and providing access to these repositories for other services. It is also responsible for maintaining connection between client development tool and Informatica repository which keeps metadata up-to-date by updating, inserting and deleting the metadata. Integration Service - It is responsible for execution of workflow and end-to-end data movement from source to target. It pulls the workflow information from repository, starts the execution of tasks, gathers/combines data from different sources (for example relational database and flat file) and loads into single or multiple targets. Reporting Service - It enables report generation. Domain - It is administrative part of Informatica which is used by admins to operate nodes and services. You can define multiple nodes in a domain with a gateway node which is responsible for receiving request and distributing it to worker nodes. Further, nodes are responsible for running services and other Informatica processes. There are basically two types of services in a domain - Service manager and Application services. Service manager is responsible for logging and login operations. Like authentication, authorization, managing people and groups. 
Application services are responsible for managing integration services, repository services and reporting services. Key properties of domain are as follows: Database properties - You can define database instance name and port responsible for holding domain. It consist of database type (like Oracle, SQL server etc), host, port, database name and user name. General Properties - You can define resilience timeout, restart period, restart attempts, dispatch mode etc. For example if services goes down how many seconds application services wait to again connect to respective service depends upon resilience timeout, how long Informatica can try restarting those services depends upon restart period and attempts. How tasks will be distributed to worker nodes from gateway will depend upon dispatch mode like round robin. What is PowerCenter repository? It is like a relational database which stores Informatica metadata in the form of tables (underlying database could be Oracle database or SQL server or similar) and it is managed by repository services. What is Informatica client tool? Informatica client tool is basically developer tool installed on client machine and it consist of four parts: Designer (Source, target, mapping, mapplet and transformations designer) Workflow manager (Task, worklet and workflow designer) Monitor Repository manager Basic terminology consist of: Workflow, Worklet, Sessions & Tasks: Workflow consists of one or more session, worklet and task (includes timer, decision, command, event wait, mail, link, assignment, control etc) connected in parallel or sequence. You can run these sessions by using session manager or pmcmd command. Further, you can write pre-post-session commands. Mappings, Mapplets & Transformations: Mappings are collection of source, target and transformations. Mapplets (designed in mapplet designer) are like re-usable mappings which contains various transformations but no source/target. 
You can run the mapping in debug mode without creating a session. There are mapping parameters and variables as well. Mapping parameters represent constant values that are defined before running a session while mapping variables can change values during sessions. What could be the various states of object in Informatica? Valid - fully syntactically correct object. Invalid - where syntax and properties are invalid according to Informatica standards. Informatica marks those objects invalid. It could be mapping, mapplet, task, session, workflow or any transformation. Impacted - where underlying object is invalid. For instance in a mapping suppose underlying transformation has become invalid due to some change. What happens when user executes the workflow? User executes workflow Informatica invokes integration service to pull workflow details from repository Integration service starts execution of workflow after gathering workflow metadata Integration service runs all child tasks Reads and combine data from sources and loads into target After execution, it updates the status of task as succeeded, failed, unknown or aborted Workflow and session log is generated What is the difference between ABORT and STOP? Abort will kill process after 60 seconds even if data processing hasn't finished. Session will be forcefully terminated and DTM (Data Transformation Manager) process will be killed. Stop will end the process once DTM processing has finished processing. Although it stops reading data from source as soon as stop command is received. What are the various types of transformation available in Informatica? There are basically two categories of transformation in Informatica. Active - It can change the number of rows that pass through transformation for example filter, sorter transformations. It can also change the row type for example update strategy transformation which can mark row for update, insert, reject or delete. 
Further it can also change the transaction boundary for example transaction control transformation which can allow commit and rollback for each row based on certain expression evaluation. Passive - Maintains same number of rows, no change in row type and transaction boundary. Number of output rows will remain always same as number of input rows. Active transformation includes: Source Qualifier: Connected to relational source or flat file, converts source data type to Informatica native data types. Performs joins, filter, sort, distinct and you can write custom SQL. You can have multiple source qualifier in single session with multiple targets, in this case you can decide target load order as well. Joiner: It can join two heterogeneous sources unlike source qualifier which needs common source. It performs normal join, master outer join, detail outer join and full outer join. Filter: It has single condition - drops records based on filter criteria like SQL where clause, single input and single output. Router: It has input, output and default group - acts like filter but you can apply multiple conditions for each group like SQL case statement, single input multiple output, more easier and efficient to work as compared to filter. Aggregator: It performs calculations such as sums, averages etc. It is unlike expression transformation in which one can do calculations in groups. It also provides extra cache files to store transformation values if required. Rank Sorter: Sorts the data. It has distinct option which can filter duplicate rows. Lookup: It has input, output, lookup and return port. Explained in next question. Union: It works like union all SQL. Stored Procedure Update Strategy: Treats source row as "data driven" - DD_UPDATE, DD_DELETE, DD_INSERT and DD_REJECT. Normalizer: Takes multiple columns and returns few based on normalization. 
Passive transformation includes: Expression: It is used to calculate in single row before writing on the target, basically non-aggregate calculations. Sequence Generator: For generating primary keys, surrogate keys (NEXTVAL & CURRVAL). Structure Parser Explain Lookup transformation in detail? Lookup transformation is used to lookup a flat file, relational database table or view. It has basically four ports - input (I), output (O), lookup (L) and return port (R). Lookup transformations can be connected or unconnected and could act as active/passive transformation. Connected lookup: It can return multiple output values. It can have both dynamic and static cache. It can return more than one column value in output port. It caches all lookup columns. Unconnected lookup: It can take multiple input parameters like column1, column2, column3 .. for lookup but output will be just one value. It has static cache. It can return only one column value in output port. It caches lookup condition and lookup output port in return port. Lookup cache can be made cached or no cache. Cached lookup can be static or dynamic. Dynamic cache can basically change during the execution of process, for example if lookup data itself changes during transaction (NewLookupRow). Further, these cache can be made persistent or non-persistent i.e. it tells Informatica to keep lookup cache data or delete it after completion of session. Here is types of cache: Static - static in nature. Dynamic - dynamic in nature. Persistent - keep or delete the lookup cache. Re-cache - makes sure cache is refreshed if underlying table data changes. Shared cache - can be used by other mappings. What are the various types of files created during Informatica session run? Session log file - depending upon tracing level (none, normal, terse, verbose, verbose data) it records SQL commands, reader-writer thread, errors, load summary etc. 
Note: You can change the number of session log files saved (default is zero) for historic runs. Workflow log file - includes load statistics like number of rows for source/target, table names etc. Informatica server log - created at home directory on Unix box with all status and error messages. Cache file - index or data cache files; for example aggregate cache, lookup cache etc. Output file - based on session if it's creating output data file. Bad/Reject file - contains rejected rows which were not written to target. Tracing levels: None: Applicable only at session level. The Integration Service uses the tracing levels configured in the mapping. Terse: logs initialization information, error messages, and notification of rejected data in the session log file. Normal: Integration Service logs initialization and status information, errors encountered and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows. Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details; names of index and data files used, and detailed transformation statistics. Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. Also notes where the Integration Service truncates string data to fit the precision of a column and provides detailed transformation statistics. When you configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation. What is SQL override in Informatica? SQL override can be implemented at Source Qualifier, Lookups and Target. It allows you to override the existing SQL in mentioned transformations to Limit the incoming rows Escape un-necessary sorting of data (order by multiple columns) to improve performance Use parameters and variables Add WHERE clause What is parallel processing in Informatica? 
You can implement parallel processing in Informatica by Partitioning sessions. Partitioning allows you to split large amount of data into smaller sets which can be processed in parallel. It uses hardware to its maximum efficiency. These are the types of Partitioning in Informatica: Database Partitioning: Integration service checks if source table has partitions for parallel read. Round-robin Partitioning: Integration service evenly distributes the rows into partitions for parallel processing. Performs well when you don't want to group data. Hash Auto-keys Partitioning: Distributes data based on hash function among partitions. It uses all sorted ports to create partition key. Perform well before sorter, rank and unsorted aggregator. Hash User-keys Partitioning: Distributes data based on hash function among partitions, but here user can choose the port which will act as partition key. Key Range Partitioning: User can choose multiple ports which will act as compound partition key. Pass-through Partitioning: Rows are passed to next partition without redistribution like what we saw in round robin. What can be done to improve performance in Informatica? See the bottlenecks by running different tracing levels (verbose data). Tune the transformations like lookups (caching), source qualifiers, aggregator (use sorted input), filtering un-necessary data, degree of parallelism in workflow etc. Dropping the index before data load and re-indexing (for example using command task at session level) after data load is complete. Change session properties like commit interval, buffer block size, session level Partitioning (explained above), reduce logging details, DTM buffer cache size and auto-memory attributes. OS level tuning like disk I/O & CPU consumption.
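The difference between the Filter and Router transformations described earlier (one condition and one output versus several named groups plus a default group) can be sketched in plain Java. This is not Informatica code: RouterSketch and its names are hypothetical, rows are reduced to integers, and the sketch assumes that a row is copied to every group whose condition it satisfies and goes to the default group only when it matches none.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Plain-Java illustration (hypothetical) of router-style row distribution. */
final class RouterSketch {
    static final String DEFAULT_GROUP = "DEFAULT";

    /** Sends each row to every group whose condition holds, else to DEFAULT. */
    static Map<String, List<Integer>> route(List<Integer> rows,
                                            Map<String, Predicate<Integer>> groups) {
        Map<String, List<Integer>> output = new LinkedHashMap<>();
        for (String name : groups.keySet()) {
            output.put(name, new ArrayList<>());
        }
        output.put(DEFAULT_GROUP, new ArrayList<>());

        for (Integer row : rows) {
            boolean matched = false;
            // Assumption: every group condition is tested; a row may match several.
            for (Map.Entry<String, Predicate<Integer>> group : groups.entrySet()) {
                if (group.getValue().test(row)) {
                    output.get(group.getKey()).add(row);
                    matched = true;
                }
            }
            if (!matched) {
                output.get(DEFAULT_GROUP).add(row);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Predicate<Integer>> groups = new LinkedHashMap<>();
        groups.put("SMALL", amount -> amount < 10);
        groups.put("EVEN", amount -> amount % 2 == 0);
        List<Integer> rows = new ArrayList<>();
        rows.add(4);
        rows.add(7);
        rows.add(12);
        rows.add(15);
        System.out.println(route(rows, groups));
    }
}
```

A filter, by contrast, would be a single predicate that simply drops the rows that fail it, so reproducing the router's output with filters would need one pass over the input per group.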

  • Data Driven Parenting: An Introduction (Entry 1)

    As a first-time parent, I find myself wondering how each decision I'm making might end up "messing up" my kid. How each factor that I introduce could ripple down and somehow eventually lead to my child sitting on a meticulously upholstered psychiatrist's couch talking about how all their problems stemmed from childhood and were particularly the fault of some defect or distortion in their relationship with their mother (aka me). But I think back to my own childhood: left unsupervised for long lengths of time in the car parked outside of a grocery store, freely flipping through mystery/horror/slasher movies with my friends, and eating Hot Pockets and Pop-Tarts for dinner. Am I messed up? I mean, probably a little, but aren't we all? There are libraries filled with parenting advice, oftentimes offering contradictory opinions. Homo sapiens has persisted for an estimated 200,000 to 300,000 years, so how badly could we be doing? These are the concerns that I imagine drove Brown University economist Emily Oster to write Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (Penguin Press). I paged through it quickly at my local bookstore and it got me thinking: how much does parenting actually contribute to child outcomes? What should we really be doing? Can we actually mess up our kids? Using Oster's compiled research from Cribsheet as a foundation, I'll be exploring what past and present research states, as well as what findings in animals have suggested. What does the data on parenting say? Are we just becoming more anxious, more allergic, more obese, hopeless? doomed?? Does anyone really know what they're doing? Hopefully, we'll find out. Join me later for Data Driven Parenting: ??? Entry 2.

  • Ultimate Minimalist Baby Registry: The Bare Minimum Baby

    Whether you’re expecting a child of your own or you’re gifting to expectant friends/family, here is a list of absolute must-haves for new parents: Minimalist Must-Haves 1. Sleeping receptacle Whether parents plan on co-sleeping with the popular DockATot or opting for the luxury bassinet Snoo, babies need a place to sleep. If you live in an apartment or are just keeping it simple, opt for a Pack 'n Play (Pack 'n Play mattress sold separately). Unless your baby is in the 99th percentile in length, this can easily serve as their crib for up to 1 year. Ikea's crib is also a tasteful, safe, and affordable addition to any nursery (mattress sold separately). Depending on which route you choose, you'll need some bedding. I recommend 4 sets of sheets. If you do your laundry once a week, that should give you enough slack for nighttime accidents/leaks. You can buy waterproof sheets but keep in mind that a lot of mattresses are also waterproof. 2. Travel system If you're driving home from the hospital, you won't be released unless you have a car seat. Do not buy your car seat at the thrift store or even "lightly used". This is not the baby item you should be trying to save money on. Car seats also have a lifespan, so don't use an old dusty one your aunt fishes out of her storage unit. There are a multitude of companies that manufacture safe car seats. We personally went with a Chicco travel system; the easy pop-out feature made travelling a breeze. If you opt out of a car seat (maybe you don't own a car and only use cramped public transit systems) then you can also "wear" your baby out in a carrier. Popular options include a sling, wrap, or front pack. There are a ton of different options for baby carriers depending on your own preference so shop around and do your research. 3. Diaper bag You can use pretty much any kind of bag as a diaper bag so just put some thought into what you find most convenient (tote, backpack, messenger, etc).
Of course, bags intended as diaper bags have very convenient pockets for organizing so if you plan on using a daycare or just taking your kiddo place to place, a diaper bag is an excellent investment. 4. Baby wipes Here's where it gets complicated. Many babies have sensitive skin and you won't really know in advance whether your baby will react to a certain brand of wipe. So I recommend buying a couple of small packs of wipes before you go buying a Costco value box. However, if you're the type who likes to have apocalyptic preparations then consider stocking up on sensitive skin wipes such as the popular Water Wipes. 5. Diapers* A note on diapers - unlike wipes, these have sizes which your child will rapidly outgrow. Your baby shower may well be flooded with value boxes of diapers. Some parents stock up on the Newborn size diapers only to find that their baby outgrew them in just a few days. I recommend buying/registering for a small pack of the newborn size and a single larger pack of size 1. I personally like the Honest diapers (cute prints) but if you're on a budget, generic brands at Target or even Aldi will work just as well. 6. Clothing A note on clothing - this is probably one of the most popular baby shower gifts and you'll receive a lot of different outfits. Some baby clothing can be soooo adorable but also be an absolute nightmare to get on and off. I recommend having 7 easy, everyday outfits (with mittens for sharp clawed newborns) that you use in rotation. The zippered sleep n play style is particularly easy. Also, take your local climate into consideration before you stock up. Is your 6 month size fleece snowsuit going to be useful when your child is 6 months old in July? We were gifted so many cute outfits that our baby just didn't have time to wear. 7. Swaddling/burp cloths/breast-feeding covers As a catch-all for this category, muslin blankets are just absolutely perfect.
These are so large and useful that I recommend having 6, as you'll want one in the diaper bag, the bedroom, and the play area at all times. They also make excellent little play mats when you're outside or at a friend's house (aka you aren't sure about putting your baby down, carpet looks questionable). The popular Aden + Anais brand has many cute design options and is very well made and durable. 8. Night light You are going to be getting up at all times of the night and if you don't want the abrupt shock of turning on your overhead lighting, you'll definitely need a night light. There are so many different designs and brands that you have plenty of freedom to choose what you like. 9. Bottles and pacifiers* A note on bottles and pacifiers - if you plan on breast feeding, it may be difficult to convince your child to switch from one to the other. Consult your local hospital's breast feeding consultant for more on this matter. However, if you're on maternity leave, you'll eventually have to return to work so you'll need a nice set of bottles. If you hate the idea of washing bottles, you can also opt for disposable pumping bags like the Kiinde ones that your baby can drink from directly. When your baby starts solids, you can also pack these pouches with baby food for on-the-go snacks. We use the Comotomo bottles (4); they tumble around a little in the fridge but we love how easy they are to clean. We also use the BIBS pacifiers; they are truly a lifesaver, and other parents at our daycare have asked about them and switched. 10. Boppy The Boppy newborn lounger is just a super easy place to set your baby down, comfy and convenient. The original Boppy is great for breast-feeding and for propping up baby while they learn to sit up. If you want just one, I have to say that I'd go with the newborn lounger. It was so great and she loved napping in it so much. And it's also pretty comfy to use as a pillow for adults if you want to steal it for a bit. 11.
Changing pad There's a good amount of accidents that can occur while you're changing your baby and a waterproof changing pad is a good investment. Also, our little one loves to sleep on her Ikea changing pad for some reason. She absolutely hated being in the Snuggle Me Organic but she'd relax and doze off immediately on her changing pad. Kids are strange. 12. Baby bathtub There's nothing more anxiety inducing than giving your baby a bath for the first time so do yourself a favor and get a simple baby bathtub. 13. Towels and washcloths Getting 3-4 towels and 8-10 washcloths is more than enough since you won't be bathing baby everyday. The towels and washcloths can also play double duty as burp clothes or to clean up messes, especially when they start on solid foods. 14. Toiletries A baby wash, sunscreen, and baby ointment are the bare necessities. Due to allergies, you may not want to stock up on these before you figure out if your little one will have a reaction. Our hospital recommended Johnson's baby shampoo so we've used that and we swear by Aquaphor Healing Ointment for diaper rash and just skin irritation. 15. Baby first aid and grooming kit You can register for a kit or compile your own. The first thing I would add is the number for Poison Control (800-222-1222). The second thing you'll need is a baby thermometer so you can check for fevers. You'll also need nail clippers/grinders, a hairbrush, and snot removal device (like the snot-sucker). Consult your doctor for the use of baby pain management and fever reducers (acetaminophen etc). Indulgent Nice-To-Haves 16. Changing table/station 17. Bathtub kneeler 18. Baby swing/bouncer/rocking chair 19. Baby white noise machine 20. Diaper bin 21. Baby monitor 22. Books 23. Toys There are so many things that will depend on your baby's temperament, likes, and dislikes so if at all possible - REGISTER FOR GIFT CARDS. Of course this request depends on your baby shower guest's temperament, likes, and dislikes. 
But this way, you can react more flexibly instead of having crippling buyer's remorse for that one expensive baby swing that your child cries at the sight of. A cluttered house filled with baby gear can be stressful and anxiety-inducing, especially if the baby doesn't like to use any of it. Good luck and best wishes! #minimalist #parenting #baby #babyregistry #babyregistrylist #babymusthaves #newborn #firsttimeparent #minimalism #minimalistparenting #nursery #apartment #decluttering

  • Facts you should know about Food Waste & "Too Good To Go" Application

    Food waste is a major problem nowadays in developed regions like the United States and Europe. The majority of the food waste comes from restaurants and other retail distributors, where product quality is the first priority. Some alarming facts about food wastage which you should know: Roughly one-third of the food produced in the world is wasted every year (approximately 1.3 billion tons). Every year, consumers in rich countries waste almost 222 million tons of food, whereas the entire net food production of Sub-Saharan Africa is approximately 230 million tons. Per capita waste by consumers is between 95 and 115 kg per year in Europe and North America, whereas it is only 6-11 kg per year in Sub-Saharan Africa and South and Southeast Asia. All over Europe, there is a trend in the food sector to throw away remaining fresh food items in order to uphold food quality and hygiene standards. Such a practice undoubtedly brings high customer satisfaction and trust and is highly commendable, but it also causes food to be wasted in large quantities. Certainly developed regions like Europe and the USA can afford such "arrogance", but in today’s era of sustainable development goals (SDG) and carbon neutrality, is it not a white-collar crime to keep following such a practice in the name of quality maintenance and customer satisfaction? We live in a world where more than 1 billion people suffer from hunger and every 5 seconds a child dies because of hunger or directly related causes. Certainly, the situation is different in developed nations, where the priority is more on the quality of food than on its utilization, but ignoring this reality cannot justify such a practice. Various steps have been taken to reduce the wastage of food. One such step is the "Too Good To Go" mobile app, which works in association with various restaurants in Europe to prevent food wastage by selling surplus food to interested consumers at a highly discounted rate.
It allows both consumers and restaurants to keep food items out of the bin and use them judiciously. Thus, it saves a large quantity of food from going to the bin and in turn generates a profit from it. I have been traveling within Europe for the past two years and was always perturbed by this practice of throwing good-quality fresh food into the waste bin instead of selling it at a lower price to ensure its full utilization. I was not aware of this mobile application, which significantly reduces food wastage and in turn makes the world a better place to live. Such apps should be promoted to ensure that not a single grain of food ever goes to waste again.

  • Kafka Producer Example

    In this Apache Kafka tutorial you will learn how to install Apache Kafka on Mac using Homebrew. To install Kafka on a Linux machine refer to this. Kafka Zookeeper Installation $ brew install kafka This command will automatically install Zookeeper as a dependency. The Kafka installation will take a minute or so. If you are working in cluster mode, then you need to install it on all the nodes. How to start Kafka & Zookeeper? You don't need to run these commands right now, but for understanding you can see how to start them in the installation output log. To start Zookeeper now and restart it at login, you need to run this: $ brew services start zookeeper Or, if you don't want/need a background service you can just run: $ zkServer start To start Kafka now and restart it at login: $ brew services start kafka Or, if you don't want/need a background service you can just run: $ zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties & kafka-server-start /usr/local/etc/kafka/server.properties Zookeeper & Kafka Server Configuration You can open the Zookeeper properties file to see the default configuration; there is not much to explain here. You can see the port number where the client (Kafka in this case) will connect, the directory where the snapshot will be stored and the maximum number of connections per IP address. $ vi /usr/local/etc/kafka/zookeeper.properties # the directory where the snapshot is stored. dataDir=/usr/local/var/lib/zookeeper # the port at which the clients will connect clientPort=2181 # disable the per-ip limit on the number of connections since this is a non-production config maxClientCnxns=0 Similarly, you can see the default Kafka server properties. You just need to change the listener setting here to localhost (standalone mode) or to the IP address of the node in cluster mode. $ vi /usr/local/etc/kafka/server.properties Server basics - Here you define the broker id, which is a unique integer value for each broker.
Socket server settings - Here you define the listener hostname and port; by default it's commented out. For this example the hostname will be localhost, but in a cluster you need to specify the respective IP addresses. Set it up like this: listeners=PLAINTEXT://localhost:9092 Log basics - Here you define the log directory, the number of log partitions per topic and the number of recovery threads per data directory. Internal topic settings - Here you can change the topic replication factor, which is 1 by default; in a production environment it is usually > 1. Log flush policy - By default everything is commented out. Log retention policy - The default retention is 168 hours. Zookeeper - The default port number is the same one you saw during installation: 2181 Group coordinator settings - This is the rebalance delay in milliseconds when a new member joins as a consumer. Kafka topics are usually multi-subscriber, i.e. there will be multiple consumers for one topic. However, a topic can have 0, 1 or more consumers. Starting Zookeeper & Kafka To start Zookeeper and Kafka, you can start them together like below or run each command separately, i.e. start Zookeeper first and then start Kafka. $ zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties & kafka-server-start /usr/local/etc/kafka/server.properties This will print a long list of INFO, WARN and ERROR messages. You can scroll back up and look for WARN and ERROR messages, if any. You can see the producer id and broker id in the log, along with other properties that are set by default in the Kafka properties file which I explained earlier. Let this process run, don't kill it. Create a Topic and Start Kafka Producer To create a topic and start a producer, you need to run this command: $ kafka-console-producer --broker-list localhost:9092 --topic topic1 Here my topic name is "topic1" and this terminal will act as the producer. You can send messages from this terminal.
Start Kafka Consumer Now start the Kafka consumer by running this command: $ kafka-console-consumer --bootstrap-server localhost:9092 --topic topic1 --from-beginning The bootstrap server is basically the server to connect to. For this example it's localhost with the default port 9092. The screen on the left is the producer and the screen on the right is the consumer. You can see how messages are transferred from one terminal to the other. Thank you. If you have any questions please mention them in the comments section below. #Kafka
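The server.properties sections walked through in this post can be summarized in one fragment. This is only a sketch: the listener, the replication factor of 1, the 168-hour retention and the Zookeeper port 2181 come from the post, while broker.id=0, the log.dirs path, num.partitions=1, the single recovery thread and the zero rebalance delay are assumed typical single-node values, so check them against your own file.

```properties
# Server basics: unique integer id per broker (0 is an assumed value)
broker.id=0

# Socket server settings: listener for a standalone broker, as set in the post
listeners=PLAINTEXT://localhost:9092

# Log basics (path and counts are assumptions for a Homebrew single-node setup)
log.dirs=/usr/local/var/lib/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1

# Internal topic settings: replication factor 1, usually higher in production
offsets.topic.replication.factor=1

# Log retention policy: 168 hours
log.retention.hours=168

# Zookeeper connection on the port shown in zookeeper.properties
zookeeper.connect=localhost:2181

# Group coordinator settings: rebalance delay in milliseconds (assumed value)
group.initial.rebalance.delay.ms=0
```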

  • How to write your first blog at Data Nebulae?

    Guest blogging at Data Nebulae is very simple. Please read these instructions carefully before you start. Step 1: Sign Up Once you sign up, you will automatically receive writer's privileges within 24 hours. If you have additional questions please email us. Step 2: Start Writing That's all! You are ready to create posts. But before you start please read the blogging guidelines carefully. Blogging Guidelines These rules are meant to maintain the quality of blogging at Data Nebulae. Blog Uniqueness Blogs should be unique. Dataneb doesn't accept syndicated/unoriginal posts, research papers or duplicate posts, and copying others' content/articles is strictly prohibited. NOTE: Violation of these guidelines will result in immediate loss of writer's privileges. Blog Length Blogs should have a minimum of 3000 characters; there is no upper limit. You will find the total number of characters in the top left corner of the editor while drafting blogs. Blogs not fulfilling this criterion will be automatically moved to draft status. Image Requirement You can insert images (but they should not be copyrighted images). Or, you can leave it to us. One of our moderators will handle the image requirement. Back-links Back-links are allowed (maximum 5 & sometimes more) as long as the intention is clear. Make sure you are not linking to any blacklisted websites. Miscellaneous Moderators have the authority to add keywords and modify text, images etc so that your blog can get a higher Google ranking. This will help your blog get more views. You can delete your post anytime, but Data Nebulae has full rights to re-publish that content. Example Editor Before you start please refer to this post. See how paragraphs are written and how header sizes, bullet points, image alignment, divider lines, hashtags etc are used. This is how your blog editor will look: Wait! There is an Easier Way to Publish Your Blog We understand you are a beginner and you don't want to publish your blog without review. Don't worry! Just draft the blog and save it.
Email us when your blog is ready to publish and one of our moderators will review/publish it for you. If you are just a member and don't want to become a writer, you can also write your post in a Word document and email it to us for submission. What's next? Share your blog post on Facebook, Twitter etc to get more views, earn badges and invite others. Sharing blogs on social media is the easiest and fastest way to earn views. We value your words; please don't hurt others' feelings while commenting on blog posts, and help maintain a quality environment at Data Nebulae. Email us if you have any queries. Good Luck!

  • SSB Interview Procedure 2019

    The SSB interview is a mandatory process for becoming an officer in any of the Indian defence forces, irrespective of age group and kind of entry. There just can’t be any Indian defence officer who has not experienced this five-day Service Selection Board (SSB) interview process. Because of its high rejection rate, it has gained great importance for aspiring candidates, and so has expert guidance on it in the form of coaching and published books. Still, it’s very difficult to find a correct methodology to prepare for it and clear it successfully. I believe the whole phenomenon of the SSB interview has become commercialized and hyped, and the correct and easier procedure has been lost in the process. The SSB interview is a scientifically designed evaluation system that ensures the correct intake of officers into the system for its overall growth. The board assesses the suitability of the candidate for becoming an officer. It consists of personality tests, intelligence tests, and interviews. The tests are of both types, i.e. written and practical task-based. In total there are thirteen Service Selection Boards across India, of which four boards are for the Indian Army, four boards are for the Indian Air Force and five boards are for the Indian Navy. The Service Selection Boards of the Indian Army are located at: SSB North (Kapurthala, PB) SSB South (Bangalore, KA) SSB Central (Bhopal, MP) SSB East (Allahabad, UP) The SSB interview consists of three separate evaluation systems spread over five days. The day-wise procedure is given below. Day of reporting The selected candidates are given the exact date and time for reaching the respective SSB center in their call letter. A reception center is established at the nearest railway station, which arranges the necessary pick-up and drop between station and center. Upon arrival, a distinguishing chest number is provided to each candidate, which becomes their identity for this exam process.
Their educational documents are checked for initial verification and they are allotted a berth for their stay. A briefing about the schedule, various tests and general instructions is given. Day 1: Screening Test On the first day, the screening test is conducted, which separates the best from the crowd. Normally more than half of the candidates don’t make it beyond this point. The screening test includes: Intelligence Test - This consists of two tests, verbal and non-verbal (about 50 questions each). Picture Perception & Picture Description Test (PPDT) - In this test, a picture is shown to the candidates for 30 seconds. Each candidate observes it and then, in the next one minute, must record the number of characters seen in the picture. Then, in four minutes, the candidate must draft a story from the picture (and not just describe the picture). The candidate must record the mood, approximate age and gender of the "main character". Group discussion on the PPDT - In stage two of the PPDT, the candidates are given their stories, which they may revise. Then, in a group, each candidate must narrate his story in under one minute. The group is then asked to create a common story involving each of their perceived picture stories. Selected candidates are shifted to different accommodation where they will stay for the next four days of the interview process. The remaining candidates are sent back home. Day 2: Psychology Test The following tests are conducted during the second day of the SSB interview. Thematic Apperception Test (TAT) - Candidates are shown a picture for thirty seconds and then write a story in the next four minutes. Twelve such pictures are shown sequentially. The last picture is a blank slide inviting the candidates to write a story of their choice.
Word Association Test (WAT) - Candidates are shown sixty simple, everyday words for fifteen seconds each and they need to write a sentence on each word. Situation Reaction Test (SRT) - A booklet of 60 situations is given, and the responses are to be completed in 30 minutes. Self Description Test (SDT) - The candidate is asked 5 questions about his parents', teachers', friends' and his own perception of himself. Day 3-4: GTO Tasks & Interview The following tests are conducted during these days of the SSB interview. Group Discussion test (GD) Military Planning Exercise (MPE) Progressive Group Task (PGT) Individual Lecturettes Group Obstacle Race Half Group Task The personal interview of each candidate is conducted by the SSB Board president. Day 5: Conference All the officers (in proper uniform) attend the conference, where each candidate has a conversation with a panel of assessors. The assessors look for confidence and expression when speaking; a positive attitude in adversity and in life; and honesty. Following this, the final results are announced. Successful candidates remain for an intensive medical examination taking three to five days at a military hospital. Thank you. If you have any questions please don't hesitate to ask in the SSB group discussion forum or simply comment below. #SSBTips #SSBInterview