
  • SSB Interview Procedure 2019

    The SSB interview is a mandatory process for becoming an officer in any of the Indian defence forces, irrespective of age group and type of entry. There simply is no Indian defence officer who has not been through this five-day Service Selection Board (SSB) interview. Because of its high rejection rate, it has gained great importance among aspiring candidates, and so has expert guidance on it in the form of coaching and published books. Even so, it is difficult to find a correct methodology to prepare for it and clear it successfully. I believe the whole phenomenon of the SSB interview has become commercialized and turned into a kind of hype, and the correct, simpler procedure has been lost in the process. The SSB interview is a scientifically designed evaluation system that ensures the right intake of officers into the system for its overall growth. The board assesses the suitability of a candidate for becoming an officer; it consists of personality tests, intelligence tests and interviews, both written and practical (task-based). In total there are thirteen Service Selection Boards across India: four for the Indian Army, four for the Indian Air Force and five for the Indian Navy. The Service Selection Boards of the Indian Army are located at: SSB North (Kapurthala, Punjab), SSB South (Bangalore, Karnataka), SSB Central (Bhopal, Madhya Pradesh) and SSB East (Allahabad, Uttar Pradesh). The SSB interview consists of three separate evaluation systems spread over five days. The day-wise procedure is given below.

    Day of reporting: Selected candidates are given the exact date and time of reporting to their respective SSB centre in the call letter. A reception centre is set up at the nearest railway station, which arranges pick-up and drop between the station and the centre. Upon arrival, each candidate is given a distinguishing chest number, which becomes his identity for the rest of the selection process.
    Their educational documents are checked for initial verification and they are allotted berths for their stay. A briefing about the schedule, the various tests and general instructions is given.

    Day 1: Screening Test. On the first day, a screening test is conducted which segregates the best from the crowd. Normally more than half of the candidates do not make it beyond this point. The screening test includes: Intelligence Test - consisting of two tests, verbal and non-verbal (about 50 questions each). Picture Perception & Picture Description Test (PPDT) - a picture is shown to the candidates for 30 seconds. Each candidate observes it and then, in the next one minute, must record the number of characters seen in the picture. Then, in four minutes, each candidate drafts a story from the picture (and not just a description of it), recording the mood, approximate age and gender of the "main character". Group discussion on the PPDT - in stage two of the PPDT, the candidates are given back their stories, which they may revise. Then, in a group, each candidate must narrate his story in under one minute. The group is then asked to create a common story involving each of their perceived picture stories. Selected candidates are shifted to different accommodation, where they stay for the next four days of the interview process; the remaining candidates are sent back home.

    Day 2: Psychology Tests. The following tests are conducted on the second day of the SSB interview. Thematic Apperception Test (TAT) - candidates are shown a picture for thirty seconds and then write a story in the next four minutes. Twelve such pictures are shown sequentially; the last is a blank slide inviting the candidates to write a story of their choice.
    Word Association Test (WAT) - candidates are shown sixty simple, everyday words for fifteen seconds each, and they need to write a sentence on each word. Situation Reaction Test (SRT) - a booklet of 60 situations is given, and responses are to be completed in 30 minutes. Self Description Test (SDT) - the candidate is asked five questions about his parents', teachers' and friends' perception of him, and his own perception of himself.

    Day 3-4: GTO Tasks & Interview. The following tests are conducted during these days of the SSB interview: Group Discussion (GD), Military Planning Exercise (MPE), Progressive Group Task (PGT), Individual Lecturettes, Group Obstacle Race and Half Group Task. A personal interview of each candidate is taken by the SSB board president.

    Day 5: Conference. All the officers (in proper uniform) attend the conference, where each candidate has a conversation with a panel of assessors. The assessors look for confidence and expression when speaking; a positive attitude in adversity and in life; and honesty. Following this, the final results are announced. Successful candidates remain for an intensive medical examination taking three to five days at a military hospital. Thank you. If you have any question, please don't hesitate to ask in the SSB group discussion forum or simply comment below. #SSBTips #SSBInterview

  • How to clear Google cloud professional data engineer certification exam?

    In this blog you will learn: how to get Google Cloud certified, how much it costs, the best Google certification courses available online right now, and how to train yourself with Google Cloud certification practice exams before the actual examination. Before we begin, I would like to mention one fact: you can crack this exam even if you don't have any work experience or prior knowledge of GCP (Google Cloud Platform). I am writing this blog to show how you can clear the Google Cloud Professional Data Engineer certification without any prior knowledge of GCP. I would divide the whole preparation into 3 basic sections: online video lectures (absolutely free if completed within the trial time frame), a glance through some Google documentation, and finally a few practice tests.

    Step 1: Online Video Lectures. Coursera: First, begin with the Coursera specialization, which is also suggested by Google and is really informative. You can use Coursera's 7-day free trial to complete it, but since it is a very big course, you will have to devote a good amount of time every day during those 7 days. The course comes with Qwiklabs, where you can do lab assessments without creating a GCP account, and with quizzes, so you get a good understanding of GCP components along with hands-on experience. Udemy: Next comes Udemy, a combined course for both data engineers and architects. This course will help you understand real-world implementation of GCP components. You can skip the machine learning part of this course if you want. These two courses are not very exam-oriented, but they give you a good understanding of every GCP component with some basic hands-on work. Now, jumping to exam-oriented video lectures, Cloud Academy and Linux Academy come to our rescue; both sites offer a 7-day free trial. Cloud Academy will give you good knowledge of most of the topics covered in the exam, and you can learn machine learning from it.
    Try to understand each and every point covered in this course well. It also comes with quizzes for the main topics; understand the explanations given for the quiz answers. However, the Cloud Academy course doesn't cover topics such as data preparation, and this is where Linux Academy comes into the picture. The Linux Academy course covers all the exam topics in the most exam-oriented way; you will get a good understanding of machine learning and the other remaining topics. It also has topic-wise tests and a full 2-hour test (50 questions) to give you a feel for the real exam. I would recommend taking this test at the last stage of preparation, attempting it at least three times and aiming for 100%. For revision, go through Linux Academy's Data Dossier - the best part of the whole course, and exactly what you will need at the last moment.

    Step 2: Google Documentation. There are a few topics, such as BigQuery, Pub/Sub and Data Studio, for which you will have to go through the Google docs. For Dataflow you need to go through the Apache Beam documentation. For each of these components, understand the following points very well: access control, best practices and limitations. For ML, understand the different use cases where pre-trained ML APIs are used; this will help you decide whether to use a pre-built API or build a custom model.

    Step 3: Practice Tests. For practice tests you can go through the following: the Google DE practice test, the Whizlabs test and the Linux Academy practice test. Make sure you take all the tests at least three times and understand each question and its answers well. For each question you should understand why a particular answer is correct and why the remaining ones are incorrect. In the end, I would say that Google has made this exam very logical: you need to know every topic inside out to clear it. So understand everything well and don't try to memorize or mug everything up. Best of luck!!

  • Calling 911 for Pepperoni Pizza Delivery, But Why?

    Phone conversation of a 911 operator (reference: Reddit user Crux1836): Officer: "911, where is your emergency?" Caller: "123 Main St." Officer: "Ok, what's going on there?" Caller: "I'd like to order a pizza for delivery." Officer: "Ma'am, you've reached 911." Caller: "Yeah, I know. Can I have a large with half pepperoni, half mushroom and peppers?" Officer: "Ummm… I'm sorry, you know you've called 911, right?" Caller: "Yeah, do you know how long it will be?" Officer: "Ok, Ma'am, is everything ok over there? Do you have an emergency?" Caller: "Yes, I do." Officer: "… And you can't talk about it because there's someone in the room with you?" Caller: "Yes, that's correct. Do you know how long it will be?" Officer: "I have an officer about a mile from your location. Are there any weapons in your house?" Caller: "Nope." Officer: "Can you stay on the phone with me?" Caller: "Nope. See you soon, thanks." (Officer) As we dispatch the call, I check the history at the address and see there are multiple previous domestic violence calls. The officer arrives and finds a couple; the woman was kind of banged up, and the boyfriend was drunk. The officer arrests him after she explains that the boyfriend had been beating her for a while. I thought she was pretty clever to use that trick. Definitely one of the most memorable calls. Another case happened in the UK; the call went something like this: Operator: "Police emergency." Caller: "Hello, I'd like to order a curry please." Operator: "You're through to the police." Caller: "Could you deliver it to 123 Street?" Operator: "Madam, this is the police, not a delivery service." Caller: "Could you deliver it as soon as possible?" Operator (starting to realize something is fishy): "Madam, are you in a situation where you cannot talk freely?" Caller: "Yes." Operator: "Are you in danger?" Caller: "Yes." Operator: "Okay, I'm arranging help for you immediately." Caller: "Could you make it two naan breads? My husband is really hungry."
    Operator: "I'll send two officers." This transcript is purely from memory, based on a police officer's memoir. On the police response, a very angry man was arrested for domestic violence. There was obviously the risk that the operator could have hung up on a 'time-wasting' caller, but once they realized something was wrong, they changed scripts immediately. Can you actually call emergency services and "order a pizza" as a tactic for getting help? The answer is "no" - there is no such 911 pizza call "code". Police and 911 operators say there's no such secret code, and that your best option, if you're afraid of someone in the room overhearing your call, is to text 911 with your location and the type of emergency. However, a meme circulating on social media reads: "If you need to call 911 but are scared to because of someone in the room, dial and ask for a pepperoni pizza… Share this to save a life." Here is what the LAPD tweeted: Remember, if you can't call - you can TEXT! Tags: #Funny #Lesson

  • How to write single CSV file using spark?

    Apache Spark by default writes CSV output as multiple part files (part-*.csv) inside a directory. The reason is simple: each partition is saved individually, and since Apache Spark is built for distributed processing, multiple files are expected. However, you can overcome this in several ways. In previous posts we have read data files (flat file, JSON) and created RDDs and dataframes using Spark SQL, but we haven't yet written a file back to disk or any storage system. In this Apache Spark tutorial you will learn how to write files back to disk. Main menu: Spark Scala Tutorial

    For this blog, I am creating a Scala object - textfileWriter - in the same project (txtReader folder) where we created textfileReader.

    Source File: I am using the same source file squid.txt (with duplicate records) which I created in a previous blog. In a practical scenario the source could be anything - a relational database, an HDFS file system, a message queue etc. In practice it would never be the case that you read and write the same file; this is just for demo purposes.

    1286536309.586 921 TCP_MISS/200 507 POST http://rcv-srv37.inplay.tubemogul.co...eiver/services - DIRECT/ application/xml
    1286536309.608 829 TCP_MISS/200 507 POST http://rcv-srv37.inplay.tubemogul.co...eiver/services - DIRECT/ application/xml
    1286536309.660 785 TCP_MISS/200 507 POST http://rcv-srv37.inplay.tubemogul.co...eiver/services - DIRECT/ application/xml
    1286536309.684 808 TCP_MISS/200 507 POST http://rcv-srv37.inplay.tubemogul.co...eiver/services - DIRECT/ application/xml
    1286536309.775 195 TCP_MISS/200 4120 GET http://i4.ytimg.com/vi/gTHZnIAzmdY/default.jpg - DIRECT/ image/jpeg
    1286536309.795 215 TCP_MISS/200 5331 GET http://i2.ytimg.com/vi/-jBxVLD4fzg/default.jpg - DIRECT/ image/jpeg
    1286536309.815 234 TCP_MISS/200 5261 GET http://i1.ytimg.com/vi/dCjp28ps4qY/default.jpg - DIRECT/ image/jpeg

    Sample Code: Open textfileWriter.scala and copy-paste the code written below.
    I have written separate blogs to explain the basic terminologies used in Spark, like RDD, SparkContext, SQLContext, various transformations and actions etc. You can go through these for a basic understanding: Spark shell, Spark context and configuration; Spark RDD, Transformations and Actions. I have also explained in comments above each line of code what it actually does; for a list of Spark functions you can refer to the docs. You could make this code much simpler, but my aim is to teach as well, hence I have intentionally introduced a header structure, SQL context, string RDD etc. If you are already familiar with these, you can just focus on the dataframe-writing part at the end.

    package com.dataneb.spark

    // Each library has its significance; the comments below explain how each is used
    import org.apache.spark._
    import org.apache.spark.sql._
    import org.apache.log4j._
    import org.apache.spark.sql.types.{StructType, StructField, StringType}
    import org.apache.spark.sql.Row

    object textfileWriter {

      // Reduce logging to just "ERROR" messages (uses org.apache.log4j._)
      // Other logging levels: ALL, DEBUG, ERROR, INFO, FATAL, OFF etc.
      Logger.getLogger("org").setLevel(Level.ERROR)

      // Spark configuration defining the application name and local resources to use
      // (uses org.apache.spark._)
      val conf = new SparkConf().setAppName("textfileWriter")
      conf.setMaster("local")

      // Using the above configuration to define our SparkContext
      val sc = new SparkContext(conf)

      // SQL context to run Spark SQL (uses org.apache.spark.sql._)
      val sqlContext = new SQLContext(sc)

      // Main function where all operations occur
      def main(args: Array[String]): Unit = {

        // Reading the text file
        val squidString = sc.textFile("/Users/Rajput/Documents/testdata/squid.txt")

        // Defining the dataframe header structure
        val squidHeader = "time duration client_add result_code bytes req_method url user hierarchy_code type"

        // Defining the schema from the header defined above
        // (uses org.apache.spark.sql.types.{StructType, StructField, StringType})
        val schema = StructType(squidHeader.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

        // Converting the String RDD to a Row RDD for the 10 attributes
        val rowRDD = squidString.map(_.split(" ")).map(x => Row(x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7), x(8), x(9)))

        // Creating the dataframe based on the Row RDD and schema
        val squidDF = sqlContext.createDataFrame(rowRDD, schema)

        // Writing the dataframe with overwrite mode, header and a single partition
        squidDF
          .repartition(1)
          .write
          .mode("overwrite")
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .save("targetfile.csv")

        sc.stop()
      }
    }

    Run the code!

    Output: There are several other methods to write these files.

    Method 1: This is what we did above. If the expected dataframe size is small, you can use either repartition or coalesce to create a single file output as /filename.csv/part-00000.

    scala> dataframe
      .repartition(1)
      .write
      .mode("overwrite")
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("filename.csv")

    repartition(1) will shuffle the data to write everything into one particular partition, so the write cost will be high and it might take a long time if the file is huge.

    Method 2: coalesce(1) likewise collects everything into a single partition; if your file is huge it will require a lot of memory, and you may run out of memory.

    scala> dataframe
      .coalesce(1)
      .write
      .mode("overwrite")
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("filename.csv")

    coalesce() vs repartition(): Both change the number of partitions, but repartition is the costlier operation as it performs a full shuffle, while coalesce merges existing partitions and avoids a full shuffle.
    For example:

    scala> val distData = sc.parallelize(1 to 16, 4)
    distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[128] at parallelize at <console>:24

    // current partition size
    scala> distData.partitions.size
    res63: Int = 4

    // checking data across each partition
    scala> distData.mapPartitionsWithIndex((index, iter) => if (index == 0) iter else Iterator()).collect
    res64: Array[Int] = Array(1, 2, 3, 4)

    scala> distData.mapPartitionsWithIndex((index, iter) => if (index == 1) iter else Iterator()).collect
    res65: Array[Int] = Array(5, 6, 7, 8)

    scala> distData.mapPartitionsWithIndex((index, iter) => if (index == 2) iter else Iterator()).collect
    res66: Array[Int] = Array(9, 10, 11, 12)

    scala> distData.mapPartitionsWithIndex((index, iter) => if (index == 3) iter else Iterator()).collect
    res67: Array[Int] = Array(13, 14, 15, 16)

    // decreasing partitions to 2
    scala> val coalData = distData.coalesce(2)
    coalData: org.apache.spark.rdd.RDD[Int] = CoalescedRDD[133] at coalesce at <console>:25

    // see how shuffling occurred: instead of moving all the data, it just moved 2 partitions
    scala> coalData.mapPartitionsWithIndex((index, iter) => if (index == 0) iter else Iterator()).collect
    res68: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8)

    scala> coalData.mapPartitionsWithIndex((index, iter) => if (index == 1) iter else Iterator()).collect
    res69: Array[Int] = Array(9, 10, 11, 12, 13, 14, 15, 16)

    repartition(): Notice how repartition() re-shuffles everything to create new partitions, compared with the previous RDDs distData and coalData. Hence repartition is a costlier operation than coalesce.
    scala> val repartData = distData.repartition(2)
    repartData: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[139] at repartition at <console>:25

    // checking data across each partition
    scala> repartData.mapPartitionsWithIndex((index, iter) => if (index == 0) iter else Iterator()).collect
    res70: Array[Int] = Array(1, 3, 6, 8, 9, 11, 13, 15)

    scala> repartData.mapPartitionsWithIndex((index, iter) => if (index == 1) iter else Iterator()).collect
    res71: Array[Int] = Array(2, 4, 5, 7, 10, 12, 14, 16)

    Method 3: Let the files be created across the various partitions, and later merge them with a separate shell script. This method can be fast, depending on your hard disk write speed.

    #!/bin/bash
    echo "ColName1, ColName2, ColName3, ... , ColNameX" > filename.csv
    for i in /spark/output/*.csv ; do
      echo "FileNumber $i"
      cat $i >> filename.csv
      rm $i
    done
    echo "Done"

    Method 4: If you are using the Hadoop file system to store output files, you can leverage HDFS's getmerge utility. Give it a source directory containing all the partition files and a destination output file, and it concatenates all the files in the source into the destination local file. You can set -nl to add a newline character at the end of each file; -skip-empty-file can be used to avoid unwanted newline characters in the case of empty files.

    Syntax: hadoop fs -getmerge [-nl] [-skip-empty-file] <src> <localdst>

    hadoop fs -getmerge -nl /spark/source /spark/filename.csv
    hadoop fs -getmerge /spark/source/file1.csv /spark/source/file2.txt filename.csv

    Method 5: Use FileUtil.copyMerge() to merge all the files.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs._

    def merge(srcPath: String, dstPath: String): Unit = {
      val hadoopConfig = new Configuration()
      val hdfs = FileSystem.get(hadoopConfig)
      FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), true, hadoopConfig, null)
    }

    val newData = << Your dataframe >>
    val outputfile = "/spark/outputs/subject"
    var filename = "sampleFile"
    var outputFileName = outputfile + "/temp_" + filename
    var mergedFileName = outputfile + "/merged_" + filename
    var mergeFindGlob = outputFileName

    newData.write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .mode("overwrite")
      .save(outputFileName)

    merge(mergeFindGlob, mergedFileName)

    If you have any question, please don't forget to write in the comments section below. Thank you!

    Next: Spark Streaming word count example

    Navigation menu
    1. Apache Spark and Scala Installation
    1.1 Spark installation on Windows
    1.2 Spark installation on Mac
    2. Getting Familiar with Scala IDE
    2.1 Hello World with Scala IDE
    3. Spark data structure basics
    3.1 Spark RDD Transformations and Actions example
    4. Spark Shell
    4.1 Starting Spark shell with SparkContext example
    5. Reading data files in Spark
    5.1 SparkContext Parallelize and read textFile method
    5.2 Loading JSON file using Spark Scala
    5.3 Loading TEXT file using Spark Scala
    5.4 How to convert RDD to dataframe?
    6. Writing data files in Spark
    6.1 How to write single CSV file in Spark
    7. Spark streaming
    7.1 Word count example Scala
    7.2 Analyzing Twitter texts
    8. Sample Big Data Architecture with Apache Spark
    9. What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science?
    10. Spark Interview Questions and Answers
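    As an aside, the part-file merge from Method 3 can also be sketched in plain Python rather than bash. This is a minimal illustration only - the directory path, file pattern and header below are made-up examples, not names from this tutorial:

    ```python
    import glob
    import os

    def merge_part_files(source_dir, target_file, header):
        """Concatenate every part-*.csv file in source_dir into target_file,
        writing the header line exactly once at the top."""
        with open(target_file, "w") as out:
            out.write(header + "\n")
            # Sort so part-00000 comes before part-00001, preserving row order
            for part in sorted(glob.glob(os.path.join(source_dir, "part-*.csv"))):
                with open(part) as f:
                    out.write(f.read())

    # Hypothetical usage: merge Spark's part files into one CSV with a header
    # merge_part_files("/spark/output", "filename.csv", "ColName1,ColName2")
    ```

    Like the shell version, this assumes each part file already ends with a newline and was written without its own header line.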

  • Best Office Prank Ever, Don't Miss the End

    Have you ever seen a chocolate thief in your office? This is the epic message chain that unfolded when someone started stealing chocolates from the office refrigerator. #Funny #Office #Prank

  • Do NDA cadets get holidays like Sunday?

    Yes, NDA cadets get holidays, and the same applies to OTA and IMA cadets. A Sunday, whether in NDA, IMA or OTA, is nothing less than a holiday for a cadet. The life of a cadet in the National Defence Academy (Pune), Indian Military Academy (Dehradun) or Officers Training Academy (Chennai) is a busy schedule of training activities that literally drains the energy out of a cadet. Irrespective of the academy, the life of a cadet remains similar. Life in the academy is an everlasting experience that every military officer cherishes, and Sunday is a special day within it. The free Sunday is the only saviour for a cadet who is desperately looking for a break from training schedules and needs to revitalize for the coming weeks. In this blog, my endeavour is to describe a few activities that take place on a Sunday, or any other holiday, in NDA, IMA and OTA. This will help an aspiring cadet sense and visualize a Sunday in the academy. As an OTA alumnus, I recall rigorous and tough training days, full of chaos due to the various scheduled activities of day-to-day life, most of which used to overlap with each other, leaving no spare time at all. We were so mentally and physically occupied that there was nothing else to think about other than how to get some free time. So now you can understand the importance of this GREAT SUNDAY in a cadet's life. Sunday is the only break from the busy schedule of the academy, a much-needed time to rest physically, mentally, logistically and emotionally. It helps relieve the body of a week's training and fatigue and prepares it for the week to come. Here is the list of activities that most cadets do on a Sunday:

    1. Zero haircut: You can call this the most important event of the day, one a cadet looks forward to and makes every effort to complete against all his will. Even a slight remnant of hair can bring unwanted punishment to a cadet in the coming week.

    2. Sleep, sleep & sleep: Yes, you read it right; that's what most cadets do. The Sunday break is needed for full-time rest after a week of tough training activities and the fatigue they cause.

    3. Liberty: Going out into the local city market during the daytime is commonly known as liberty - a time to satisfy your eyes by seeing young girls, and your taste buds by hogging everything available.

    4. Letters: Yes, a bit old-fashioned, but we do write letters to our friends and family members. It helps you express your emotions and keeps you energized for the week ahead. I didn't realize before the academy what a good means of communication letters are.

    5. Canteen: It's time to fill up your stocks of emergency ration (in its literal meaning) for the coming days. Anything that can be eaten without cooking is kept in the room.

    6. Phone: During weekdays one rarely gets time to speak with family members and friends. Therefore THE GREAT Sunday comes to the rescue.

    7. Weapon cleaning: Occasionally a cadet needs to clean his weapon, which may have been left dirty by a week of training, particularly after a week in which some outdoor exercise was conducted.

    I hope I've painted a clear picture of how Sunday is spent in the academy. Feel free to ask any question about the life of a cadet in a training academy.

  • What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science?

    I never thought I would spend so much time understanding these high-profile terms. I was confident that I knew, theoretically, everything necessary to start writing machine learning algorithms - until a couple of days back, when I asked myself: does my use case fall under machine learning, or is it artificial intelligence? Or is it predictive analytics? I began explaining it to myself but couldn't do it right. I spent several hours reading blogs and thinking about these topics, and ended up writing this post to answer myself. I hope you will also find it helpful. Trust me, the most famous of all these terms over the past couple of years is "machine learning". The chart below shows the Google Trends interest over time for these terms. First, let's understand the terminologies individually; keep the Venn diagram below in mind while you read further - it will help you distinguish them. You know what I did just now? I asked your brain to recognize patterns. The human brain automatically recognizes such patterns (basically "deep learning") because your brain was trained on Venn diagrams somewhere in the past. By looking at the diagram, your brain can predict a few facts: deep learning is a subset of machine learning, artificial intelligence is the superset, and data science can spread across all of these technologies. Right? Trust me, if you showed this diagram to a prehistoric man, he would not understand anything - but your brain's "algorithms" are trained on enough historic data to deduce and predict such facts.

    Artificial Intelligence (AI): Artificial intelligence is the broadest term. It originated in the 1950s and is the oldest terminology of all those we will discuss. In one line, artificial intelligence (AI) is a term for simulated intelligence in machines. The concept has always been the idea of building machines which are capable of thinking and mimicking like humans.
    The simplest example of AI is the chess game where you play against the computer; such a program was first proposed on paper in 1951. A recent AI example would be self-driving cars, which have always been a subject of controversy. Artificial intelligence can be split into two branches: one is labelled "applied AI", which uses these principles of simulating human thought to carry out one specific task; the other is known as "generalized AI", which seeks to develop machine intelligences that can turn their hand to any task, much like a person.

    Machine Learning (ML): Machine learning is the subset of AI which originated in 1959. It evolved from the study of pattern recognition and computational learning theory in artificial intelligence. ML gives computers the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed. You encounter machine learning almost every day. Think about ride-sharing apps like Lyft and Uber: how do they determine the price of your ride? Google Maps: how does it analyze traffic movement and predict your arrival time within seconds? Spam filtering: why do some emails go automatically to your spam folder? Amazon Alexa, Apple Siri, Microsoft Cortana and Google Home: how do they recognize your speech?

    Deep Learning (DL): Deep learning (also known as hierarchical learning, deep machine learning or deep structured learning) is a subset of machine learning where the learning method is based on data representations, or feature learning: a set of methods that allows a system to automatically discover the representations needed for feature detection or classification from raw data. Examples include mobile check deposits (converting the handwriting on checks into actual text), Facebook face recognition (seen Facebook suggesting names while tagging?), colorization of black-and-white images, and object recognition.

    In short, all three terms (AI, ML & DL) can be related as below - recall the examples of the chess board, spam emails and object recognition (picture credit: blogs.nvidia).

    Predictive Analytics (PA): Under predictive analytics, the goal of the problem remains very narrow: the intent is to compute the value of a particular variable at a future point in time. You can say predictive analytics is basically a sub-field of machine learning; machine learning is more versatile and is capable of solving a wider range of problems. There are some techniques where machine learning and predictive analytics overlap, like linear and logistic regression, but others like decision trees, random forests etc. are essentially machine learning techniques. Keep these regression techniques aside for now; I will write detailed blogs on them.

    How does data science relate to AI, ML, PA & DL? Data science is a fairly general term for the processes and methods that analyze and manipulate data. It provides the ground on which to apply artificial intelligence, machine learning, predictive analytics and deep learning to extract meaningful and appropriate information from large volumes of raw data with greater speed and efficiency.

    Types of machine learning: The classification of machine learning depends on the type of task you expect the machine to perform (supervised, unsupervised or reinforcement learning), or on the desired output, i.e. the data. But in the end the underlying algorithms, or techniques, that help you get the desired result remain the same. Regression: a type of problem where we need to predict a continuous response value, like the value of a stock. Classification: a type of problem where we predict a categorical response value, where the data can be separated into specific "classes" - for example, whether an email is "spam" or "not spam". Clustering: a type of problem where we group similar things together, like grouping a set of tweets from Twitter.
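    To make the three task types concrete, here is a toy sketch in plain Python. Everything in it - the data, the spam word list, the distance threshold - is invented purely for illustration; real projects would use a library such as scikit-learn:

    ```python
    # Regression: fit y = slope*x + intercept by least squares,
    # predicting a continuous value (like a stock price).
    def fit_line(xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
                 / sum((x - mean_x) ** 2 for x in xs))
        intercept = mean_y - slope * mean_x
        return slope, intercept

    # Classification: assign a discrete label ("spam" / "not spam")
    # using a made-up keyword rule.
    def classify(text, spam_words=("winner", "free", "prize")):
        words = text.lower().split()
        return "spam" if any(w in words for w in spam_words) else "not spam"

    # Clustering: group nearby unlabelled values together,
    # starting a new group whenever the gap exceeds a threshold.
    def cluster_1d(points, threshold=2.0):
        clusters = []
        for p in sorted(points):
            if clusters and p - clusters[-1][-1] <= threshold:
                clusters[-1].append(p)
            else:
                clusters.append([p])
        return clusters
    ```

    The split mirrors the three definitions above: fit_line produces a continuous prediction rule, classify produces a discrete label, and cluster_1d groups unlabelled data with no target given at all.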
I have tried to showcase the types in the chart below; I hope you will find it helpful. Please don't limit yourself to the types of regression, classifiers & clusters shown here - there are a number of other algorithms being developed and used worldwide. Ask yourself which technique fits your requirement. Thank you folks!! If you have any question please mention it in the comments section below. #MachineLearning #ArtificialIntelligence #DeepLearning #DataScience #PredictiveAnalytics #regression #classification #cluster

Next: Spark Interview Questions and Answers

Navigation menu
1. Apache Spark and Scala Installation 1.1 Spark installation on Windows 1.2 Spark installation on Mac
2. Getting Familiar with Scala IDE 2.1 Hello World with Scala IDE
3. Spark data structure basics 3.1 Spark RDD Transformations and Actions example
4. Spark Shell 4.1 Starting Spark shell with SparkContext example
5. Reading data files in Spark 5.1 SparkContext Parallelize and read textFile method 5.2 Loading JSON file using Spark Scala 5.3 Loading TEXT file using Spark Scala 5.4 How to convert RDD to dataframe?
6. Writing data files in Spark 6.1 How to write single CSV file in Spark
7. Spark streaming 7.1 Word count example Scala 7.2 Analyzing Twitter texts
8. Sample Big Data Architecture with Apache Spark
9. What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science?
10. Spark Interview Questions and Answers

  • Installing Java on Oracle Linux

    Referenced from www.java.com (with a few additional steps added to make the installation process smoother).

Java for Linux Platforms

1. First check whether Java is already installed on your machine. Type java -version, or simply run this command in your terminal: which java
2. If Java is not present, your terminal will not recognize the command and will report "command not found".
3. To install Java, change to the directory in which you want to install it. Type: cd directory_path_name For example, to install the software in the /usr/java/ directory, type: cd /usr/java/
4. Download the tarball Java file from www.java.com (snippet shown above).
5. Get the 32-bit or 64-bit tarball file depending on your Linux machine configuration.
6. Move (sftp) the .tar.gz archive binary to the current directory /usr/java/.
7. Unpack the tarball to install Java: tar zxvf jre-8u73-linux-i586.tar.gz In this example, Java is installed in the /usr/java/jre1.8.0_73 directory. You can remove the version detail and rename the directory for your convenience.
8. Delete the .tar.gz file if you want to save some disk space.
9. Set up your .bashrc file. Type: vi ~/.bashrc and add these two lines to the file: export JAVA_HOME=/usr/java/jre1.8.0_73 export PATH=$PATH:$JAVA_HOME/bin
10. Now run: source ~/.bashrc

Now type the command java -version to check whether Java is successfully installed. If it does not run, find the bin directory where you unzipped Java and run: /path_to_your_Java/bin/java -version

Java for RPM based Linux Platforms

1. Become root by running su and entering the super-user password.
2. Uninstall any earlier installations of the Java packages: rpm -e package_name
3. Change to the directory in which you want to install. Type: cd directory_path_name For example, to install the software in the /usr/java/ directory, type: cd /usr/java
4. Install the package: rpm -ivh jre-8u73-linux-i586.rpm To upgrade a package instead, type: rpm -Uvh jre-8u73-linux-i586.rpm
5. Exit the root shell. No need to reboot.
6. Delete the .rpm file if you want to save disk space.

If you have any question, please write in the comments section below. Thank you! #Javainstallation #OracleLinux #OEL #OL

  • Apache Kafka Overview (Windows)

    Apache Kafka is a middleware solution for enterprise applications. It was initiated at LinkedIn, led by Neha Narkhede and Jun Rao. Initially it was designed as a monitoring and tracking system; later it became one of the leading Apache projects.

Why Use Kafka?

Multiple producers
Multiple consumers
Disk-based persistence
Highly scalable
High performance
Offline messaging
Message replay

Kafka Use Cases

1. Enterprise messaging system Kafka has a topic-based implementation for its messaging system. One or more consumers can consume the messages and commit as per application need. It is suitable for both online and offline messaging consumer systems.
2. Message store with playback capability Kafka provides message retention on a topic. Retention can be configured for a specified duration, and each message is backed by a distributed file system. It supports storage sizes from 50 KB to 50 TB.
3. Stream processing Kafka is capable of processing messages in real time, either in batch mode or message by message, and it provides aggregation of message processing over a specified time window.

Download and Install Kafka

Kafka requires a JRE and ZooKeeper. Download and install the components below.

JRE: http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
ZooKeeper: http://zookeeper.apache.org/releases.html
Kafka: http://kafka.apache.org/downloads.html

Installation (on Windows)

1. JDK Setup

Set JAVA_HOME under system environment variables from the path Control Panel -> System -> Advanced system settings -> Environment Variables. Search for the PATH variable in the "System Variables" section of the "Environment Variables" dialogue box you just opened. Edit the PATH variable and append ;%JAVA_HOME%\bin To confirm the Java installation, open cmd and type java -version; you should see the version of the Java you just installed.

2. Zookeeper Installation

Go to your ZooKeeper config directory.
It is under the ZooKeeper home directory (e.g. c:\zookeeper-3.4.10\conf).

Rename the file "zoo_sample.cfg" to "zoo.cfg".
Open zoo.cfg in any text editor and edit dataDir=/tmp/zookeeper to dataDir=c:\zookeeper-3.4.10\data.
Add entries in System Environment Variables as we did for Java: add a System Variable ZOOKEEPER_HOME = C:\zookeeper-3.4.10, then edit the System Variable named "PATH" and append ;%ZOOKEEPER_HOME%\bin;
You can change the default ZooKeeper port in the zoo.cfg file (default port 2181).
Run ZooKeeper by opening a new cmd and typing zkserver.

3. Kafka Setup

Go to your Kafka config directory. For me it's C:\kafka_2.10-\config.
Edit the file "server.properties": find the line "log.dirs=/tmp/kafka-logs" and change it to "log.dirs=C:\kafka_2.10-\kafka-logs".
If your ZooKeeper is running on some other machine or cluster, you can edit "zookeeper.connect=localhost:2181" to your custom IP and port.
Go to the Kafka installation folder and run the command below from a command line: .\bin\windows\kafka-server-start.bat .\config\server.properties Kafka will run on its default port 9092 and connect to ZooKeeper's default port, 2181.

Testing Kafka

Creating Topics

Now create a topic named "test.topic" with replication factor 1, since one Kafka server is running (standalone setup). If you have a cluster with more than one Kafka server running, you can increase the replication factor accordingly, which increases data availability and makes the system fault-tolerant. Open a new command prompt in the location C:\kafka_2.11-\bin\windows, type the following command and hit enter:

kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test.topic

Creating a Producer

Open a new command prompt in the location C:\kafka_2.11-\bin\windows.
To start a producer, type the following command:

kafka-console-producer.bat --broker-list localhost:9092 --topic test.topic

Start Consumer

Again open a new command prompt in the same location, C:\kafka_2.11-\bin\windows, and start a consumer by typing the following command:

kafka-console-consumer.bat --zookeeper localhost:2181 --topic test.topic

Now you will have two command windows. Type anything in the producer command prompt and press Enter; you should see the message appear in the consumer command prompt.

Some Other Useful Kafka Commands

List topics: kafka-topics.bat --list --zookeeper localhost:2181
Describe a topic: kafka-topics.bat --describe --zookeeper localhost:2181 --topic [Topic Name]
Read messages from the beginning: kafka-console-consumer.bat --zookeeper localhost:2181 --topic [Topic Name] --from-beginning
Delete a topic: kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181

Kafka Architecture

The Kafka system has the main components below, coordinated by ZooKeeper: Topic, Broker, Producers, Consumers.

1. Topic

A topic can be thought of like a folder in a file system. Producers publish messages to a topic, and each message is appended to the topic at a particular location named an offset; that is, the position of a message is identified by its offset number. For each topic, the Kafka cluster maintains a partitioned log. Each partition is hosted on a single server and can be replicated across a configurable number of servers for fault tolerance. Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". Kafka provides ordering of messages per partition, but not across partitions.

2. Broker

The broker is the core component of the Kafka messaging system. It hosts the topic log and maintains the leader and followers for the partitions, in coordination with ZooKeeper. A Kafka cluster consists of one or more brokers.
Brokers also maintain the replication of partitions across the cluster.

3. Producers

Producers publish messages to one or more topics; each message is appended to a partition of the topic. Producers are one kind of user of the Kafka cluster. Kafka maintains ordering of messages per partition, but not across partitions.

4. Consumers

Consumers subscribe to the messages of a topic. One or more consumers can subscribe to a topic across its different partitions; together they are called a consumer group. Two consumers of the same consumer group CANNOT subscribe to messages from the same partition. Each consumer maintains an offset for each partition it subscribes to, and a consumer can replay messages by repositioning to an already-read offset of a partition of a topic.

5. Message

A Kafka message consists of an array of bytes plus optional metadata called the key. A custom key can be used to store messages in a controlled way across partitions: messages with a particular key are written to a specific partition (the key is hashed to get the partition number). Kafka can also write messages in batch mode, which reduces the network round trips per message; batches are compressed for transport over the network. Batching increases throughput but adds latency, hence there is a tradeoff between latency and throughput.

Visit this link for Apache Kafka Producer with Example using java

If you have any question please mention it in the comments section below. Thank you.

#KafkaOverview #ApacheKafkaWindows #KafkaZookeeperInstallation #KafkaUseCases #KafkaCommands

[09/07/2019 5:49 PM CST - Reviewed by: PriSin]
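The key-to-partition mapping described in the Message section above can be sketched in a few lines of Java. Note this is only an illustration of the idea (same key, same partition): Kafka's actual default partitioner hashes the serialized key bytes with murmur2, not with hashCode(), and the class and method names here are hypothetical.

```java
// Illustrative sketch of key-based partitioning: messages with the same key
// always land in the same partition, so per-key ordering is preserved.
// Kafka's real default partitioner uses a murmur2 hash of the serialized key;
// String.hashCode() is used here only to keep the idea visible.
public class KeyPartitionSketch {

    public static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit rather than using Math.abs, which
        // overflows for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3;
        for (String key : new String[]{"user-42", "user-42", "user-7"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Running this, both "user-42" messages map to the same partition number, which is exactly why keyed messages keep their relative order in Kafka.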

  • How to convert RDD to Dataframe?

    Main menu: Spark Scala Tutorial

There are basically three methods by which we can convert an RDD into a Dataframe. I am using the spark shell to demonstrate these examples. Open spark-shell and import the libraries needed to run the code:

scala> import org.apache.spark.sql.{Row, SparkSession}
scala> import org.apache.spark.sql.types.{IntegerType, DoubleType, StringType, StructField, StructType}

Now create a sample RDD with the parallelize method:

scala> val rdd = sc.parallelize(Seq(("One", Array(1,1,1,1,1,1,1)), ("Two", Array(2,2,2,2,2,2,2)), ("Three", Array(3,3,3,3,3,3))))

Method 1 If you don't need a header, you can create the Dataframe directly, with the RDD as the input parameter to the createDataFrame method:

scala> val df1 = spark.createDataFrame(rdd)

Method 2 If you need a header, you can add it explicitly by calling the method toDF:

scala> val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")

Method 3 If you need a schema structure, then you need an RDD of [Row] type. Let's create a new rowsRDD for this scenario:

scala> val rowsRDD = sc.parallelize(Seq(Row("One",1,1.0), Row("Two",2,2.0), Row("Three",3,3.0), Row("Four",4,4.0), Row("Five",5,5.0)))

Now create the schema with the field names which you need:

scala> val schema = new StructType().add(StructField("Label", StringType, true)).add(StructField("IntValue", IntegerType, true)).add(StructField("FloatValue", DoubleType, true))

Now create the dataframe with rowsRDD & schema, and show it:

scala> val df3 = spark.createDataFrame(rowsRDD, schema)
scala> df3.show()

Thank you folks! If you have any question please mention it in the comments section below.

Next: Writing data files in Spark

Home   |   Contact Us

©2020 by Data Nebulae