
  • Write CSV/JSON data to Elasticsearch using Spark dataframes

    The elasticsearch-hadoop connector lets Apache Spark integrate with Elasticsearch from Scala and Java. Contents: write JSON data to Elasticsearch using a Spark dataframe, and write a CSV file to Elasticsearch using a Spark dataframe. I am using Elasticsearch 7.3.0, Spark 2.3.1 and Scala 2.11.

Download Jar

To run Spark with Elasticsearch, you need to download the proper version of the elasticsearch-spark jar file and add it to Spark's classpath. If you are running Spark in local mode it only needs to be added on one machine, but if you are running a cluster, you need to add it on every node. I assume you have already installed Elasticsearch; if not, please follow these installation steps (Linux | Mac users). Elasticsearch installation is very easy and takes only a few minutes. I would encourage you to install Kibana as well. You can download the complete set of elasticsearch-hadoop libraries (Storm, MapReduce, Hive and Pig, as shown below) from here. I added elasticsearch-spark-20_2.10-7.3.0.jar because I am running Elasticsearch 7.3.

[Tip] Make sure you download the correct version of the jar, otherwise you will get this error during execution: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version x.x.x

Adding Jar (Scala IDE)

If you are using Scala IDE, right click on the project folder => go to Properties => Java Build Path => Add External JARs, and add the downloaded jar file. Apply and close.

Adding Jar (Spark-shell)

If you are using spark-shell, navigate to the Spark jars directory where you can see all the other jar files and copy the downloaded jar file there. For example,

Start Elasticsearch & Kibana

Now make sure Elasticsearch is running. If Elasticsearch is not running, Spark will not be able to make a connection and you will get this error: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed. To start Elasticsearch and Kibana, run these commands in your terminal:

$ elasticsearch
$ kibana

Writing JSON data to Elasticsearch

In all sections these three steps are mandatory: import the necessary elasticsearch-spark library, configure the ES nodes, and configure the ES port. If you are running ES on AWS, also add this line to your configuration: .config("spark.es.nodes.wan.only","true")

JSON file multilinecolors.json sample data:

[ { "color": "red", "value": "#f00" }, { "color": "green", "value": "#0f0" }, { "color": "blue", "value": "#00f" }, { "color": "cyan", "value": "#0ff" }, { "color": "magenta", "value": "#f0f" }, { "color": "yellow", "value": "#ff0" }, { "color": "black", "value": "#000" } ]

package com.dataneb.spark

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

object toES {
  def main(args: Array[String]): Unit = {

    // Configuration
    val spark = SparkSession
      .builder()
      .appName("WriteJSONToES")
      .master("local[*]")
      .config("spark.es.nodes","localhost")
      .config("spark.es.port","9200")
      .getOrCreate()

    // Create dataframe
    val colorsDF = spark.read.json("/Volumes/MYLAB/testdata/multilinecolors.json")

    // Write to ES with index name in lower case
    colorsDF.saveToEs("dataframejsonindex")
  }
}

[Tip] Make sure you write the index name in lower case, otherwise you will get this error: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Illegal write index name [ABCindex]. Write resources must be lowercase singular index names, with no illegal pattern characters except for multi-resource writes.

Here is the Scala IDE output. You can also check the index created in Elasticsearch: go to Management => ES Index Management. You can further discover the index pattern in Kibana.

Writing CSV data to Elasticsearch

books.csv sample data:

bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
1,Harry Potter and the Half-Blood Prince (Harry Potter #6),J.K. Rowling-Mary GrandPré,4.56,0439785960,9780439785969,eng,652,1944099,26249
2,Harry Potter and the Order of the Phoenix (Harry Potter #5),J.K. Rowling-Mary GrandPré,4.49,0439358078,9780439358071,eng,870,1996446,27613
3,Harry Potter and the Sorcerer's Stone (Harry Potter #1),J.K. Rowling-Mary GrandPré,4.47,0439554934,9780439554930,eng,320,5629932,70390
4,Harry Potter and the Chamber of Secrets (Harry Potter #2),J.K. Rowling,4.41,0439554896,9780439554893,eng,352,6267,272
5,Harry Potter and the Prisoner of Azkaban (Harry Potter #3),J.K. Rowling-Mary GrandPré,4.55,043965548X,9780439655484,eng,435,2149872,33964
8,Harry Potter Boxed Set Books 1-5 (Harry Potter #1-5),J.K. Rowling-Mary GrandPré,4.78,0439682584,9780439682589,eng,2690,38872,154

Everything is the same except the read method (json => csv) and the index name.

package com.dataneb.spark

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

object toES {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .appName("WriteJSONToES")
      .master("local[*]")
      .config("spark.es.nodes","localhost")
      .config("spark.es.port","9200")
      .getOrCreate()

    val colorsDF = spark.read.csv("/Volumes/MYLAB/testdata/books*.csv")
    colorsDF.saveToEs("dataframecsvindex")
  }
}

Here is the Scala IDE output. I have two csv files, books1.csv and books2.csv, so you see 2 task IDs in the result. You can also check the index created in Elasticsearch: go to Management => ES Index Management. You can further create and discover the index pattern in Kibana. I haven't applied format options to read the header while applying the csv method in the Spark program, hence you see the header record in the index (a sketch showing how to read the header is included after the navigation menu below).

Thank you. If you have any questions please write in the comments section below.

Navigation Menu: Introduction to ELK Stack Installation Loading data into Elasticsearch with Logstash Create Kibana Dashboard Example Kibana GeoIP Dashboard Example
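If you do want column names and typed values in the index instead of the raw header row, you can pass read options to the CSV reader and supply the Elasticsearch settings per write. The following is a minimal sketch, not from the original post: it assumes the same local setup and file path as above, uses the standard Spark CSV reader options, and passes es.nodes/es.port as a config map to saveToEs.

// Sketch: read books*.csv with a header row and pass ES settings directly to saveToEs
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

object BooksToES {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WriteCSVToESWithHeader")
      .master("local[*]")
      .getOrCreate()

    val booksDF = spark.read
      .option("header", "true")       // use the first line as column names instead of indexing it
      .option("inferSchema", "true")  // let Spark infer numeric columns such as average_rating
      .csv("/Volumes/MYLAB/testdata/books*.csv")

    // es.nodes / es.port can also be supplied per write instead of on the SparkSession
    booksDF.saveToEs("dataframecsvindex", Map("es.nodes" -> "localhost", "es.port" -> "9200"))
  }
}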

  • Quick & Easy Punjabi Palak Paneer Recipe

    My love for Palak Paneer never ends. Each time I cook Palak Paneer, I try a new variation and love to see how it turns out. Here I am going to share one of the versions, which is easy and quick. Preparation time: 20 min. Serves: 3-4.

Ingredients:
Spinach: 1 bunch
Onion: 1 medium size
Tomato: 2 small
Ginger: 1/2 inch or less
Garlic: 3-4 cloves
Cardamom: 2 (whole)
Green Chili: 3
Coriander/Dhania Powder: 1 tsp
Kitchen King masala powder: 1.5 tsp
Cumin/Jeera seeds: 1/2 tsp
Paneer/Cottage cheese cubes: 200 grams
Milk: 1 cup

Preparation Steps: Heat some oil in a pan. Once hot, add cardamom, diced onions, ginger, garlic and green chilies. Saute these for 4-5 minutes till the onions are golden brown. Next add the tomatoes and salt, and saute till the tomatoes are soft. Next add roughly chopped spinach (baby spinach need not be chopped). Once the spinach is soft, blend this mix to a smooth paste. In the same pan add 1 tsp of oil. Add cumin/jeera seeds and wait till they crackle. Add the spinach paste into the pan. Next add dhania (coriander seed) powder, Kitchen King masala powder and salt to taste. Stir the mix for a few minutes and add hot water to adjust the consistency of the gravy. Bring it to a boil. Add the paneer/cottage cheese cubes and one cup of milk. Let this simmer for 5-6 minutes. Serve hot! Hope you all enjoy this version of Palak Paneer :)

  • Understanding SparkContext textFile & parallelize method

    Main menu: Spark Scala Tutorial

In this blog you will learn how Spark reads a text file or any other external dataset: referencing a dataset (SparkContext's textFile), the SparkContext parallelize method, and the Spark dataset textFile method. As we read in the previous post, Apache Spark has mainly three types of objects, or you can say data structures (also called Spark APIs): RDDs, dataframes and datasets. RDD was the primary API when Apache Spark was founded.

RDD - Resilient Distributed Dataset

Consider you have a collection of 100 words and you distribute them across 10 partitions so that each partition has 10 words (more or less). Each partition has a backup so that it can be recovered in case of failure (resilient). Now, this seems very generic. In a practical environment data will be distributed in a cluster with thousands of nodes (with backup nodes), and if you want to access the data you need to apply Spark actions, which you will learn soon. This type of immutable distributed collection of elements is called an RDD.

Dataframes

Dataframes have a similar distribution of elements to an RDD, but in this case the data is organized into a structure, like a table in a relational database. Consider you have a distributed collection of [row] type objects, like records distributed across thousands of nodes. You will get a clearer picture when we create a dataframe, so don't worry.

Datasets

The Dataset API was introduced in late 2016. Do you remember the case class which you created in "Just enough Scala for Spark"? A dataset is like a collection of such strongly typed objects, for example the case class Order which has 2 attributes, orderNum (Int) and orderItem (String). This is just the introduction, so even if you don't understand, that's fine. You will get a clearer picture with practical examples.

The question is: which data structure should you implement? It totally depends on the business use case. For instance, datasets and RDDs are basically used for unstructured data like streams of media or text, when a schema and columnar format of data are not mandatory requirements (like accessing data by column name and other tabular attributes). Also, RDDs are often used when you want full control over the physical distribution of data over thousands of nodes in a cluster. Similarly, dataframes are often used with Spark SQL when you have structured data and you need the schema and columnar format of data maintained throughout the process. Datasets are also used in scenarios where you have unstructured or semi-structured data and you want to run Spark SQL.

That being said, we have mainly the following methods to load data in Spark: SparkContext's textFile method, which results in an RDD; SparkContext's parallelize collection, which also results in an RDD; the Spark read textFile method, which results in a Dataset; SQLContext read json, which results in a Dataframe; and Spark session read json, which also results in a Dataframe. You can also create these from parquet files with the read parquet method. Similarly there are other methods; it's difficult to list all of them, but these examples will give you a picture of how you can create them.

1. SparkContext textFile [spark.rdd family]

Text file RDDs can be created using SparkContext's textFile method. Define SparkConf and SparkContext like we did in the earlier post and use SparkContext to read the text file. I have created a sample text file with text data regarding - Where is Mount Everest? Got the answer from Wikipedia.
scala> val dataFile = sc.textFile("/Users/Rajput/Documents/testdata/MountEverest.txt")
dataFile: org.apache.spark.rdd.RDD[String] = /Users/Rajput/Documents/testdata/MountEverest.txt MapPartitionsRDD[1] at textFile at :27

The file has 9 lines and you can see the first line in the above screenshot. Further, you can count the number of words in the file by splitting the text (on the space character) and applying the count() action. You will learn about transformations like flatMap and actions like count soon, so don't worry.

scala> dataFile.flatMap(line => line.split(" ")).count()
res4: Long = 544

Right now the motive is to show how you read a text file with the textFile member of the SparkContext family. The result is an RDD.

Important notes: We can use wildcard characters to read multiple files together ("/file/path/*.txt"). It can read compressed files (*.gz), and files from HDFS, Amazon S3, HBase etc.

2. SparkContext parallelize collection [spark.rdd family]

This method is used to distribute a collection of elements of the same type (in an array, list etc). The distributed dataset can then be operated on in parallel.

// Parallelizing list of strings
scala> val distData = sc.parallelize(List("apple","orange","banana","grapes"))
distData: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[3] at parallelize at :27

// 4 total elements
scala> distData.count()
res5: Long = 4

or like these,

scala> sc.parallelize(Array("Hello Dataneb! How are you?"))
res3: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at :25

scala> sc.parallelize(Array("Hello","Spark","Dataneb","Apache"))
res4: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at parallelize at :25

scala> sc.parallelize(List(1 to 10))
res6: org.apache.spark.rdd.RDD[scala.collection.immutable.Range.Inclusive] = ParallelCollectionRDD[2] at parallelize at :25

scala> sc.parallelize(1 to 10)
res7: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[3] at parallelize at :25

scala> sc.parallelize(1 to 10 by 2)
res8: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at parallelize at :25

You can also see the number of partitions,

scala> res8.partitions.size
res13: Int = 4

3. Read text file to create Dataset [spark.sql family]

You can create a dataset from a text file or any other file system like HDFS. Here you can use the default Spark session which gets created when you start spark-shell.

// creating dataset
scala> val distDataset = spark.read.textFile("/Users/Rajput/Documents/testdata/MountEverest.txt")
distDataset: org.apache.spark.sql.Dataset[String] = [value: string]

// 9 lines
scala> distDataset.count()
res0: Long = 9

// 544 total word count
scala> distDataset.flatMap(line => line.split(" ")).count()
res2: Long = 544

// 5 lines with Everest
scala> distDataset.filter(line => line.contains("Everest")).count()
res3: Long = 5

Here is the shell screenshot;

4. SQLContext read json to create Dataframe [spark.sql family]

You can create dataframes with SQLContext. SQLContext is a class in Spark which serves as the entry point for Spark SQL.
// you need to import sql library to create SQLContext
scala> import org.apache.spark.sql._
import org.apache.spark.sql._

// telling Spark to use the same configuration as the Spark context
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@40eb85e9

My json file looks like this,

[ { "color": "red", "value": "#f00" }, { "color": "green", "value": "#0f0" }, { "color": "blue", "value": "#00f" }, { "color": "cyan", "value": "#0ff" }, { "color": "magenta", "value": "#f0f" }, { "color": "yellow", "value": "#ff0" }, { "color": "black", "value": "#000" } ]

// creating dataframe
scala> val df = sqlContext.read.json("/Volumes/MYLAB/testdata/multilinecolors.json")
df: org.apache.spark.sql.DataFrame = [color: string, value: string]

// printing schema of dataframe, like a table
scala> df.printSchema()
root
 |-- color: string (nullable = true)
 |-- value: string (nullable = true)

// storing this dataframe into temp table
scala> df.registerTempTable("tmpTable")

// retrieving data
scala> sqlContext.sql("select * from tmpTable").show()
+-------+-----+
|  color|value|
+-------+-----+
|    red| #f00|
|  green| #0f0|
|   blue| #00f|
|   cyan| #0ff|
|magenta| #f0f|
| yellow| #ff0|
|  black| #000|
+-------+-----+

5. Spark Session to create dataframe [spark.sql family]

You can also create a dataframe from the default Spark session which is created when you start spark-shell. Refer to the spark-shell blog.

scala> spark
res14: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@6c9fe061

scala> spark.read.json("/Volumes/MYLAB/testdata/multilinecolors.json")
res16: org.apache.spark.sql.DataFrame = [color: string, value: string]

scala> res16.show()
+-------+-----+
|  color|value|
+-------+-----+
|    red| #f00|
|  green| #0f0|
|   blue| #00f|
|   cyan| #0ff|
|magenta| #f0f|
| yellow| #ff0|
|  black| #000|
+-------+-----+

scala> res16.printSchema()
root
 |-- color: string (nullable = true)
 |-- value: string (nullable = true)

scala> res16.select("color").show()
+-------+
|  color|
+-------+
|    red|
|  green|
|   blue|
|   cyan|
|magenta|
| yellow|
|  black|
+-------+

scala> res16.filter($"color"==="blue").show()
+-----+-----+
|color|value|
+-----+-----+
| blue| #00f|
+-----+-----+

You can also convert the dataframe back to JSON like this,

scala> res16.toJSON.show(false)
+----------------------------------+
|value                             |
+----------------------------------+
|{"color":"red","value":"#f00"}    |
|{"color":"green","value":"#0f0"}  |
|{"color":"blue","value":"#00f"}   |
|{"color":"cyan","value":"#0ff"}   |
|{"color":"magenta","value":"#f0f"}|
|{"color":"yellow","value":"#ff0"} |
|{"color":"black","value":"#000"}  |
+----------------------------------+

You can also create dataframes from parquet files, text files etc. You will learn this soon (a small combined example follows below). That's all guys! If you have any question or suggestion please write in the comments section below. Thank you folks.

Next: Spark Transformations

Navigation menu: 1. Apache Spark and Scala Installation 1.1 Spark installation on Windows 1.2 Spark installation on Mac 2. Getting Familiar with Scala IDE 2.1 Hello World with Scala IDE 3. Spark data structure basics 3.1 Spark RDD Transformations and Actions example 4. Spark Shell 4.1 Starting Spark shell with SparkContext example 5. Reading data files in Spark 5.1 SparkContext Parallelize and read textFile method 5.2 Loading JSON file using Spark Scala 5.3 Loading TEXT file using Spark Scala 5.4 How to convert RDD to dataframe? 6. Writing data files in Spark 6.1 How to write single CSV file in Spark 7. Spark streaming 7.1 Word count example Scala 7.2 Analyzing Twitter texts 8. Sample Big Data Architecture with Apache Spark 9. What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science? 10. Spark Interview Questions and Answers
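As an addendum, here is a small spark-shell sketch (not from the original post; the path and column name are made up) that ties the methods above together: the wildcard read mentioned in the notes, and one way to turn a parallelized collection into a dataframe, which item 5.4 in the menu covers in detail.

// Hypothetical spark-shell example: wildcard reads and RDD-to-DataFrame conversion
import spark.implicits._

// read every .txt file in the test data directory into a single RDD
val allFiles = sc.textFile("/Users/Rajput/Documents/testdata/*.txt")
println(allFiles.getNumPartitions)

// parallelize a local collection and convert it to a dataframe with a named column
val fruitDF = sc.parallelize(List("apple", "orange", "banana", "grapes")).toDF("fruit")
fruitDF.show()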

  • Every Gardener Must Know These Homemade Organic Pesticide Remedies

    Organic Homemade Pesticide: Insects like crickets, spiders, snails, aphids and others can cause serious damage to your garden and invite several diseases. Personally, I don't recommend chemical pesticides to get rid of these, as they can make fruits and vegetables unsafe for consumption, plus they are not safe for the environment. However, there are many homemade remedies you can use to stop these pests.

Using Neem: Mix 30 milliliters of neem oil with 1 teaspoon of mild soap. Mix the neem and soap into 1 liter of warm water. Pour the pesticide into a squirt bottle and spray on the affected areas.

Using Onion, Chilies & Garlic: Blend 100 grams of red hot peppers with 50 grams of garlic cloves and 50 grams of onions to form a thick paste. Mix the paste into 1 liter of warm water. Pour the solution into a container and leave it for 24 hours in a warm spot. Filter the solution through a strainer to remove solid particles. That's it, the filtered solution is your pesticide. Pour it into a squirt bottle and spray on the affected plants.

Using Tobacco: Mix half a cup of tobacco into 1 liter of water. Keep the mixture out in the sun for 24 hours. Check the color of the mixture; it should be similar to the color of light tea. Add 2 tablespoons of mild liquid dish soap to the solution and mix thoroughly. Pour the liquid into a squirt bottle and spray on the affected plants.

Using Orange Peels: Boil 2 orange peels in 1 liter of water. Keep the solution in a warm spot for 24 hours. Pour the liquid into a squirt bottle after filtering out the peels. Add a few drops of Castile soap and mix the solution thoroughly. Spray the pesticide on the affected areas.

Using Egg Shells: Egg shells cannot be used to make a pesticide, but they can protect your plants from pests. Further, being composed of calcium carbonate, eggshells are an excellent way to introduce this mineral into the soil. Microwave waste egg shells for a couple of minutes to kill bacteria. You can dry them in a sunny spot as well for 3-4 days, however the microwave is a faster option. Put them in a plastic bag and crush them into fine particles. Spread the crushed shell around your plant; it will block pests from attacking your plant's roots and serve as a good source of calcium. You can also blend egg shells and use them as a fertilizer. Eggshells will reduce the acidity of your soil and help to aerate it.

#Garden #Pesticides #Organic #Homemade #Remedies

  • How to pull data from OKTA API example

    Okta has various REST APIs (refer to this) from which you can pull data and work with it according to your business requirements. Since Okta stores only 90 days of records, in many cases you might need to store the data in an external database and then perform your data analysis. In order to pull the data from Okta I considered writing a shell script, probably because this looked very straightforward to me. But there are other methods as well which you can consider if you have a wider project timeline. Let's see how this can be done with a shell script.

Step 1: Go through the API reference documents and filters which Okta has provided online. It's seriously very well documented, and that will help you if you want to tweak this script.

Step 2: Get an API access token from your Okta admin and validate that the token is working properly with the Postman client. Refer to this.

Step 3: Once you have the API access token and a basic understanding of the API filters, you will be able to tweak the script according to your needs.

Step 4: Below is the complete shell program with a brief explanation of what each step is doing (a Scala variant of the pagination loop follows at the end).

# Define your environment variables - organization, domain and api_token. These will be used to construct the URL in further steps.
# If you want you can hide your API token, for example by reading the token from a parameter file instead of hard coding it.
# Start
ORG=company_name
DOM=okta
API_TOKEN=*********************

# Initialize variables with some default values.
# Change the destination path to wherever you want to write the data.
# VAL is the pagination limit; PAT/REP_PAT are the pattern and replacement-pattern strings used to format the JSON output correctly.
# DATE_RANGE will be used to pull the data based on the date the user inputs.
VAL=1000
DEST_FILE=/var/spark/data
i=1
PAT=
REP_PAT=
DATE_RANGE=2014-02-01

# Choose the API for which you need the data (events, logs or users); you can modify the code if you want to export any other API's data.
echo "Enter the name of API - events, logs, users. "
read GID

# Enter the date range to pull data
echo "Enter the date in format yyyy-mm-dd"
read DATE_RANGE

date_func() {
    echo "Enter the date in format yyyy-mm-dd"
    read DATE_RANGE
}

# Check if the entered date is in the correct format
if [ ${#DATE_RANGE} -ne 10 ]; then
    echo "Invalid date!! Enter date again.."
    date_func
else
    echo "Valid date!"
fi

# Construct the URL based on all the variables defined earlier
URL=https://$ORG.$DOM.com/api/v1/$GID?limit=$VAL

# Case to choose the API name entered by the user; 4 to 10 are empty routes in case you want to add new APIs
case $GID in
    events)
        echo "events API selected"
        rm -f /var/spark/data/events.json*
        URL=https://$ORG.$DOM.com/api/v1/$GID?lastUpdated%20gt%20%22"$DATE_RANGE"T00:00:00.000Z%22\&$VAL
        PAT=}]},{\"eventId\":
        REP_PAT=}]}'\n'{\"eventId\":
        sleep 1;;
    logs)
        echo "logs API selected"
        rm -f /var/spark/data/logs.json*
        URL=https://$ORG.$DOM.com/api/v1/$GID?lastUpdated%20gt%20%22"$DATE_RANGE"T00:00:00.000Z%22\&$VAL
        PAT=}]},{\"actor\":
        REP_PAT=}]}'\n'{\"actor\":
        sleep 1;;
    users)
        echo "users API selected"
        PAT=}}},{\"id\":
        REP_PAT=}}}'\n'{\"id\":
        rm -f /var/spark/data/users.json*
        URL=https://$ORG.$DOM.com/api/v1/$GID?filter=status%20eq%20%22STAGED%22%20or%20status%20eq%20%22PROVISIONED%22%20or%20status%20eq%20%22ACTIVE%22%20or%20status%20eq%20%22RECOVERY%22%20or%20status%20eq%20%22PASSWORD_EXPIRED%22%20or%20status%20eq%20%22LOCKED_OUT%22%20or%20status%20eq%20%22DEPROVISIONED%22\&$VAL
        echo $URL
        sleep 1;;
    4) echo "four" ;;
    5) echo "five" ;;
    6) echo "six" ;;
    7) echo "seven" ;;
    8) echo "eight" ;;
    9) echo "nine" ;;
    10) echo "ten" ;;
    *) echo "INVALID INPUT!" ;;
esac

# Delete temporary files before running the script
rm -f itemp.txt
rm -f temp.txt
rm -f temp1.txt

# Create the NEXT variable to handle pagination
curl -i -X GET -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: SSWS $API_TOKEN" "$URL" > itemp.txt
NEXT=`grep -i 'rel="next"' itemp.txt | awk -F"<" '{print$2}' | awk -F">" '{print$1}'`
tail -1 itemp.txt > temp.txt

# Validate that the URL is correctly defined
echo $URL

# Iterate the pagination loop until NEXT is null
while [ ${#NEXT} -ne 0 ]
do
    echo "this command is executed till NEXT is null, current value of NEXT is $NEXT"
    curl -i -X GET -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: SSWS $API_TOKEN" "$NEXT" > itemp.txt
    tail -1 itemp.txt >> temp.txt
    NEXT=`grep -i 'rel="next"' itemp.txt | awk -F"<" '{print$2}' | awk -F">" '{print$1}'`
    echo "number of loop = $i, for NEXT reference : $NEXT"
    (( i++ ))
    cat temp.txt | cut -c 2- | rev | cut -c 2- | rev > temp1.txt
    rm -f temp.txt
    # Format the output to create single-line JSON records
    echo "PATTERN = $PAT"
    echo "REP_PATTERN = $REP_PAT"
    sed -i "s/$PAT/$REP_PAT/g" temp1.txt
    mv temp1.txt /var/spark/data/$GID.json_`date +"%Y%m%d_%H%M%S"`
    sleep 1
done
# END

See also - How to setup Postman client. If you have any questions please write in the comments section below. Thank you!
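If you would rather do the same pull from Scala (for example, to feed Spark directly), the pagination idea is identical: call the endpoint, then keep following the Link header whose rel is "next" until it disappears. The sketch below is not part of the original post and makes several assumptions: Java 11+ (for java.net.http), Scala 2.13 (for scala.jdk.CollectionConverters), a hypothetical OKTA_API_TOKEN environment variable and a placeholder org URL. It only prints each page of JSON.

// Hedged sketch: paginate an Okta API from Scala by following the Link: rel="next" header
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import scala.jdk.CollectionConverters._

object OktaPager {
  def main(args: Array[String]): Unit = {
    val token  = sys.env.getOrElse("OKTA_API_TOKEN", "")   // hypothetical environment variable
    val client = HttpClient.newHttpClient()
    var next: Option[String] = Some("https://company_name.okta.com/api/v1/users?limit=200") // placeholder org

    while (next.isDefined) {
      val request = HttpRequest.newBuilder(URI.create(next.get))
        .header("Accept", "application/json")
        .header("Authorization", s"SSWS $token")
        .GET()
        .build()
      val response = client.send(request, HttpResponse.BodyHandlers.ofString())
      println(response.body())                              // one JSON array per page

      // Okta returns pagination links in the Link header; rel="next" points to the next page
      next = response.headers().allValues("Link").asScala
        .find(_.contains("rel=\"next\""))
        .map(_.split(";").head.trim.stripPrefix("<").stripSuffix(">"))
    }
  }
}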

  • Analyzing Twitter Data - Twitter sentiment analysis using Spark streaming

    Analyzing Twitter Data - Twitter sentiment analysis using Spark streaming. We will be analyzing Twitter data and doing Twitter sentiment analysis using Spark streaming. You can do this in any programming language: Python, Scala, Java or R.

Main menu: Spark Scala Tutorial

Spark streaming is very useful for analyzing real-time data from IoT technologies, which could be your smart watch, Google Home, Amazon Alexa, Fitbit, GPS, home security system, smart cameras or any other device which communicates with the internet. Social platforms like Facebook, Twitter, Instagram etc. generate an enormous amount of data every minute. The trend below shows interest over time for three of these smart technologies over the past 5 years. In this example we are going to stream Twitter API tweets in real time with OAuth authentication and find the hashtags that are most popular among them.

Prerequisite

Download and install Apache Spark and Scala IDE (Windows | Mac). Create a Twitter sample application and obtain your API key, API secret key, access token and access token secret. Refer to this to learn how to get a Twitter development account and API access keys.

Authentication file setup

Create a text file twitter.txt with the Twitter OAuth details and place it anywhere on your local file system (remember the path). The file content should look like this: it has two fields separated by a single space - the first field contains the OAuth property name and the second contains the key value. Make sure there is no extra space anywhere in the file, or you will get authentication errors. There is a newline character at the end (i.e. hit enter after the 4th line); you can see the empty 5th line in the screenshot below.

Write the code!

Now create a Scala project in Eclipse IDE (see how to create a Scala project) and refer to the following code, which prints out live tweets as they stream, using Spark Streaming. I have written a separate blog to explain the basic terminologies used in Spark, like RDD, SparkContext, SQLContext, various transformations and actions etc. You can go through it for basic understanding. However, I have explained in the comments above each line of code what it actually does. For a list of Spark functions you can refer to this.

// Our package
package com.dataneb.spark

// Twitter libraries used to run spark streaming
import twitter4j._
import twitter4j.auth.Authorization
import twitter4j.auth.OAuthAuthorization
import twitter4j.conf.ConfigurationBuilder

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.internal.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream._
import org.apache.spark.streaming.receiver.Receiver

import scala.io.Source

/** Listens to a stream of Tweets and keeps track of the most popular
 *  hashtags over a 5 minute window. */
object PopularHashtags {

  /** Makes sure only ERROR messages get logged to avoid log spam. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  /** Configures Twitter service credentials using twitter.txt in the main workspace directory.
   *  Use the path where you saved the authentication file. */
  def setupTwitter() = {
    import scala.io.Source
    for (line <- Source.fromFile("/Volumes/twitter.txt").getLines) {
      val fields = line.split(" ")
      if (fields.length == 2) {
        System.setProperty("twitter4j.oauth." + fields(0), fields(1))
      }
    }
  }

  // Main function where the action happens
  def main(args: Array[String]) {

    // Configure Twitter credentials using twitter.txt
    setupTwitter()

    // Set up a Spark streaming context named "PopularHashtags" that runs locally using
    // all CPU cores and one-second batches of data
    val ssc = new StreamingContext("local[*]", "PopularHashtags", Seconds(1))

    // Get rid of log spam (should be called after the context is set up)
    setupLogging()

    // Create a DStream from Twitter using our streaming context
    val tweets = TwitterUtils.createStream(ssc, None)

    // Now extract the text of each status update into DStreams using map()
    val statuses = tweets.map(status => status.getText())

    // Blow out each word into a new DStream
    val tweetwords = statuses.flatMap(tweetText => tweetText.split(" "))

    // Now eliminate anything that's not a hashtag
    val hashtags = tweetwords.filter(word => word.startsWith("#"))

    // Map each hashtag to a key/value pair of (hashtag, 1) so we can count them by adding up the values
    val hashtagKeyValues = hashtags.map(hashtag => (hashtag, 1))

    // Now count them up over a 5 minute window sliding every one second
    val hashtagCounts = hashtagKeyValues.reduceByKeyAndWindow(
      (x,y) => x + y, (x,y) => x - y, Seconds(300), Seconds(1))

    // Sort the results by the count values
    val sortedResults = hashtagCounts.transform(rdd => rdd.sortBy(x => x._2, false))

    // Print the top 10
    sortedResults.print

    // Set a checkpoint directory, and kick it all off
    // I can watch this all day!
    ssc.checkpoint("/Volumes/Macintosh HD/Users/Rajput/Documents/checkpoint/")
    ssc.start()
    ssc.awaitTermination()
  }
}

Run it! See the result! I actually went to my Twitter account to check the top result tweet #MTVHottest, and it was trending - see the snapshot below.

You might face errors if you are not using the proper versions of the Scala and Spark Twitter libraries. I have listed them below for your reference. You can download these libraries from these two links: Link1 & Link2 (a build.sbt sketch follows below if you prefer a dependency manager).

Scala version 2.11.11
Spark version 2.3.2
twitter4j-core-4.0.6.jar
twitter4j-stream-4.0.6.jar
spark-streaming-twitter_2.11-2.1.1.jar

Thank you folks. I hope you are enjoying these blogs; if you have any doubt please mention it in the comments section below.

#AnalyzingTwitterData #SparkStreamingExample #Scala #TwitterSparkStreaming #SparkStreaming

Next: Sample Big Data Architecture with Apache Spark

Navigation menu: 1. Apache Spark and Scala Installation 1.1 Spark installation on Windows 1.2 Spark installation on Mac 2. Getting Familiar with Scala IDE 2.1 Hello World with Scala IDE 3. Spark data structure basics 3.1 Spark RDD Transformations and Actions example 4. Spark Shell 4.1 Starting Spark shell with SparkContext example 5. Reading data files in Spark 5.1 SparkContext Parallelize and read textFile method 5.2 Loading JSON file using Spark Scala 5.3 Loading TEXT file using Spark Scala 5.4 How to convert RDD to dataframe? 6. Writing data files in Spark 6.1 How to write single CSV file in Spark 7. Spark streaming 7.1 Word count example Scala 7.2 Analyzing Twitter texts 8. Sample Big Data Architecture with Apache Spark 9. What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science? 10. Spark Interview Questions and Answers
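If you prefer a dependency manager over downloading the jars by hand, a build.sbt along these lines should pull in the same libraries listed above. This is a sketch, not from the original post: the Bahir group id for the spark-streaming-twitter artifact and the exact version alignment with your Spark install are assumptions to verify against your setup.

// build.sbt sketch matching the versions listed above (verify artifacts and versions for your setup)
name := "popular-hashtags"
scalaVersion := "2.11.11"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.3.2" % "provided",
  "org.apache.bahir" %% "spark-streaming-twitter" % "2.1.1",  // assumption: Bahir hosts this connector
  "org.twitter4j"     % "twitter4j-core"   % "4.0.6",
  "org.twitter4j"     % "twitter4j-stream" % "4.0.6"
)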

  • 6 Budget-Friendly Ways to Prepare for Your Pregnancy (checklist)

    Every pregnancy is different, and that is true even in the same person. Your first pregnancy might have been plagued with morning sickness, high blood pressure and lower back pain, while in your second pregnancy you hardly felt a thing. That can make pregnancy preparation tricky — not knowing what to expect can be hard on your mood and your finances. Many pregnant women enjoy feeling their new child growing and developing, but in those times of discomfort, it’s important to have a plan to manage physical and mental stress. Here are a few budget-friendly tips to help you with sound and solid pregnancy prep. 1. Before and after clothes When you think about buying maternity clothes do you just cringe at the cost, knowing you’ll only have to wear this size for a short period of time? There are actually ways to cut costs when it comes to pregnancy wear. First, consider buying a belly band so you can transform the pants you currently wear into pregnancy and postpartum pants. Second, look into comfortable nursing pajamas (you can find a pair for $33.99) that you can fit into now and after the baby comes. The more cozy and flowy they are, the more comfortable you’ll be during some of those long, late night nursing marathons. 2. Amazon’s “Subscribe and Save” You should have bought stock in antacids with the kind of heartburn you are experiencing. Now it’s 3 am and you can’t sleep and you are out of Tums. You can save time and money by subscribing to items you use a lot. Not only will these be automatically delivered to your home so you never have to experience late night heartburn unaided again, but the cost per item is often reduced when you subscribe. You can do this with other items, like foods you have been craving, shea butter to help reduce stretch marks or hemorrhoid cream for sore bottoms. 3. Putting together a nursery Putting together a warm and comfortable nursery is important for mother and baby. Since you and your newborn will spend a lot of time there, you want it to be as nurturing as possible. And while you might be tempted to go overboard with the decor, it’s important to focus only on the basics so you can stay within budget. Also, while you might be tempted to do everything yourself, don’t tackle any projects that you feel are out of your wheelhouse. Fortunately, in Minneapolis, you can hire a handyman for an assortment of small jobs for an average of $403 per project, depending on the size of the project. And although that might sound like a lot of money, you’ll rest assured knowing that the tasks were completed by a professional. 4. Children’s consignment stores While primarily an ideal spot to find good deals on gently used clothes, toys, furniture and bedding, you can also find steep discounts on used maternity and postpartum accessories. You can find breast pumps and parts, breastfeeding pillows and other nursing items. And the e-commerce boom has also helped increase access to quality used pre- and postpartum clothes. You can even rent high end used maternity and nursing clothes. Browse online and have them delivered right to your door. 5. Explore Coupons and Groupons The big box retailers love a pregnant woman — families are very profitable to stores that sell food, clothing, home goods and furniture. They will be looking to entice you into the store by offering coupons and discounts on maternity and baby items. Take advantage of these discounts! 
And don’t just look there; websites that offer discounts, like Groupon, also often have a section with items to help you plan and prepare for a baby. And don’t forget about stores like Sam’s Club and Costco. After you pay their membership fee, you get access to bulk and wholesale items with steep discounts. In fact, consider adding a membership to one of those stores to your baby registry. 6. Facebook groups for new moms Social media is a place where we can build community. Of course, anyone watching the news knows social media has a dark side, but there are also opportunities to find and make real connections. Look for mom groups in your area. There are often breastfeeding groups, buy-sell-trade groups, baby-wearing groups and other mom-themed groups in many cities. More important than being able to purchase used items, you are able to ask questions, get advice, and both give and receive support. Pregnancy is going to be a time of discovery, even for those on their second child or beyond. Give yourself space to breathe easier by setting a budget and staying within that budget. And don’t forget to lean on your community as much as you can for support.

  • Blow | Ed Sheeran | Bruno Mars | Chris Stapleton | Drums Sheet Music

    If you've just started to play drums and you're looking for easy drum covers for beginners, keep reading this blog; here I will list my favorite simple covers for newbies in drumming. Follow me on Youtube.

Beats Breakdown

In this section, I will go over all the different beats and fills used in the song. So, let's start with the first groove, which is the core of this song: it's a basic rock groove with the open hi-hat played as quarter notes and the snare on the 3rd count. The bass drum is played on 1 and the "and" of 2. The song utilizes different variations of this beat as it progresses. The first variation just adds a snare on the "4" along with an open hi-hat. The second variation adds a bit of ghost notes on the snare. Ghost notes have been added on the "and" of 1 and the "and" of 4. You can play ruffs or ghosted eighth notes on the snare drum depending on your preference. The second beat is again one of the most popular rock beats. It employs the open hi-hat on quarter notes combined with the snare on beats 1 and 3. The bass drum is played on beats 2 and 4. Here the hi-hat has been replaced with the crash on 1 and 3. The beat played at the end of the song utilizes half notes, with the bass drum played with an open hi-hat on beat 1 and the snare played with an open hi-hat on beat 2.

Rolls Breakdown

Now let's go through all the different rolls used in the song. The first one (Roll 1) is the most frequently used roll in the song. It starts at the "and" of 4 of the beat with a ghost note on the snare, and is played as eighth notes following a snare-tom-tom pattern. The image on the left shows the roll as it looks while being played with the beat. The next roll (Roll 2) is played on the snare drum, with 16th notes played from the "and" of 1 till 2, ending with a series of eighth notes on the snare. The next one starts at the 4 of the previous beat and is played entirely in eighth notes. It is played in triplets, i.e. bass-snare-snare, with an open hi-hat played with the bass drum. The next one is just a combination of "Roll 1" and "Roll 2" as described above, played in the sequence "Roll 2" followed by "Roll 1"; the only difference is that the last three eighth notes of "Roll 2" are played on the toms. The toughest roll in the song is played at the end of the guitar solo. In terms of sticking it is just eighth-note triplets; the speed at which it is played is what makes it tough. The song ends with "Roll 1" played four times as the tempo of the song drops.

Full sheet music

I will be posting a video of myself playing this song on my YouTube channel. Do subscribe to the channel as well as this blog for more drum tutorials.

  • How to write your first blog at Data Nebulae?

    Guest blogging at Data Nebulae is very simple. Please read these instructions carefully before you start.

Step 1: Sign Up. Once you sign up, you will automatically receive writer's privileges within 24 hours. If you have additional questions please email us.

Step 2: Start Writing. That's all! You are ready to create posts. But before you start, please read the blogging guidelines carefully.

Blogging Guidelines

These rules are meant to keep the quality of blogging at Data Nebulae high.

Blog Uniqueness: Blogs should be unique. Dataneb doesn't accept syndicated/unoriginal posts, research papers or duplicate posts, and copying others' content/articles is strictly prohibited. NOTE: Violation of these guidelines will result in direct loss of writer's privileges.

Blog Length: Blogs should have a minimum of 3000 characters; there is no upper limit. You will find the total number of characters on the top left corner of the editor while drafting blogs. Blogs not fulfilling this criterion will be automatically moved to draft status.

Image Requirement: You can insert images (but they should not be copyrighted images). Or, you can leave it to us and one of our moderators will handle the image requirements.

Back-links: Back-links are allowed (maximum 5, sometimes more) as long as the intention is clear. Make sure you are not linking to any blacklisted websites.

Miscellaneous: Moderators have the authority to add keywords and modify text, images etc. so that your blog can get a higher Google ranking. This will help your blog get more views. You can delete your post anytime, but Data Nebulae has full rights to re-publish that content.

Example Editor: Before you start, please refer to this post for reference. See how the paragraphs are written and how header sizes, bullet points, image alignment, divider lines, hashtags etc. are used. This is how your blog editor will look:

Wait! There Is an Easier Way to Publish Your Blog

We understand you are a beginner and you may not want to publish your blog without review. Don't worry! Just draft the blog and save it. Email us when your blog is ready to publish and one of our moderators will review and publish it for you. If you are just a member and don't want to become a writer, you can also write your post in a word document and email it to us for submission.

What's next? Share your blog post on Facebook, Twitter etc. to get more views, earn badges and invite others. Sharing blogs on social media is the easiest and fastest way to earn views. We value your words; please don't hurt others' feelings while commenting on blog posts and help maintain a quality environment at Data Nebulae. Email us if you have any queries. Good Luck!

  • Funny Short Math Jokes and Puns, Math is Fun!

    A mathematical joke is a form of humor which relies on aspects of mathematics or a stereotype of mathematicians to derive humor. The humor may come from a pun, from a double meaning of a mathematical term, or from a lay person's misunderstanding of a mathematical concept.

Instead of good-bye we say Calc-U-later.

Why should you not mix alcohol and calculus? Because you should never drink and derive.

Write the expression for the volume of a thick crust pizza with height "a" and radius "z". The formula for volume is π·(radius)²·(height). In this case, pi·z·z·a.

How do you make seven even? Just remove the "s."

Q: What is a proof? A: One-half percent of alcohol.

Q: What is gray and huge and has integer coefficients? A: An elephantine equation.

Q: Why do truncated Maclaurin series fit the original function so well? A: Because they are "Taylor" made.

Q: What's a polar bear? A: A rectangular bear after a coordinate transform.

Q: What do you get if you cross a mosquito with a mountain climber? A: You can't cross a vector with a scalar.

Theorem: 3 = 4. Proof: Suppose a + b = c. This can also be written as 4a − 3a + 4b − 3b = 4c − 3c. After reorganizing: 4a + 4b − 4c = 3a + 3b − 3c. Take the constants out of the brackets: 4(a + b − c) = 3(a + b − c). Remove the same term on the left and right: 4 = 3.

A mathematician and an engineer are on a desert island. They find two palm trees with one coconut each. The engineer shinnies up one tree, gets the coconut, and eats it. The mathematician shinnies up the other tree, gets the coconut, climbs the first tree and puts it there. "Now we've reduced it to a problem we know how to solve."

There are a mathematician and a physicist and a burning building with people inside. There are a fire hydrant and a hose on the sidewalk. The physicist has to put the fire out… so, he attaches the hose to the hydrant, puts the fire out, and saves the house and the family. Then they put the people back in the house, set it on fire, and ask the mathematician to solve the problem. So, he takes the hose off the hydrant and lays it on the sidewalk. "Now I've reduced it to a previously solved problem," and walks away.

Three men are in a hot-air balloon. Soon, they find themselves lost in a canyon somewhere. One of the three men says, "I've got an idea. We can call for help in this canyon and the echo will carry our voices far." So he leans over the basket and yells out, "Helloooooo! Where are we?" (They hear the echo several times.) Fifteen minutes later, they hear this echoing voice: "Hellooooo! You're lost!!" One of the men says, "That must have been a mathematician." Puzzled, one of the other men asks, "Why do you say that?" The reply: "For three reasons: (1) He took a long time to answer, (2) he was absolutely correct, and (3) his answer was absolutely useless."

Infinitely many mathematicians walk into a bar. The first says, "I'll have a beer." The second says, "I'll have half a beer." The third says, "I'll have a quarter of a beer." Before anyone else can speak, the barman fills up exactly two glasses of beer and serves them. "Come on, now," he says to the group, "You guys have got to learn your limits."

Scientists caught a physicist and a mathematician and locked them in separate rooms so the two could not interact with each other, and started studying their behavior. The two were assigned a task: to remove a nail hammered into the wall. The only tools they had were a hammer and a nail-drawer. After some muscular effort, both solved the task similarly by using the nail-drawer. Then there was a second task: to remove a nail that was barely touching the wall with its sharp end. The physicist simply took the nail with his hand. The mathematician hammered the nail into the wall with full force and proudly announced: the problem has been reduced to the previous one!

A mathematician organizes a raffle in which the prize is an infinite amount of money paid over an infinite amount of time. Of course, with the promise of such a prize, his tickets sell like hot cakes. When the winning ticket is drawn, and the jubilant winner comes to claim his prize, the mathematician explains the mode of payment: "1 dollar now, 1/2 dollar next week, 1/3 dollar the week after that..."

Sherlock Holmes and Watson travel in a balloon. They were hidden in clouds, so they didn't know which country they flew above. Finally they saw a guy below between the clouds, so they asked: "Hey, do you know where we are?" "Yes." "Where?" "In a balloon." And the guy was hidden by the clouds again. Watson: "Goddamn, what a stupid idiot!" Holmes: "No my friend, he's a mathematician." Watson: "How can you know that, Holmes?" Holmes: "Elementary, my dear Watson. He responded with an absolutely correct and absolutely useless answer."

My girlfriend is the square root of -100. She's a perfect 10, but purely imaginary.

How do mathematicians scold their children? "If I've told you n times, I've told you n+1 times..."

What's the best way to woo a math teacher? Use acute angle.

What do you call a number that can't keep still? A roamin' numeral.

Take a positive integer N. No wait, N is too big; take a positive integer k.

A farmer counted 196 cows in the field. But when he rounded them up, he had 200.

Why should you never argue with decimals? Because decimals always have a point.

When someone once asked Professor Eilenberg if he could eat Chinese food with three chopsticks, he answered, "Of course," according to Professor Morgan. How are you going to do it? I'll take the three chopsticks, I'll put one of them aside on the table, and I'll use the other two.

A statistics professor is going through security at the airport when they discover a bomb in his carry-on. The TSA officer is livid. "I don't understand why you'd want to kill so many innocent people!" The professor laughs and explains that he never wanted to blow up the plane; in fact, he was trying to save them all. "So then why did you bring a bomb?!" The professor explains that the probability of a bomb being on an airplane is 1/1000, which is quite high if you think about it, and statistically relevant enough to prevent him from being able to fly stress-free. "So what does that have to do with you packing a bomb?" the TSA officer wants to know, so the professor explains. "You see, if there's a 1/1000 probability of a bomb being on my plane, the chance that there are two bombs is 1/1000000. So if I bring a bomb, the chance there is another bomb is only 1/1000000, and we are all much safer."

The great probabilist Mark Kac (1914-1984) once gave a lecture at Caltech, with Feynman in the audience. When Kac finished, Feynman stood up and loudly proclaimed, "If all mathematics disappeared, it would set physics back precisely one week." To that outrageous comment, Kac shot back that yes, he knew of that week; it was "Precisely the week in which God created the world."

An experimental physicist meets a mathematician in a bar and they start talking. The physicist asks, "What kind of math do you do?" to which the mathematician replies, "Knot theory." The physicist says, "Me neither!"

A poet, a priest, and a mathematician are discussing whether it's better to have a wife or a mistress. The poet argues that it's better to have a mistress because love should be free and spontaneous. The priest argues that it's better to have a wife because love should be sanctified by God. The mathematician says, "I think it's better to have both. That way, when each of them thinks you're with the other, you can do some mathematics."

Three mathematicians walk into a bar. The bartender asks: "Will all of you guys have beer?" The first mathematician: "I don't know." The second mathematician: "I don't know." The third one: "Yes."

A mathematician is attending a conference in another country and is sleeping at a hotel. Suddenly, there is a fire alarm and he rushes out in panic. He also notices some smoke coming from one end of the corridor. As he is running, he spots a fire extinguisher. "Ah!", he exclaims, "A solution exists!" and comes back to his room and sleeps peacefully.

Two statisticians go to hunt a bear. After roaming the woods for a while, they spot a lone grizzly. The first statistician takes aim and shoots, but it hits three feet in front of the bear. The second one shoots next, and it hits three feet behind the bear. They both agree that they have shot the bear and go to retrieve it.

Parallel lines have so much in common. It's a shame they'll never meet.

I just saw my math teacher with a piece of graph paper. I think he must be plotting something.

Are monsters good at math? No, unless you Count Dracula.

Q: Why is a math book depressed? A: Because it has so many problems.

How do you stay warm in an empty room? Go into the corner where it is always 90 degrees.

There are three kinds of people in the world: those who can count and those who can't.

Q: Why did I divide sin by tan? A: Just cos.

Q: Where's the only place you can buy 64 watermelons and nobody wonders why? A: In an elementary school math class.

60 out of 50 people have trouble with fractions.

But why did 7 eat 9? Because you're supposed to eat 3 squared meals a day.

Q: Why is the obtuse triangle depressed? A: Because it is never right.

Q: Why did the 30-60-90 degree triangle marry the 45-45-90 degree triangle? A: Because they were right for each other.

Q: Why didn't the Romans find algebra very challenging? A: Because they always knew X was 10.

Two statisticians went out hunting and found a deer. The first one overshot by 5 meters. The second one undershot by 5 meters. They both hugged each other and shouted, "We got it!"

An astronomer, a physicist and a mathematician are on a train traveling from England to Scotland. It is the first time for each of them. Some time after the train crosses the border, the three of them notice a sheep in a field. "Amazing!" says the astronomer. "All the sheep in Scotland are black!" "No, no," responds the physicist. "Some sheep in Scotland are black!" The mathematician closes his eyes pityingly, and intones: "In Scotland, there is at least one field, containing at least one sheep, at least one side of which is black."

An engineer, a physicist and a mathematician go to a hotel. The boiler malfunctions in the middle of the night and the radiators in each room set the curtains on fire. The engineer sees the fire, sees there is a bucket in the bathroom, fills the bucket with water and throws it over the fire. The physicist sees the fire, sees the bucket, fills the bucket to the top of his mentally calculated error margin and throws it over the fire. The mathematician sees the fire, sees the bucket, sees the solution and goes back to sleep.

#MathJokes #FunnyMath #MathPuns #ShortMathJoke
