
  • What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science?

I never thought I would spend so much time understanding these high-profile terms. I was confident that I knew everything I needed to start writing machine learning algorithms, until a couple of days ago I asked myself: does my use case fall under machine learning, or is it artificial intelligence? Or is it predictive analytics? I began explaining it to myself and couldn't do it well. I spent several hours reading blogs and thinking, and ended up writing this post to answer my own question. I hope you will find it helpful too. Trust me, the most famous of all these terms over the past couple of years is "machine learning". The chart below shows the Google Trends interest over time for these high-profile terms.

First, let's understand these terminologies individually. Keep the Venn diagram below in mind while you read further; it will help you distinguish the various terms. You know what I did just now? I asked your brain to recognize patterns. The human brain recognizes such patterns automatically (basically "deep learning") because your brain was trained with Venn diagrams somewhere in the past. By looking at the diagram, your brain can infer a few facts: deep learning is a subset of machine learning, artificial intelligence is the superset, and data science can spread across all of these technologies. Right? Trust me, if you showed this diagram to a prehistoric man, he would not understand anything. But your brain's "algorithms" are trained on enough historic data to deduce and predict such facts.

Artificial Intelligence (AI)

Artificial intelligence is the broadest term. It originated in the 1950s and is the oldest of the terms we will discuss. In one line, artificial intelligence (AI) is a term for simulated intelligence in machines. The concept has always been the idea of building machines capable of thinking like humans and mimicking humans. The simplest example of AI is the chess game you play against a computer; a chess program was first proposed on paper in 1951. A recent AI example would be self-driving cars, which have always been a subject of controversy. Artificial intelligence can be split into two branches: one is labelled "applied AI", which uses the principles of simulating human thought to carry out one specific task; the other is known as "generalized AI", which seeks to develop machine intelligences that can turn their hands to any task, much like a person.

Machine Learning (ML)

Machine learning is the subset of AI which originated in 1959. It evolved from the study of pattern recognition and computational learning theory in artificial intelligence. ML gives computers the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed. You encounter machine learning almost every day. Think about ride-sharing apps like Lyft and Uber: how do they determine the price of your ride? Google Maps: how does it analyze traffic movement and predict your arrival time within seconds? Spam filters: how do emails end up in your spam folder automatically? Amazon Alexa, Apple Siri, Microsoft Cortana and Google Home: how do they recognize your speech?

Deep Learning (DL)

Deep learning (also known as hierarchical learning, deep machine learning or deep structured learning) is a subset of machine learning in which the learning method is based on data representations, or feature learning.
It is a set of methods that allows a system to automatically discover, from raw data, the representations needed for feature detection or classification. Examples include mobile check deposits (converting handwriting on checks into actual text), Facebook face recognition (ever seen Facebook suggesting names while tagging?), colorization of black-and-white images, and object recognition. In short, all three terms (AI, ML & DL) can be related as below; recall the examples of the chess board, spam emails and object recognition (picture credit: blogs.nvidia).

Predictive Analytics (PA)

Under predictive analytics, the goal of a problem remains very narrow: the intent is to compute the value of a particular variable at a future point in time. You can say predictive analytics is basically a sub-field of machine learning. Machine learning is more versatile and is capable of solving a wider range of problems. There are some techniques where machine learning and predictive analytics overlap, like linear and logistic regression, but others, like decision trees and random forests, are essentially machine learning techniques. Keep these regression techniques aside for now; I will write detailed blogs about them.

How does Data Science relate to AI, ML, PA & DL?

Data science is a fairly general term for the processes and methods that analyze and manipulate data. It provides the ground to apply artificial intelligence, machine learning, predictive analytics and deep learning to find meaningful and appropriate information in large volumes of raw data with greater speed and efficiency.

Types of Machine Learning

Machine learning can be classified by the type of task you expect the machine to perform (supervised, unsupervised or reinforcement learning) or by the desired output, i.e. the data. But in the end the algorithms, the techniques which help you get the desired result, remain the same.

Regression: a type of problem where we need to predict a continuous response value, such as the value of a stock.

Classification: a type of problem where we predict a categorical response value, where the data can be separated into specific "classes", such as whether an email is "spam" or "not spam".

Clustering: a type of problem where we group similar things together, such as grouping a set of tweets from Twitter.

I have tried to showcase the types in the chart below; I hope you will find it helpful. Please don't limit yourself to the regression, classification and clustering algorithms which I have shown; a number of other algorithms are being developed and used worldwide. Ask yourself which technique fits your requirement. To give one of these types a concrete shape, a minimal hands-on sketch follows below.
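Since this blog is all about Spark, here is a minimal regression sketch using Spark MLlib in spark-shell. The toy data, column names and coefficients printed are my own illustration, not part of any real use case:

import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.linalg.Vectors

// Toy training data: a continuous label we want to predict and one numeric feature.
val training = spark.createDataFrame(Seq(
  (10.0, Vectors.dense(1.0)),
  (20.0, Vectors.dense(2.0)),
  (30.0, Vectors.dense(3.0))
)).toDF("label", "features")

// Fit label = coefficient * feature + intercept on the toy data.
val model = new LinearRegression().fit(training)
println(s"Coefficients: ${model.coefficients}, Intercept: ${model.intercept}")

Classification and clustering follow the same fit-then-predict pattern in MLlib, just with different estimators (e.g., LogisticRegression or KMeans).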
Thank you folks!! If you have any question please mention it in the comments section below. #MachineLearning #ArtificialIntelligence #DeepLearning #DataScience #PredictiveAnalytics #regression #classification #cluster Next: Spark Interview Questions and Answers

  • How to convert RDD to Dataframe?

Main menu: Spark Scala Tutorial

There are basically three methods by which we can convert an RDD into a DataFrame. I am using spark-shell to demonstrate these examples. Open spark-shell and import the libraries which are needed to run our code.

Scala> import org.apache.spark.sql.{Row, SparkSession}
Scala> import org.apache.spark.sql.types.{IntegerType, DoubleType, StringType, StructField, StructType}

Now, create a sample RDD with the parallelize method.

Scala> val rdd = sc.parallelize(
  Seq(
    ("One", Array(1,1,1,1,1,1,1)),
    ("Two", Array(2,2,2,2,2,2,2)),
    ("Three", Array(3,3,3,3,3,3))
  )
)

Method 1

If you don't need a header, you can create the DataFrame directly, with the RDD as the input parameter to the createDataFrame method.

Scala> val df1 = spark.createDataFrame(rdd)

Method 2

If you need a header, you can add it explicitly by calling the method toDF.

Scala> val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")

Method 3

If you need a schema structure, you need an RDD of [Row] type. Let's create a new rowsRDD for this scenario.

Scala> val rowsRDD = sc.parallelize(
  Seq(
    Row("One", 1, 1.0),
    Row("Two", 2, 2.0),
    Row("Three", 3, 3.0),
    Row("Four", 4, 4.0),
    Row("Five", 5, 5.0)
  )
)

Now create the schema with the field names which you need.

Scala> val schema = new StructType().
  add(StructField("Label", StringType, true)).
  add(StructField("IntValue", IntegerType, true)).
  add(StructField("FloatValue", DoubleType, true))

Now create the DataFrame with rowsRDD & schema and show the DataFrame.

Scala> val df3 = spark.createDataFrame(rowsRDD, schema)
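One more approach you will often see, not covered above, is letting Spark infer the schema from a case class via implicits. A minimal sketch (the Record class and sample values are my own illustration):

Scala> case class Record(label: String, intValue: Int, floatValue: Double)
Scala> import spark.implicits._   // brings the rdd.toDF conversion into scope
Scala> val caseClassRDD = sc.parallelize(Seq(Record("One", 1, 1.0), Record("Two", 2, 2.0)))
Scala> val df4 = caseClassRDD.toDF()   // column names come from the case class fields
Scala> df4.show()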
Thank you folks! If you have any question please mention it in the comments section below. Next: Writing data files in Spark

  • Scala IDE \.metadata\.log error fix (Mac)

Scala IDE is not compatible with Java SE 9 and higher versions. You might need to downgrade to, or install, Java SE 8 in order to fix the issue. Let's go through each step of how you can fix it.

Step 1: Check which versions of Java you have installed on your machine. Run this command in your terminal:

/usr/libexec/java_home --verbose

As you can see, I have three different versions of Java running on my machine.

Step 2: Install Java SE 8 (jdk1.8) if you don't find it in your list. Refer to this blog for Java installation steps.

Step 3: Now open your .bashrc file (run command: vi ~/.bashrc) and copy-paste the below line into your bashrc file.

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

Step 4: Save the file (:wq!) and reload your profile (source ~/.bashrc).

Step 5: Now you need to define eclipse.ini arguments in order to use Java 1.8. On Mac OS X, you can find eclipse.ini by right-clicking (or Ctrl+click) on the Scala IDE executable in Finder, choosing Show Package Contents, and then locating eclipse.ini in the Eclipse folder under Contents. The path is often /Applications/Scala IDE.app/Contents/Eclipse/eclipse.ini.

Step 6: Open it with a text editor and copy-paste the below lines into the eclipse.ini file. Change the version (if needed) according to your Java version; mine is 1.8.0_171.

-vm
/Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/bin

Step 7: Save the file and exit.

Step 8: Run the Scala IDE application now and it should work.
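One gotcha worth noting (this comes from Eclipse's launcher conventions, not from the original steps): the -vm flag and its path must sit on two separate lines, and the pair must appear before -vmargs, so the relevant part of eclipse.ini should end up looking roughly like this (the memory settings below are typical defaults, yours may differ):

-vm
/Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/bin
-vmargs
-Xms256m
-Xmx2048m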
If you are still facing a problem, please mention it in the comments section below. Thank you! #ScalaIDEinstallation #metadatalogerror #eclipse #metadata #log #error

  • StreamingContext: Spark streaming word count example Scala

Main menu: Spark Scala Tutorial

In this tutorial you will learn how to stream data in real time using Spark Streaming. Spark Streaming is basically used for near real-time data processing. Why did I say "near" real-time? Because data processing takes some time, a few milliseconds. This lag can be reduced, but obviously it can't be reduced to zero. The lag is so small that we end up calling it real-time processing.

The word "streaming" sounds very cool, but honestly speaking, most of you have already implemented it in the form of "batch mode". The only difference is the time window. Everyone is aware of batch mode, where you pull the data on an hourly, daily, weekly or monthly basis and process it to fulfill your business requirements. What if you started pulling data every second and simultaneously made your code so efficient that it can process the data in milliseconds? That would automatically be near real-time processing, right?

Spark Streaming basically gives you the ability to define sliding time windows where you can set the batch interval. After that, Spark automatically breaks the data into smaller batches for real-time processing. It uses RDDs (resilient distributed datasets) to perform the computation on unstructured data. RDDs are nothing but references to the actual data, distributed across multiple nodes with some replication factor, which reveal their values only when you perform an action on top of them (like collect); this is called lazy evaluation.

If you haven't installed Apache Spark, please refer to this (Windows | Mac users).

Screen 1

Open Scala IDE, create the package com.dataneb.spark and define the Scala object SparkStreaming. Check this link for how to create packages and objects using Scala IDE. Copy-paste the below code into your Scala IDE and let the program run.

package com.dataneb.spark

/** Libraries needed for streaming the data. */
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object SparkStreaming {

  /** Reducing the logging level to print just ERROR. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  def main(args: Array[String]) {

    /** Defining Spark configuration to utilize all the resources and
      * setting the application name as TerminalWordCount. */
    val conf = new SparkConf().setMaster("local[*]").setAppName("TerminalWordCount")

    /** Calling the logging function. */
    setupLogging()

    /** Defining the Spark streaming context with the above configuration and a batch interval of 1 second. */
    val ssc = new StreamingContext(conf, Seconds(1))

    /** Local port 9999 where we will be entering real-time messages. */
    val lines = ssc.socketTextStream("localhost", 9999)

    /** Flat map to split the lines into words, then reduce by key pair to perform the count. */
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD
    wordCounts.print()

    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for the computation to terminate
  }
}

Screen 2

Now open your local terminal, type "nc -lk 9999" and hit Enter. What did we just do? We opened a network connection (nc, or NetCat) listening on local port 9999. Now type some random input for Spark to process, as shown below.

Screen 1

Now go back to Scala IDE to see the processed records. You need to swap the screens quickly to see the results, as Spark will process these lines within seconds.
Just kidding, you can simply pin the console and scroll back to see the results. You can see the output below. That's it, it's that simple. In a real-life scenario you can stream from a Kafka producer to a local terminal, from where Spark can pick the data up for processing. You can also configure Spark to communicate with your application directly. An illustrative end-to-end session is sketched below.
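To give a concrete feel for the round trip, a session looks roughly like this (the input lines and counts are my own illustration; your timestamps will differ):

# Terminal (screen 2): feed lines to port 9999
nc -lk 9999
hello spark hello

# Scala IDE console (screen 1): wordCounts.print() emits one batch per second, e.g.
-------------------------------------------
Time: 1535000000000 ms
-------------------------------------------
(hello,2)
(spark,1)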
Thank you!! If you have any question, write it in the comments section below. #spark #scala #example #StreamingContext #streaming #word #count Next: Analyzing Twitter text with Spark Streaming

  • Kibana dashboard example

In this Kibana dashboard tutorial we will learn how to create a Kibana dashboard. Before we start with the sample Kibana dashboard example, I hope you have some sample data loaded into ELK Elasticsearch. If not, please refer to my previous blog: How to load sample data into ELK Elasticsearch. As discussed in my previous blog, I am using sample Squid access logs (a comma-separated CSV file). You can find the file format details at this link.

Navigation Menu: Introduction to ELK Stack | Installation | Loading data into Elasticsearch with Logstash | Create Kibana Dashboard Example | Kibana GeoIP Dashboard Example

For our understanding, we will create two basic Kibana visualizations:

Top 10 requested URLs (type: pie chart): it will show which top 10 URLs are getting hits.

Number of events per hour (type: bar chart): it will show the various types of events which occurred each hour.

Use Case 1: Top 10 Requested URLs (pie chart)

Open the Kibana UI on your machine and go to the Visualize tab => Create a visualization. Select the type of visualization; for our first use case, select pie chart. Select the index squidal which we created earlier. Now go to Options and uncheck the Donut checkbox, as we need a simple pie chart. Check Show Labels, or leave it blank if you don't want labels; it's up to you.

Next, go to the Data tab, where you will find Metrics (or Measures, in reporting terminology); by default it will show Count. Click the blue button right behind Split Slices in order to choose the aggregation type. Let's choose the Terms aggregation for our use case (in simple words, think of Terms like GROUP BY in SQL); refer to this for more details. Further, choose the Field => Requested_URL.keyword, which will act as our dimension. Hit the blue arrow button next to Options in order to generate the chart. You can also give the chart a custom label, as shown below.

Save the chart => hit the Save button in the top right corner of your dashboard. You can name the visualization Top 10 Requested URL. Now go to the Dashboard tab => Create a dashboard => Add => select Top 10 Requested URL. Hit the Save button at the top of your dashboard and give it a meaningful name, for instance Squid Access Log Dashboard.
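Under the hood, the Terms aggregation Kibana builds here corresponds roughly to this raw Elasticsearch query (a sketch assuming the squidal index and Requested_URL.keyword field from above; the aggregation name is arbitrary):

GET /squidal/_search
{
  "size": 0,
  "aggs": {
    "top_10_requested_urls": {
      "terms": { "field": "Requested_URL.keyword", "size": 10 }
    }
  }
}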
Use Case 2: Number of events per hour (bar chart)

Go to the Visualize tab again (top left corner of your screen) and click the "+" sign. Choose the chart type vertical bar and select the squidal index. In this case, instead of choosing the Terms aggregation, we will use an X-Axis bucket with a Date Histogram and an Hourly interval, as shown below. Hit the Save button and give it an appropriate name, for instance Events per Hour.

Now go back to the Dashboard tab => Squid Access Log Dashboard => Edit => Add, and select Events per Hour to add it to your dashboard. Hit Save again. At this point your dashboard should look like this.

You can add as many visualizations as you want, depending upon your business requirements. Your opinion matters a lot; please comment if you have any questions for me. Thank you! Next: Kibana GeoIP Dashboard Example #ELKStack #KibanaDashboardExample #KibanaDashboardTutorial #CreateKibanaDashboard

  • Analyzing Twitter Data - Twitter sentiment analysis using Spark streaming

Analyzing Twitter data: Twitter sentiment analysis using Spark Streaming. We will be analyzing Twitter data using Spark Streaming; you can do this in any programming language - Python, Scala, Java or R.

Main menu: Spark Scala Tutorial

Spark Streaming is very useful for analyzing real-time data from IoT technologies, which could be your smart watch, Google Home, Amazon Alexa, Fitbit, GPS, home security system, smart cameras, or any other device which communicates with the internet. Social networks like Facebook, Twitter and Instagram generate an enormous amount of data every minute. The trend below shows interest over time in three of these smart technologies over the past 5 years.

In this example we are going to stream tweets from the Twitter API in real time with OAuth authentication and find which hashtags are most popular among them.

Prerequisite

Download and install Apache Spark and Scala IDE (Windows | Mac). Create a Twitter sample application and obtain your consumer key, consumer secret, access token and access token secret. Refer to this to learn how to get a Twitter development account and API access keys.

Authentication file setup

Create a text file twitter.txt with the Twitter OAuth details and place it anywhere on your local file system (remember the path). Basically it has two fields separated by a single space: the first field contains the OAuth header name and the second contains the API key. Make sure there is no extra space anywhere in your file, or you will get authentication errors. There is a newline character at the end (i.e., hit Enter after the 4th line); you can see the empty 5th line in the screenshot below.
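The file content should look like this. The four property names are the OAuth fields twitter4j expects, matching the setupTwitter code below; the values here are masked placeholders, use your own keys:

consumerKey XXXXXXXXXXXXXXXXXXXXXXXXX
consumerSecret XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
accessToken XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
accessTokenSecret XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX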
Write the code!

Now create a Scala project in Eclipse IDE (see how to create a Scala project) and refer to the following code, which prints out live tweets as they stream, using Spark Streaming. I have written a separate blog to explain the basic terminologies used in Spark, like RDD, SparkContext, SQLContext, various transformations and actions etc.; you can go through it for a basic understanding. However, I have explained a little in the comments above each line of code what it actually does. For a list of Spark functions you can refer to this.

// Our package
package com.dataneb.spark

// Twitter libraries used to run Spark streaming
import twitter4j._
import twitter4j.auth.Authorization
import twitter4j.auth.OAuthAuthorization
import twitter4j.conf.ConfigurationBuilder

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._

import scala.io.Source
import org.apache.spark.internal.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream._
import org.apache.spark.streaming.receiver.Receiver

/** Listens to a stream of tweets and keeps track of the most popular
  * hashtags over a 5 minute window. */
object PopularHashtags {

  /** Makes sure only ERROR messages get logged, to avoid log spam. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  /** Configures Twitter service credentials using twitter.txt in the main workspace directory.
    * Use the path where you saved the authentication file. */
  def setupTwitter() = {
    import scala.io.Source
    for (line <- Source.fromFile("/Volumes/twitter.txt").getLines) {
      val fields = line.split(" ")
      if (fields.length == 2) {
        System.setProperty("twitter4j.oauth." + fields(0), fields(1))
      }
    }
  }

  // Main function where the action happens
  def main(args: Array[String]) {

    // Configure Twitter credentials using twitter.txt
    setupTwitter()

    // Set up a Spark streaming context named "PopularHashtags" that runs locally
    // using all CPU cores and one-second batches of data
    val ssc = new StreamingContext("local[*]", "PopularHashtags", Seconds(1))

    // Get rid of log spam (should be called after the context is set up)
    setupLogging()

    // Create a DStream from Twitter using our streaming context
    val tweets = TwitterUtils.createStream(ssc, None)

    // Now extract the text of each status update into DStreams using map()
    val statuses = tweets.map(status => status.getText())

    // Blow out each word into a new DStream
    val tweetwords = statuses.flatMap(tweetText => tweetText.split(" "))

    // Now eliminate anything that's not a hashtag
    val hashtags = tweetwords.filter(word => word.startsWith("#"))

    // Map each hashtag to a key/value pair of (hashtag, 1) so we can count them by adding up the values
    val hashtagKeyValues = hashtags.map(hashtag => (hashtag, 1))

    // Now count them up over a 5 minute window sliding every one second
    val hashtagCounts = hashtagKeyValues.reduceByKeyAndWindow(
      (x, y) => x + y, (x, y) => x - y, Seconds(300), Seconds(1))

    // Sort the results by the count values
    val sortedResults = hashtagCounts.transform(rdd => rdd.sortBy(x => x._2, false))

    // Print the top 10
    sortedResults.print

    // Set a checkpoint directory, and kick it all off
    // I can watch this all day!
    ssc.checkpoint("/Volumes/Macintosh HD/Users/Rajput/Documents/checkpoint/")
    ssc.start()
    ssc.awaitTermination()
  }
}

Run it! See the result!

I actually went to my Twitter account to check the top resulting tweet, #MTVHottest, and it was trending; see the snapshot below.

You might face errors if you are not using the proper versions of the Scala, Spark and Twitter libraries. I have listed them below for your reference; you can download these libraries from these two links: Link1 & Link2.

Scala version 2.11.11
Spark version 2.3.2
twitter4j-core-4.0.6.jar
twitter4j-stream-4.0.6.jar
spark-streaming-twitter_2.11-2.1.1.jar

Thank you folks. I hope you are enjoying these blogs; if you have any doubt please mention it in the comments section below. #AnalyzingTwitterData #SparkStreamingExample #Scala #TwitterSparkStreaming #SparkStreaming Next: Sample Big Data Architecture with Apache Spark

  • Samba Installation on OEL (Oracle Enterprise Linux), configuration and file sharing

Samba is a software package which gives users the flexibility to seamlessly share Linux/Unix files with Windows clients and vice versa. Samba installation, configuration and file sharing is very simple on Oracle Linux.

Why do we need Samba?

Let's say you're managing two operating systems, Linux and Windows, on the same computer, dual booting every day and switching between the platforms depending upon your requirement. Perhaps you're planning to eventually move to Linux as your main operating system, or perhaps Windows; you might even have plans to remove Microsoft's OS from your computer at some point soon. One of the things holding you back is the ability to access data between operating systems. Or take another scenario: you are running various components of a Big Data architecture (data collection & processing) on a Linux machine, but your visualization tool demands a Windows file system. Let's see how we can work around this problem and get your data where you want it. I am not saying Samba is the "only" solution for this problem, but it is one of the best: no cost, real-time file sharing, and super fast. What else do you need?

What is Samba?

Samba is a software package which gives users the flexibility to seamlessly share Linux/Unix files with Windows clients and vice versa. With an appropriately configured Samba server on Linux, Windows clients can map drives to the Linux file systems. Similarly, a Samba client on Unix can connect to Windows shares. Samba basically implements a network file sharing protocol using Server Message Block (SMB) and the Common Internet File System (CIFS).

Installing and configuring Samba (v3.6) on a Linux machine (RHEL/CentOS):

1. Execute the below command in your terminal to install the Samba packages:

yum install samba samba-client samba-common -y

2. Edit the configuration file /etc/samba/smb.conf:

mv /etc/samba/smb.conf /etc/samba/smb.conf.backup
vi /etc/samba/smb.conf

and copy-paste the below details into the configuration file.

[global]
workgroup = EXAMPLE.COM
server string = Samba Server %v
netbios name = centos
security = user
map to guest = bad user
dns proxy = no

#======== Share Definitions ===========================
[Anonymous]
path = /samba/anonymous
browsable = yes
writable = yes
guest ok = yes
read only = no

3. Now save the configuration file and create the Samba shared folder. Then restart the Samba services. Run the below commands step by step:

mkdir -p /samba/anonymous
cd /samba
chmod -R 0755 anonymous/
chown -R nobody:nobody anonymous/
chcon -t samba_share_t anonymous/
systemctl enable smb.service
systemctl enable nmb.service
systemctl restart smb.service
systemctl restart nmb.service

4. You can check the Samba services by running:

ps -eaf | grep smbd; ps -eaf | grep nmbd

5. Now go to the Windows Run prompt and type \\yourhostname\anonymous. That's it!! You should be able to access the anonymous shared drive now.

6. If this doesn't connect to the shared folder, please make sure your firewall services are stopped and try again. You can run the below commands to stop the services:

service firewalld stop
service iptables stop

7. Now test the shared folder by creating a sample text file on the Linux machine and opening it on the Windows machine.
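You can also test the share from the Linux side with smbclient. This is a sketch assuming the [Anonymous] share configured above and that the samba-client package is installed; -N skips the password prompt since the share allows guest access:

# List the shares the server exposes
smbclient -L //localhost -N

# Connect to the share as a guest, then upload and list a test file
smbclient //localhost/Anonymous -N
smb: \> put test.txt
smb: \> ls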
Samba Documentation

You can find the complete Samba documentation at the links below. It's very well documented and can be easily understood.

https://www.samba.org/samba/docs/
https://www.samba.org/samba/docs/man/
https://wiki.samba.org/index.php/Presentations
https://wiki.samba.org/index.php/Main_Page
https://wiki.samba.org/index.php/Installing_Samba

If you enjoyed this post, please comment if you have any question regarding Samba installation (on any operating system) and I will try to respond as soon as possible. Thank you! Next: How to setup Oracle Linux on Virtual Box? #Samba #Installation #OEL #OracleLinux #Configuration #filesharing #Oracle #Linux

  • Tangy Creamy Tomato Carrot Soup Recipe

When it's cold outside, nothing hits the spot quite like a creamy tomato carrot soup. This healthy, quick and easy recipe makes use of real cream, for a thick and satisfying soup with fabulous flavor that's ready in less than an hour. What could be better for dinner than a nice hot bowl of soup along with your favorite salad? Well, this is what I was craving one evening, and it turned out I had no vegetables in my refrigerator except for 3 carrots and a few tomatoes. So a mashup of the leftover ingredients in my fridge turned into a nice soothing bowl of carrot tomato soup topped with croutons and dollops of cottage cheese (the current favorite at my place). Below are the details of the recipe. Do try it out; I would love to hear how it turned out. Let me know if anyone comes up with interesting variations as well.

Preparation time: 30-40 min. Serves: 3-4.

Ingredients

3 Carrots
2 Tomatoes
1/2 inch Ginger
3-4 Garlic Cloves
1 Onion
2-3 Green Chilies (as per spice preference)
1-2 tsp of cottage cheese (as per preference)
Salt to taste
Pepper
Oregano/Cilantro/Parsley (as per preference)
Heavy cream (optional)

Preparation Steps

Heat oil in a pan. Add chopped onions, garlic, ginger and green chilies. Saute till the onions are translucent. Next, add chopped carrots and saute for another 3 minutes. Then add the chopped tomatoes and continue to saute till the tomatoes are mushy and the carrots are soft. I prefer to saute till the veggies are nicely roasted and the tomatoes have dissolved. Now blend this to a smooth paste. Pour the paste into the same pan and add 2 cups of hot water while stirring continuously. Cook till it boils rapidly, and adjust the consistency to your preference. Finally add salt to taste, fresh cracked pepper and some oregano or any herb of your choice. Add 2 teaspoons of cottage cheese to each serving and top it with some croutons. Hope you all like this recipe and can enjoy it on a cozy winter evening!

  • What is "is" in Python?

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object (i.e., the same location in memory). Object identity is determined using the id() function. x is not y yields the inverse truth value.

It's easier to understand if we compare these operators with the equality operators:

is and is not compare the object reference; they check for identity.
== and != compare the object value; they check for equality.

For example, consider integer objects (excluding integers from -5 to 256):

>>> A = 9999
>>> B = 9999
>>> A == B, A is B
(True, False)
>>> A, B
(9999, 9999)
>>> id(A), id(B)
(4452685328, 4452686992)

CPython caches integer objects in the range -5 to 256 as single objects, so for small integers the identity is the same:

>>> A = 99
>>> B = 99
>>> A == B, A is B
(True, True)
>>> A, B
(99, 99)
>>> id(A), id(B)
(4404392064, 4404392064)

Let's see the behavior of other immutable objects, like int & float, strings, tuples and booleans:

>>> 1 == 1.0, 1 is 1.0
(True, False)
>>> 1 == 1, 1 is 1
(True, True)
>>> 'A' == 'A', 'A' is 'A'
(True, True)
>>> (1,2) == (1,2), (1,2) is (1,2)
(True, True)
>>> True == True, True is True
(True, True)

(Note that the string and tuple results above depend on CPython's interning and constant folding; they are implementation details you should not rely on.)

What about mutable objects - list, set, dict? The behavior is totally different:

>>> [1,2] == [1,2], [1,2] is [1,2]
(True, False)
>>> {1,2} == {1,2}, {1,2} is {1,2}
(True, False)
>>> {'k1':1} == {'k1':1}, {'k1':1} is {'k1':1}
(True, False)

The is operator is used only when you want to compare object identity; for regular comparison, the equality operator is used.
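Where is legitimately shows up in everyday code is in identity checks against singletons like None. A small illustrative sketch (PEP 8 explicitly recommends is / is not for such comparisons):

def greet(name=None):
    # None is a singleton, so identity is the correct test here
    if name is None:
        name = "world"
    return "Hello, " + name + "!"

print(greet())       # Hello, world!
print(greet("Ada"))  # Hello, Ada!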

  • Installing Apache Spark and Scala (Mac)

Main menu: Spark Scala Tutorial

In this Spark Scala tutorial you will learn how to install Apache Spark on Mac OS. By the end of this tutorial you will be able to run Apache Spark with Scala on a Mac machine. You will also download Eclipse for Scala IDE. To install Apache Spark on a Windows machine, visit this.

Installing Homebrew

You will be installing Apache Spark using Homebrew. So install Homebrew if you don't have it: visit https://brew.sh/, copy-paste the command into your terminal and run it.

Installing Apache Spark

Open a terminal, type the command brew install apache-spark and hit Enter.

Create a log4j.properties file. Type cd /usr/local/Cellar/apache-spark/2.3.1/libexec/conf and hit Enter; please change the version according to your downloaded version (Spark 2.3.1 is the version installed for me). Type cp log4j.properties.template log4j.properties and hit Enter. Edit the log4j.properties file and change the log level from INFO to ERROR on log4j.rootCategory; we are changing the log level only.
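For reference, the edit amounts to changing one line in log4j.properties (the "before" line is the stock template default; only the level changes):

# Before (template default)
log4j.rootCategory=INFO, console

# After
log4j.rootCategory=ERROR, console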
Download Scala IDE

Install the Scala IDE from here. Open the IDE once, just to check that it's running fine. You should see panels like this.

Test it out!

Open a terminal and go to the directory where apache-spark was installed (such as cd /usr/local/Cellar/apache-spark/2.3.1/libexec/), then type ls to get a directory listing. Look for a text file, like README.md or CHANGES.txt. Type the command spark-shell and hit Enter. At this point you should see a scala> prompt; if not, double check the steps above. Type val rdd = sc.textFile("README.md") (or whatever text file you've found) and hit Enter. You have just created an RDD of the README text file. Now type rdd.count() and hit Enter to count the number of lines in the text file. You should get the count of the number of lines in that file! Congratulations, you just ran your first Spark program! Don't worry about the commands, I will explain them.

Sample Execution

You've got everything set up! If you have any question please don't forget to mention it in the comments section below. Main Menu | Next: Just enough Scala for Spark