
  • How to convert RDD to Dataframe?

    Main menu: Spark Scala Tutorial

    There are basically three methods by which we can convert an RDD into a DataFrame. I am using spark-shell to demonstrate these examples. Open spark-shell and import the libraries needed to run the code.

        scala> import org.apache.spark.sql.{Row, SparkSession}
        scala> import org.apache.spark.sql.types.{IntegerType, DoubleType, StringType, StructField, StructType}

    Now create a sample RDD with the parallelize method.

        scala> val rdd = sc.parallelize(Seq(("One", Array(1,1,1,1,1,1,1)), ("Two", Array(2,2,2,2,2,2,2)), ("Three", Array(3,3,3,3,3,3))))

    Method 1. If you don't need a header, you can create the DataFrame directly by passing the RDD to the createDataFrame method.

        scala> val df1 = spark.createDataFrame(rdd)

    Method 2. If you need a header, add it explicitly by calling toDF with the column names.

        scala> val df2 = spark.createDataFrame(rdd).toDF("Label", "Values")

    Method 3. If you need an explicit schema, you need an RDD of Row type. Let's create a new rowsRDD for this scenario.

        scala> val rowsRDD = sc.parallelize(Seq(Row("One",1,1.0), Row("Two",2,2.0), Row("Three",3,3.0), Row("Four",4,4.0), Row("Five",5,5.0)))

    Now create the schema with the field names you need.

        scala> val schema = new StructType().add(StructField("Label", StringType, true)).add(StructField("IntValue", IntegerType, true)).add(StructField("FloatValue", DoubleType, true))

    Now create the DataFrame with rowsRDD and the schema, and show it.

        scala> val df3 = spark.createDataFrame(rowsRDD, schema)
        scala> df3.show()

    Thank you folks! If you have any question, please mention it in the comments section below.

    Next: Writing data files in Spark
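    A fourth pattern you will often see, not covered by the three methods above, relies on a case class plus Spark's implicits so that the schema is inferred from the class fields. A minimal sketch, where the Record class and df4 are just illustrative names:

        scala> import spark.implicits._
        scala> case class Record(label: String, intValue: Int, floatValue: Double)
        scala> val df4 = sc.parallelize(Seq(Record("One", 1, 1.0), Record("Two", 2, 2.0))).toDF()
        scala> df4.printSchema()

    This gives you named, typed columns without spelling out a StructType by hand.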

  • Data Driven Parenting: An Introduction (Entry 1)

    As a first-time parent, I find myself wondering how each decision I'm making might end up "messing up" my kid. How each factor I introduce could ripple down and somehow eventually lead to my child sitting on a meticulously upholstered psychiatrist's couch, talking about how all their problems stemmed from childhood and were particularly the fault of some defect or distortion in their relationship with their mother (aka me). But I think back to my own childhood: left unsupervised for long stretches in the car parked outside a grocery store, freely flipping through mystery/horror/slasher movies with my friends, and eating Hot Pockets and Pop-Tarts for dinner. Am I messed up? I mean, probably a little, but aren't we all?

    There are libraries filled with parenting advice, often offering contradictory opinions. Homo sapiens has persisted for an estimated 200,000 to 300,000 years; how badly could we be doing? These are the concerns that I imagine drove Brown University economist Emily Oster to write Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (Penguin Press). I paged through it quickly at my local bookstore and it got me thinking: how much does parenting actually contribute to child outcomes? What should we really be doing? Can we actually mess up our kids? Using Oster's compiled research from Cribsheet as a foundation, I'll be exploring what past and present research states, as well as what findings in animals have suggested. What does the data on parenting say? Are we just becoming more anxious, more allergic, more obese, hopeless? doomed?? Does anyone really know what they're doing? Hopefully, we'll find out. Join me later for Data Driven Parenting: ??? Entry 2.

  • Scala IDE \.metadata\.log error fix (Mac)

    Scala IDE is not compatible with Java SE 9 and higher versions. You might need to downgrade or install Java SE 8 in order to fix the issue. Let's go through the steps to fix it.

    Step 1. Check which versions of Java are installed on your machine. Run this command in your terminal:

        /usr/libexec/java_home --verbose

    As you can see, I have three different versions of Java on my machine.

    Step 2. Install Java SE 8 (jdk1.8) if you don't find it in your list. Refer to this blog for Java installation steps.

    Step 3. Now open your .bashrc file (run command: vi ~/.bashrc) and copy-paste the line below into it.

        export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

    Step 4. Save the file (:wq!) and reload your profile (source ~/.bashrc).

    Step 5. Now you need to set the eclipse.ini arguments so that Scala IDE uses Java 1.8. On Mac OS X, you can find eclipse.ini by right-clicking (or Ctrl+clicking) the Scala IDE executable in Finder, choosing Show Package Contents, and then locating eclipse.ini in the Eclipse folder under Contents. The path is often /Applications/Scala IDE.app/Contents/Eclipse/eclipse.ini

    Step 6. Open it with a text editor and copy-paste the lines below into the eclipse.ini file. Change the version (if needed) according to your Java version; mine is 1.8.0_171.

        -vm
        /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/bin

    Step 7. Save the file and exit.

    Step 8. Run the Scala IDE application now and it should start. If you are still facing problems, please mention them in the comments section below. Thank you!

    #ScalaIDEinstallation #metadatalogerror #eclipse #metadata #log #error
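    For orientation, an eclipse.ini adjusted this way usually ends up looking roughly like the excerpt below. The -vm flag and the JDK path sit on their own lines and must come before the -vmargs section; the exact path and the memory setting shown here are only examples and will differ on your machine.

        -vm
        /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/bin
        -vmargs
        -Xmx2G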

  • What is Big Data Architecture? Ingest, Transform/Enrich and Publish

    What is Big Data architecture? How do we define it? Why do we need it? What are the various ways to ingest, transform, enrich and publish Big Data? There are several questions around it. Let's explore a few sample big data architectures.

    Main menu: Spark Scala Tutorial

    With glittering new tools in the market and myriad buzzwords surrounding data operations, consumers of information often overlook the building process, believing insights gleaned from data are instantaneous and automated. We live in the "pre-AI" age, where getting clear answers to qualitative questions from quantitative analysis requires human intervention. Yes, advanced data science gives us extensive means to visualize and cross-section, but human beings are still needed to ask questions in a logical fashion and find the significance of the resulting insights.

    Please note that any such architecture is just a sample; it varies depending upon the nature of the data and client requirements. We will discuss it in detail shortly.

    What is the need for Big Data architecture? A big data architecture is designed to handle the ingestion, enrichment and processing of raw structured, semi-structured and unstructured data that is too large or complex for traditional database or data warehousing systems. The three V's - volume, velocity and variety - are the most common properties of Big Data architecture. Whether we end up electing the well-known Kappa or Lambda architecture, understanding the three V's and the nature of the data plays a very crucial role in our big data architecture. For instance, if the velocity or volume of the data is very low, why not go with a traditional database system? Instead, I have seen organizations rushing to transform their traditional data warehouse systems into big data architectures just because it's shining in the market.

    Let's categorize Big Data architecture workloads.

    Real-time processing, with data sources like IoT devices - I would rather say it's "near" real-time processing (ingestion and enrichment take a few seconds). If the solution includes real-time sources, the architecture must include a way to capture and store real-time messages for stream processing. Usually these streams are handled using the Apache Kafka and ZooKeeper pair, Amazon Simple Queue Service (SQS), JBoss, RabbitMQ, IBM WebSphere MQ, Microsoft Message Queuing, etc. The Kappa architecture is well known for this type of workload.

    Batch processing - Because the data sets are so large, a big data architecture solution often must process data files using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Usually these jobs involve reading source files, processing them, and writing the output to new files. Options include running U-SQL jobs in Azure Data Lake Analytics, using Hive, Pig, or custom Map/Reduce jobs in an HDInsight Hadoop cluster, or using Java, Scala, or Python programs in an HDInsight Spark cluster. The Lambda architecture is generally used for this.

    Machine learning and predictive analytics - A common misconception is that predictive analytics and machine learning are the same thing. This is not the case. At its core, predictive analytics encompasses a variety of statistical techniques (including machine learning, predictive modeling and data mining) and uses statistics (both historical and current) to estimate, or 'predict', future outcomes.
    Machine learning, on the other hand, is a sub-field of computer science that gives 'computers the ability to learn without being explicitly programmed'. Machine learning evolved from the study of pattern recognition and explores the notion that algorithms can learn from and make predictions on data. And, as they become more 'intelligent', these algorithms can go beyond static program instructions to make highly accurate, data-driven decisions. R, Python and Scala are popular languages for these workloads.

    Big Data architecture backbone: data refinement is the key! Serious data scientists need to make data refinement their first priority and break the data work down into three steps.

    Data Ingestion, or the Data Collection layer - People use different terminologies for the first layer, but its main focus is to choose the right technology depending upon the Big Data architecture workload and project requirements. If the requirement demands real-time processing, we can use Kafka or any of the other real-time messaging systems mentioned earlier. If the source is just a flat file generated a few times a day, go with a simple file transfer protocol. In the end, don't forget that cost and third-party vendor support matter as well.

    Data Enrichment, Transformation, Processing and Refinement - To be instrumentally useful, data must be converted into "answers" to questions. In other words, Big Data must get smaller after passing through the second layer. Don't pile up raw data that isn't in question, as it will dramatically slow down your process over time.

    Data Publishing, or Delivery, the so-called Presentation layer - Deliver the answers through optimized channels in the proper formats and frequency. This layer includes reporting, visualization, data exploration, ad-hoc querying and exported datasets: visualization through Tableau, QlikView, etc.; reporting through BOBJ, SSRS, etc.; ad-hoc querying using Hive, Impala, Spark SQL, etc. Further, the choice of technology depends upon the end users - administrators, business users, vendors, partners, etc. demand data in different formats.

    Data storage: last but not least! The Hadoop distributed file system is the most commonly used storage framework in Big Data architecture; others are the NoSQL data stores - MongoDB, HBase, Cassandra, etc. One of the salient features of Hadoop storage is its capability to scale, self-manage and self-heal. Things to consider while planning a storage methodology:

    Type of data (historical or incremental)
    Format of data (structured, semi-structured and unstructured)
    Analytical requirements that the storage can support (synchronous and asynchronous)
    Compression requirements
    Frequency of incoming data
    Query pattern on the data
    Consumers of the data

    Thank you! If you have any question please don't forget to mention it in the comments section below.

    Next: What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science?
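    To make the ingest, enrich and publish layers a little more concrete, here is a small, purely illustrative Spark Scala sketch of a batch flow. The input path, JSON layout and column names (events.json, userId, amount) are invented for the example and are not taken from any particular system.

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions._

        object SamplePipeline {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("Sample ingest-enrich-publish flow")
              .master("local[*]")
              .getOrCreate()

            // Ingest: read raw semi-structured data (hypothetical events.json)
            val raw = spark.read.json("/data/raw/events.json")

            // Enrich/transform: keep only what answers the question, aggregate per user
            val enriched = raw
              .filter(col("amount").isNotNull)
              .groupBy(col("userId"))
              .agg(sum(col("amount")).as("totalAmount"))

            // Publish: write a smaller, query-friendly dataset for the presentation layer
            enriched.write.mode("overwrite").parquet("/data/published/user_totals")

            spark.stop()
          }
        }

    In a real system the read would point at the landing zone filled by the ingestion layer (Kafka, file transfer, etc.) and the write would feed whatever reporting or ad-hoc query tool sits in the presentation layer.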

  • Interesting and Unknown facts about Python programming

    Python has gained a lot of popularity in the past year! Actually, it's the most popular programming language now, and Java is the second most popular language after Python, according to Google Trends. Python is the most popular programming language in the United States of America!

    Do you know Monty Python? Why is Python called Python? It has absolutely nothing to do with snakes! It was named after the comedy group Monty Python. When Guido van Rossum began implementing Python, he was also reading the published scripts from "Monty Python's Flying Circus", a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python.

    The Python logo - "two snakes"! Projects and companies that use Python are encouraged to incorporate the Python logo on their websites, brochures, packaging, and elsewhere to indicate suitability for use with Python or implementation in Python. Use of the "two snakes" logo element alone, without the accompanying wordmark, is permitted on the same terms as the combined logo. According to the Python community, "Python is powerful... and fast; plays well with others; runs everywhere; is friendly & easy to learn; is Open."

    Import this! There is actually a poem written by Tim Peters, named The Zen of Python, which can be read by just typing import this.

        >>> import this
        The Zen of Python, by Tim Peters

        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!

    The Python Wiki is a great source for learning Python: PythonBooks - Python Wiki.

    Tuple unpacking with enumerate, to get the index of elements. It is very useful when you need indexes.

        >>> for a, b in enumerate([1,2,3,4,5]):
        ...     print a, b
        0 1
        1 2
        2 3
        3 4
        4 5

    Chained comparison. For example, if you want to check whether a number is between 1000 and 10000, you can write:

        if 1000 <= num <= 10000:
            print True

    You can work with infinity in Python.

        import math
        math.isinf(x)  # check whether x is infinite (Python 2 and 3)
        math.inf       # infinity constant (Python 3.5+; use float('inf') in Python 2)

    You can use else with a for loop as well as a while loop!

    What about this unpacking?

        >>> x, y = 1, 2
        >>> x, y = y, x + y
        >>> print x, y
        2 3

    It has been around for roughly three decades now. Van Rossum started developing the new script in the late 1980s and introduced the first version of the programming language in 1991. This initial release already had a module system borrowed from Modula-3. Later on, this programming language was named 'Python'.

    Van Rossum's answer to why Python was created in the first place: "I had extensive experience with implementing an interpreted language in the ABC group at CWI, and from working with this group I had learned a lot about language design. This is the origin of many Python features, including the use of indentation for statement grouping and the inclusion of very-high-level data types (although the details are all different in Python). I had a number of gripes about the ABC language, but also liked many of its features. It was impossible to extend the ABC language (or its implementation) to remedy my complaints - in fact its lack of extensibility was one of its biggest problems. I had some experience with using Modula-2+ and talked with the designers of Modula-3 and read the Modula-3 report. Modula-3 is the origin of the syntax and semantics used for exceptions, and some other Python features. I was working in the Amoeba distributed operating system group at CWI. We needed a better way to do system administration than by writing either C programs or Bourne shell scripts, since Amoeba had its own system call interface which wasn't easily accessible from the Bourne shell. My experience with error handling in Amoeba made me acutely aware of the importance of exceptions as a programming language feature. It occurred to me that a scripting language with a syntax like ABC but with access to the Amoeba system calls would fill the need. I realized that it would be foolish to write an Amoeba-specific language, so I decided that I needed a language that was generally extensible. During the 1989 Christmas holidays, I had a lot of time on my hand, so I decided to give it a try. During the next year, while still mostly working on it in my own time, Python was used in the Amoeba project with increasing success, and the feedback from colleagues made me add many early improvements. In February 1991, after just over a year of development, I decided to post to USENET. The rest is in the Misc/HISTORY file."

    Roundoff error: 0.1 + 0.2 - 0.3 is not equal to zero in Python.

        >>> 0.1 + 0.2 - 0.3
        5.551115123125783e-17

    Packages: there are over 160K packages in the PyPI repository. I think most of us are aware of this, but still..

    I love this one.

        def superhero(x, y):
            return x*y, x+y, x-y, x/y, x**y

        mul, add, sub, div, power = superhero(7, 2)
        print (mul, add, sub, div, power)
        # Output: (14, 9, 5, 3, 49)

    Google: "Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python, and we're looking for more people with skills in this language," said Peter Norvig, director of search quality at Google, Inc.

    YouTube: "Python is fast enough for our site and allows us to produce maintainable features in record times, with a minimum of developers," said Cuong Do, Software Architect.

  • Samba Installation on OEL (Oracle Enterprise Linux), configuration and file sharing

    Samba is a software package which gives users the flexibility to seamlessly share Linux/Unix files with Windows clients and vice versa. Samba installation, configuration and file sharing are very simple on Oracle Linux.

    Why do we need Samba? Let's say you're managing two operating systems - Linux and Windows - on the same computer, dual booting every day and switching between the platforms depending upon your requirements. Perhaps you're planning to eventually move to Linux as your main operating system, or probably Windows; you might even have plans to remove Microsoft's OS from your computer at some point soon. One of the things holding you back is the ability to access data between operating systems. Or assume another scenario where you are running various components of a Big Data architecture (data collection and processing) on a Linux machine, but the visualization tool demands a Windows file system. Let's see how we can work around this problem and get your data where you want it. I am not saying Samba is the "only" solution for this problem, but it is one of the best: no cost, real-time file sharing and super fast; what else do you need?

    What is Samba? Samba is a software package which gives users the flexibility to seamlessly share Linux/Unix files with Windows clients and vice versa. With an appropriately configured Samba server on Linux, Windows clients can map drives to the Linux file systems. Similarly, a Samba client on UNIX can connect to Windows shares. Samba basically implements network file sharing using the Server Message Block (SMB) and Common Internet File System (CIFS) protocols.

    Installing and configuring Samba (v3.6) on a Linux machine (RHEL/CentOS):

    1. Execute the command below in your terminal to install the Samba packages:

        yum install samba samba-client samba-common -y

    2. Edit the configuration file /etc/samba/smb.conf:

        mv /etc/samba/smb.conf /etc/samba/smb.conf.backup
        vi /etc/samba/smb.conf

    and copy-paste the details below into the configuration file.

        [global]
        workgroup = EXAMPLE.COM
        server string = Samba Server %v
        netbios name = centos
        security = user
        map to guest = bad user
        dns proxy = no
        #======== Share Definitions ===========================
        [Anonymous]
        path = /samba/anonymous
        browsable = yes
        writable = yes
        guest ok = yes
        read only = no

    3. Now save the configuration file and create the Samba shared folder. Then restart the Samba services. Run the commands below step by step:

        mkdir -p /samba/anonymous
        cd /samba
        chmod -R 0755 anonymous/
        chown -R nobody:nobody anonymous/
        chcon -t samba_share_t anonymous/
        systemctl enable smb.service
        systemctl enable nmb.service
        systemctl restart smb.service
        systemctl restart nmb.service

    4. You can check the Samba services by running:

        ps -eaf | grep smbd; ps -eaf | grep nmbd

    5. Now go to the Windows Run prompt and type \\yourhostname\anonymous. That's it! You will be able to access the anonymous shared drive now.

    6. If this doesn't connect to the shared folder, make sure your firewall services are stopped and try again. You can run the commands below to stop the services.

        service firewalld stop
        service iptables stop

    7. Now test the shared folder by creating a sample text file on the Linux machine and opening it on the Windows machine.

    Samba documentation: you can find the complete Samba documentation at the links below. It's very well documented and easy to understand.

    https://www.samba.org/samba/docs/
    https://www.samba.org/samba/docs/man/
    https://wiki.samba.org/index.php/Presentations
    https://wiki.samba.org/index.php/Main_Page
    https://wiki.samba.org/index.php/Installing_Samba

    If you enjoyed this post, please comment if you have any question regarding Samba installation (on any operating system) and I will try to respond as soon as possible. Thank you!

    Next: How to setup Oracle Linux on Virtual Box?

    #Samba #Installation #OEL #OracleLinux #Configuration #filesharing #Oracle #Linux

  • What does // mean in Python?

    It's floor division. The Python docs have very nice documentation on this.

    Python 3: floor division is mathematical division that rounds down to the nearest integer. The floor division operator is //. For example, the expression 11 // 4 evaluates to 2, in contrast to the 2.75 returned by float true division.

        >>> 11//4
        2
        >>> 11/4
        2.75

    Note that (-11) // 4 is -3 because that is -2.75 rounded downward.

        >>> -11//4
        -3
        >>> -11/4
        -2.75

    Python 2: when both operands are integers, true division (/) also rounds down, so it behaves the same as floor division (//).

        >>> 11//4
        2
        >>> 11/4
        2
        >>> -11//4
        -3
        >>> -11/4
        -3

  • How can I get the source code of the Python Packages?

    You can find the source code on GitHub. Under the cpython repository you will find all the modules and Python objects (including those written in C). You can also find out which file on your system is used for a given module by checking its __file__ attribute. For example, for the math and random modules:

        >>> import math
        >>> import random
        >>> math.__doc__
        'This module is always available. It provides access to the\nmathematical functions defined by the C standard.'
        >>> math.__file__
        '/Users/Rajput/anaconda/lib/python2.7/lib-dynload/math.so'
        >>> random.__file__
        '/Users/Rajput/anaconda/lib/python2.7/random.pyc'

    Now, you can get the source code for objects written in Python (not C) with inspect.getsourcelines. You can also refer to the Python docs for the inspect library.

        >>> import inspect
        >>> inspect.getsourcelines(random)
        (['"""Random variable generators.\n', '\n', ' integers\n', ' --------\n', ' uniform within range\n', '\n', ' sequences\n', ' ---------\n', ' pick random element\n', ' pick random sample\n', ... and so on.

  • Road trip from New York to Los Angeles

    This road trip has something for everyone, whether you are traveling with family or in a group. Driving from the east coast to the west coast (or vice versa) takes approx. 42 hours if you drive non-stop, covering a distance of 2,790 miles. However, this road trip isn't something you rush; it's for fun and thrill. Over 7 days you will drive through 12 different states: New York, Pennsylvania, Ohio, Indiana, Illinois, Iowa, Nebraska, Colorado, New Mexico, Nevada, Arizona (optional) and California. Arizona is optional if you wish to pass through. I love Arizona for its vivid landscape - Antelope Canyon, Grand Canyon, Monument Valley, Horseshoe Bend - hence I extended my trip to drive through Arizona. Below is a picture of Horseshoe Bend in Arizona; I love this creation of nature.

    You will be driving through 4 different time zones (EST, CST, MST, PST), probably the longest road trip you can imagine in the United States, and trust me, you will love it. A road trip is always fun for people who love driving, and if you are driving in a group it will be even more memorable. I have mentioned the hotels, locations, routes and drive hours which I followed. You can tweak them if you have other plans. Unfortunately, I was the only driver for the whole trip. However, I was not alone; my wife helped keep me awake with her poor jokes ;)

    Initial plan (you can add or remove days in between if you like or dislike a place; don't book hotels in advance, instead book them on the same day so that you are free to change your plan):

    Day 1: New York to Chicago, IL (extra day to roam around Chicago)
    Day 2: Chicago, IL to Omaha, NE
    Day 3: Omaha, NE to Rocky Mountains, CO (extra day to roam around Colorado)
    Day 4: Rocky Mountains, CO to Page, AZ (extra day to roam around Arizona)
    Day 5: Page, AZ to Las Vegas, NV (extra day to roam around Vegas)
    Day 6: Las Vegas, NV to Los Angeles, CA (roam around Los Angeles, San Diego ;))

    Day 1. New York to Chicago, IL (11 hours, 745 miles via I-80). An eleven-hour drive is too much, isn't it? Yes, but I planned to drive the most on the first day because I was full of energy, and stopping at Chicago was a good idea. However, you can split this drive into two days (7+5 hours) making a stop at Cleveland, OH, if eleven hours of driving is too much for you. You can visit the Rock and Roll Hall of Fame if you get some time in the evening; it's located on the shore of Lake Erie in downtown Cleveland and it's beautiful. I have already visited Cleveland a couple of times, so I didn't have much to explore there. Anyway, I stayed at Cool Springs Inn, Michigan City, approximately an hour's drive from Chicago, for a couple of reasons. The first reason was cost: hotels in Chicago were too expensive, approx. 3-4 times costlier than what I paid in Michigan City ($45). Secondly, Chicago is hardly an hour's drive from this place, so you can wake up a little early in the morning and roam around Chicago if you want. You can visit Navy Pier, Willis Tower, Cloud Gate, John Hancock Center, Shedd Aquarium, the Art Institute of Chicago, etc. There is a lot to do in Chicago (in fact a day is not enough), so get a city pass and try to explore as much as you can. At the end of the day you can drive back to Cool Springs or book another hotel nearby. Palmer House is a good option if you want to spend more time in downtown Chicago; I have been there a couple of times. It's a little costlier (~$150/night), and you have to pay extra for parking (~$50), but the place was awesome.

    Day 2: Chicago, IL to Omaha, NE (7.5 hours, 466 miles via I-80). Well, I didn't stop at Omaha (Horizon Inn Motel) as in my initial plan; instead I drove another 8 hours to stop at the Rocky Mountains, CO, which was actually my day 3 stop. It sounds crazy, but yes, I drove over 15 hours to reach Colorado. I was more interested in hiking and roaming around Colorado; basically, with 8 extra hours of driving I saved a day for Colorado. However, there is a lot to do in Omaha as well if you stick to the initial plan. If you are interested in zoos and America's largest indoor rain-forest, visit the Henry Doorly Zoo and Aquarium. It has an incredible indoor desert and rain-forest.

    Day 3: Omaha to Rocky Mountains, CO (8 hours, 568 miles via I-76). Colorado is too big to be explored in a single day, so I stayed for a couple of days there, with the first stop at Coyote Mountain Lodge. I started the day with Rocky Mountain hiking and drove through Garden of the Gods. On the second day, I stopped at Estes Park and visited the Royal Gorge Bridge (shown below), and later in the evening we went to Great Sand Dunes National Park. I haven't seen a landscape like the Great Sand Dunes in my entire life; you will find snowy mountains, desert, a lake and green forest in the same place. It's mesmerizing!

    Day 4: Sand Dunes, CO to Page, AZ (7 hours, 426 miles via US-160). Arizona is also too big to explore in one day, but you can visit Antelope Canyon and Horseshoe Bend and drive through Zion National Park or the Grand Canyon. The Grand Canyon itself needs a couple of days if you want to visit all its corners; you can do a helicopter tour if you want to save some time. Below is a random picture of mine from Zion National Park - lost while driving. Zion National Park is really beautiful; don't miss the drive-through if you are nearby. Antelope Canyon shines bright orange in the sunlight around noon, so try to visit around that time.

    Day 5: Page, AZ to Las Vegas, NV (4 hours, 272 miles via US-89 & I-15). Las Vegas is self-explanatory and I don't think this place needs any introduction. You can try different food and drinks, enjoy the street walk, the night life and the rides, and roam around. Saturday night is the most popular there, so plan your trip accordingly.

    Day 6: Las Vegas, NV to Los Angeles, CA (4 hours, 270 miles via I-15). You can cover this distance on the 5th day itself if you don't like Vegas, but I believe that's not the case; no one wants to skip Vegas. Southern California has tons of things to do - Universal Studios, the San Diego Zoo, Griffith Observatory, Santa Monica Pier, tons of beaches, etc. However, one of my favorites is Potato Chip Rock; it's a little mountain hike in San Diego County if you get some extra time. Here is the pic.

    Things to remember before you start:

    Have your vehicle fully serviced before you plan this trip - engine oil change, tire condition, lights, brakes, etc. I wasted many hours on an oil change in Arizona, so be careful.
    Don't book hotels in advance, but try to book before 3:00 pm if you are booking on the same day, otherwise it will be very costly. I had to cancel one booking due to a change in plan.
    Keep extra blankets and a pillow in your car in case you need rest.
    Don't overload your vehicle; always keep extra space for yourself.
    Take regular breaks while driving. I was taking a break every 3-4 hours of driving.
    Don't drive more than 8 hours per day. Well, sorry guys, I didn't follow this rule myself.
    Keep warm clothes (jackets) in your car. The weather will change a lot over such a long-distance tour. When I started the trip it was 25 degrees Celsius in New York, and when I reached Colorado it dropped to 2 degrees Celsius.
    Keep plenty of water and food supplies in your car.
    Keep a tire inflator in your car (on Amazon you will find one for about $15).
    Avoid driving between 6 pm and 7 pm due to sunset; as you are driving towards the west, you will face the sun every evening.
    Most important: enjoy your trip, don't rush! Refer to TripAdvisor.

    Thank you! If you really enjoyed this post, please don't forget to like & share!

  • What is SparkContext (Scala)?

    Main menu: Spark Scala Tutorial

    In this blog you will learn how to start spark-shell, understand the spark-shell output, create a Spark context and Spark configuration, import SparkContext and SparkConf, and write a simple SparkContext Scala program.

    Starting spark-shell. If you haven't installed Apache Spark on your machine, refer to this (Windows | Mac users) for installation steps. Apache Spark installation is very easy and shouldn't take long. Open your terminal and type the command spark-shell to start the shell. You should see the same output as during the Spark installation.

        $ spark-shell
        19/07/27 11:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        Setting default log level to "WARN".
        To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
        Spark context Web UI available at http://19x.xxx.x.x5:4040
        Spark context available as 'sc' (master = local[*], app id = local-1564252213176).
        Spark session available as 'spark'.
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
              /_/

        Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
        Type in expressions to have them evaluated.
        Type :help for more information.

    What is spark-shell? The Spark shell is an interactive shell through which you can access Spark APIs. Apache Spark has basically three sets of APIs (Application Program Interfaces) - RDDs, Datasets and DataFrames - that allow developers to access data and run various functions across four different languages: Java, Scala, Python and R. Don't worry, I will explain RDDs, Datasets and DataFrames shortly.

    Easy, right? But I need to explain a few facts before we proceed further. Look at the output above again; we usually ignore the fact that there is a lot of information in it.

    1. The first line of the Spark output shows a warning that it's unable to load the native-hadoop library and will use builtin-java classes where applicable. That's because I haven't installed the Hadoop libraries (which is fine), so wherever applicable Spark will use built-in Java classes.

        19/07/27 11:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        Setting default log level to "WARN".

    My point here is not the warning, but the WARN log level. Spark has various logging levels which you can set while writing the program, for example WARN, ALL, DEBUG, ERROR, INFO, FATAL, TRACE, TRACE_INT, OFF. By default the Spark logging level is set to "WARN".

    2. The next line tells us how to adjust the logging level from the default WARN to a new level. We will learn later how to run this piece of code, sc.setLogLevel(newLevel); it's syntactically a little different across Scala, R, Java and Python.

        To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

    3. The next line gives us the link for the Spark Web UI, where you can see the DAG visualization of your jobs. You can copy-paste that link into your local browser to open the Spark user interface. By default, it runs at port number 4040.

    4. The next line tells us that a SparkContext has been created as "sc" and by default it's going to use all the local resources to execute the program, master = local[*], with application id local-1564252213176.

        Spark context available as 'sc' (master = local[*], app id = local-1564252213176).
    5. The Spark session is created as 'spark'. We will see what a Spark session is soon.

    6. This line tells us the Spark version; currently mine is 2.3.1.

    7. We all know Java is needed to run Apache Spark, and that is what we did during installation: we installed Java first and then Apache Spark. Here, the line tells us the underlying Scala version 2.11.8 and Java version 1.8.0_171.

        Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)

    8. You can run the :help command for more information. It's again a new story and I will write in detail how to use these commands soon, but a few common ones show you how to see and edit the history of your commands and how to quit spark-shell.

    Initializing Spark. In the last section we encountered a few terms like the Spark context (started by default as "sc") and the Spark session (started by default as "spark"). If you run these commands one by one you will see the default setup and the alphanumeric pointer locations (like @778c2e7c) of these Spark objects. They will be different on various machines; yours will be different from mine. For instance:

        scala> sc
        res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@778c2e7c

        scala> spark
        res1: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@16ccd2bc

    What is SparkContext? The first thing you do in a Spark program is set up the Spark context object. Why the first thing? Because you need to tell the Spark engine how to run and what to run. It's like ordering or buying a pizza: you need to say whether you want a veg or non-veg pizza, and the toppings ;)

    The Spark context performs two major tasks (via the Spark configuration, SparkConf). These aren't the only two tasks, but they are the basic ones. First, setMaster tells the Spark engine how to run, i.e. whether it should run in cluster mode (master) or local mode (local). We will see how to set up a master, i.e. a Yarn, Mesos or Kubernetes cluster, and standalone local mode shortly. Second, setAppName tells it what to run, i.e. the application name. So, basically, the Spark context tells the Spark engine which application will run in which mode.

    How to set up SparkContext? In order to define a SparkContext, you need to configure it, which is done via SparkConf. You need to tell the Spark engine the application name and the run mode.

    1. For this, we need to import two Spark classes; without these, Spark will never understand our inputs.

        scala> import org.apache.spark.SparkContext
        import org.apache.spark.SparkContext

        scala> import org.apache.spark.SparkConf
        import org.apache.spark.SparkConf

    2. Next, define the configuration variable conf: first pass the "Sample Application" name via the setAppName method, and second define the mode with the setMaster method. I have set conf to local mode with all [*] resources.

        scala> val conf = new SparkConf().setAppName("Sample Application").setMaster("local[*]")
        conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@c0013b8

    You can see the location (@c0013b8) of my configuration object. The Spark engine can run either in standalone mode or in cluster mode at one time, so at any given point of time you will have just one SparkContext. Confused? Wait, I will explain soon. Try to create a new SparkContext with the above configuration.

        scala> new SparkContext(conf)

    You will get an error telling you that one Spark context is already running. If you want to update the SparkContext you need to stop() the default Spark context, i.e. "sc", and re-define the Spark context with the new configuration.
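    As a minimal sketch of what that replacement can look like inside spark-shell (sc2 is just an illustrative name, and conf is the configuration defined above):

        scala> sc.stop()
        scala> val sc2 = new SparkContext(conf)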
    I hope you all understood what I meant by one active Spark context. Here is the complete reference from the Apache documentation for what you can pass while setting up setMaster.

    Well, instead of doing all of the above configuration, you can also change the default SparkContext "sc" which we saw earlier. For this you need to pass the inputs with the spark-shell command before you start the Spark shell.

        $ spark-shell --master local[2]
        19/07/27 14:33:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        Setting default log level to "WARN".
        To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
        Spark context Web UI available at http://19x.xxx.0.15:4040
        Spark context available as 'sc' (master = local[2], app id = local-1564263216688).
        Spark session available as 'spark'.
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
              /_/

        Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
        Type in expressions to have them evaluated.
        Type :help for more information.

    The default setup was to utilize all local[*] cores (refer to the output of the first spark-shell command at the start of this post); now you can see it has changed to use local[2] cores.

    Creating SparkContext in Scala IDE, an example. You can write a similar program in the Eclipse Scala IDE and run the sample application as follows (see How to run Scala IDE). Copy-paste the lines from here.

        package com.dataneb.spark

        import org.apache.spark.SparkContext
        import org.apache.spark.SparkConf

        object scExample {
          val conf = new SparkConf().setAppName("Sample Application").setMaster("local[4]")
          val sc = new SparkContext(conf)
          def main (args: Array[String]): Unit = {
            print("stopping sparkContext \n")
            sc.stop()
          }
        }

    That's all guys! Please comment in the comments section below if you have any question regarding this post. Thank you!

    Next: SparkContext Parallelize
