
  • How to make calls to Twitter APIs using Postman client?

In this blog, I am going to invoke Twitter APIs with the Postman client in order to pull live feeds, or in other words tweets, from Twitter. The output will be JSON text which you can format or transform based on your requirements. Soon I will write another blog to demonstrate how you can ingest this data in real time with Kafka and process it using Spark, or you can directly stream and process the data in real time with Spark Streaming. For now, let's connect to the Twitter API using Postman.

Prerequisites: the Postman client and a Twitter developer account.

Postman Client Installation

There are basically two ways to install Postman: you can download the Postman extension for your browser (Chrome in my case), or you can simply install the native Postman application. I installed the native application to write this blog.

Step 1. Google "Install Postman" and go to the official Postman site to download the application.
Step 2. On the download page, select your operating system to start the download. Postman is available for Mac, Linux and Windows. The download link keeps changing, so if it doesn't work, just Google it as shown above.
Step 3. Once the installer is downloaded, run it to complete the installation. The application is approximately 250 MB (for Mac).
Step 4. Sign up. After signing in, you can save your preferences or do it later, as shown below.
Step 5. Your workspace will look like the screenshot below.

Twitter Developer Account

I hope you already have a Twitter developer account; if not, please create one. Go to developer.twitter.com and sign in with your Twitter account. Click on Apps > Create an app at the top right corner of the screen. Note: developer.twitter.com was earlier known as apps.twitter.com. Fill out the form to create an application, specifying the Name, Description and Website details as shown below. This screen has changed slightly with the new Twitter developer interface, but the overall process is still similar. If you have any questions, please feel free to ask in the comments section at the end of this post. Provide a proper website URL like https://example.com, otherwise you will get an error while creating the application. A sample is shown above. Once you successfully create the app, you will see the page below. Make sure the access level is set to Read and Write as shown above. Now go to the Keys and Access Tokens tab and click on Create Access Token. At this point you will see the 4 keys which will be used in the Postman client: Consumer Key (API Key), Consumer Secret (API Secret), Access Token and Access Token Secret. The new interface looks like this.

Calling the Twitter API with the Postman Client

Open the Postman application and click on the Authorization tab. Select the authorization type OAuth 1.0 and add the authorization data to the request headers. This is a very important step, otherwise you will get an error. After setting up the authorization type and request headers, fill out the form carefully with the 4 keys (just copy-paste) which we generated in the Twitter app: Consumer Key (API Key), Consumer Secret (API Secret), Access Token and Access Token Secret. Execute it! Now let's pull the Twitter statuses of the screen name "snap". Copy-paste the request URL https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=snap as shown below. You can refer to the API reference index to access the various Twitter APIs. To GET some tweets, hit the Send button. You will get a response as shown below.
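If you would rather make the same call from a script instead of Postman, here is a minimal Python sketch. It assumes the requests and requests-oauthlib packages are installed; the four placeholder keys stand for the values generated in your Twitter app, and the screen name "snap" matches the example above.

import requests
from requests_oauthlib import OAuth1

# OAuth 1.0 credentials generated in the Twitter app (placeholders)
auth = OAuth1(
    "CONSUMER_KEY",         # Consumer Key (API Key)
    "CONSUMER_SECRET",      # Consumer Secret (API Secret)
    "ACCESS_TOKEN",         # Access Token
    "ACCESS_TOKEN_SECRET",  # Access Token Secret
)

# Same request as the Postman example: user timeline of screen name "snap"
url = "https://api.twitter.com/1.1/statuses/user_timeline.json"
response = requests.get(url, params={"screen_name": "snap"}, auth=auth)
print(response.status_code)
print(response.json())  # list of tweet objects as JSON

requests-oauthlib signs the request with OAuth 1.0 in the same way Postman does when you add the authorization data to the request headers.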
GET Examples

Twitter has very nice API documentation covering accounts, users, tweets, media, trends, messages, geo, ads etc., and there is a huge variety of data which you can pull. I am invoking a few APIs just for demonstration purposes.

Accounts and users. Let's say you want to search for the user name "Elon". You can do it like this:
GET https://api.twitter.com/1.1/users/search.json?q=elon
Now suppose you want to get the friend list of Elon Musk; you can do it like this:
GET https://api.twitter.com/1.1/friends/list.json?user_id=44196397
The input user_id is the same as the id in the previous output. You can also switch the response display between pretty, raw and preview.

Trending Topics. You can pull the top 50 trending global topics with id=1, for example:
GET https://api.twitter.com/1.1/trends/place.json?id=1

POST Examples

You can also POST something, just like you tweet from your Twitter web account. For example, if you want to tweet "Hello" you can do it like this:
POST https://api.twitter.com/1.1/statuses/update.json?status=Hello
You can verify the same from your Twitter account. Yeah, that's me! I rarely use Twitter.

Cursoring

Cursoring is used for pagination when you have a large result set. Let's say you want to pull all statuses which mention "Elon"; obviously there will be a good number of tweets and the response can't fit on one page. Cursoring is needed to navigate through each page. For example, to pull 5 results per page you can do it like this:
GET https://api.twitter.com/1.1/search/tweets.json?q=Elon&count=5
Now, to navigate to the next 5 records you have to use the next_results value shown in the search_metadata section above, like this:
GET https://api.twitter.com/1.1/search/tweets.json?max_id=1160404261450244095&q=Elon&count=5&include_entities=1
To get the next set of results, again use next_results from the search_metadata of this result set, and so on. Obviously you can't do this manually each time; you need to write a loop to fetch the result sets programmatically, for example:

cursor = -1
api_path = "https://api.twitter.com/1.1/endpoint.json?screen_name=targetUser"
do {
    url_with_cursor = api_path + "&cursor=" + cursor
    response_dictionary = perform_http_get_request_for_url( url_with_cursor )
    cursor = response_dictionary[ 'next_cursor' ]
} while ( cursor != 0 )

In our case next_results plays the role of next_cursor, i.e. a pointer to the next page. The exact field may differ across endpoints (tweets, users, accounts, ads etc.), but the logic of looping through each result set is the same. Refer to the Twitter cursoring documentation for complete details. A runnable Python version of this loop is sketched at the end of this post.

That's it, you have successfully pulled data from Twitter.

#TwitterAPI #Postmaninstallation #Oauth #API #CustomerKey #CustomerSecret #accesstoken #Postman

Next: Analyze Twitter Tweets using Apache Spark
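As promised above, here is a runnable Python version of the cursoring loop. It is a minimal sketch, assuming the same requests / requests-oauthlib setup and placeholder keys as the earlier sketch; for the Search API the page pointer is the next_results string inside search_metadata rather than next_cursor.

import requests
from requests_oauthlib import OAuth1

auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")  # placeholder keys

base = "https://api.twitter.com/1.1/search/tweets.json"
next_page = "?q=Elon&count=5"  # first page: 5 results per page
all_tweets = []

while next_page:
    response = requests.get(base + next_page, auth=auth).json()
    all_tweets.extend(response.get("statuses", []))
    # next_results is absent on the last page, which ends the loop
    next_page = response.get("search_metadata", {}).get("next_results")

print(len(all_tweets), "tweets collected")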

  • How to Install OEL (Oracle Enterprise Linux) on Virtualbox?

This blog article focuses on configuring VirtualBox to create a new VM for Oracle Linux 7 (OEL) and installing Oracle Linux 7 from the ISO image as a guest operating system. The Oracle Linux 7 ISO image can be downloaded from edelivery.oracle.com, the Oracle Software Delivery Cloud. You must have a valid Oracle account (free) to download the Linux ISO. Link: edelivery.oracle.com (I downloaded the x86 64-bit platform for this post; check your system configuration before you download).

Configuring VirtualBox

To create a VM, click on the New button in the top left corner and provide a descriptive name for the VM, the type of OS and the version. The name you specify will be used to identify the VM configuration. Once the required information is provided, click on the Next button. Specify how much RAM you want to assign to your virtual machine; I allocated 20 GB. Click on the Next button and keep the hard disk at its default size. Hit the Create button and keep the hard disk file type as VDI; we can change this later. Hit the Next button and keep storage as dynamically allocated. Hit the Next button and configure the file location and size; for this post let's go with 20 GB. Hit the Create button. Once the VDI disk is created with the specified size, you will be redirected back to the main screen, where you will be able to see all the details as shown below.

Go to Settings and select the Network tab, change the configuration to "Bridged Adapter" and set Promiscuous Mode to "Allow All" as shown below. Go to the System tab and assign the number of processors; I selected 4 processors for this example. Now go to the Storage tab and click on the disk icon to choose the Virtual Optical Disk ISO image file which you downloaded initially from Oracle Software Delivery Cloud. Click on OK and you will now see the optical drive ISO image file as highlighted below (it was empty before, refer to the previous screenshots). Ignore the name, as it depends on which ISO image you downloaded from Oracle Cloud.

Installing Oracle Linux

From the Oracle VM VirtualBox Manager screen, select the VM you just created and click on the green Start icon at the top of the screen. Select the language as English (United States) and hit the Continue button. Now select Installation Destination as Local Standard Disks. Then go back, select Network & Host Name (currently shown as not connected) and turn ON the connection. Now set up the root password. Create another user if you need one; for instance, I created a hadoop user. Let the post-installation setup tasks finish. Once that is done, you can click on the Software Selection button, choose the "Server with GUI" option and pick the KDE desktop; click on the Done button after you select all the options that you want installed. From the Configuration screen, click on the Reboot button. Once the VM reboots, you will be directed to the login prompt. Since we did a bare minimal installation, we will not enter a GUI mode. Log in with the root user and the password we set up earlier.

Install the net-tools package by typing:
yum install net-tools
Once net-tools is installed, type:
ifconfig
The inet value shown above is your IP address. That's it!

To check your Linux version, type uname -a; you will get something like this:
[root@localhost anonymous]# uname -a
Linux localhost.localdomain 4.1.12-112.16.4.el7uek.x86_64 #2 SMP Mon Mar 12 23:57:12 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux

If you have any questions, please mention them in the comments section below. Thank you!

#RedhatLinux #OracleLinuxInstallation #Installing #Oracle #Linux #on #Virtual #Box

  • Installing Java on Oracle Linux

Referenced from www.java.com (I have added a few additional steps to make the installation process smoother).

Java for Linux Platforms

1. First check whether Java is already installed on your machine. Type java -version, or simply run this command in your terminal: which java
2. If Java is not present, your terminal will not recognize the command and will report "command not found".
3. To install Java, change to the directory in which you want to install it. Type: cd directory_path_name
For example, to install the software in the /usr/java/ directory, type: cd /usr/java/
4. Download the tarball Java file from www.java.com (snippet shown above).
5. Get the 32-bit or 64-bit tarball depending on your Linux machine configuration.
6. Move (sftp) the .tar.gz archive binary to the current directory /usr/java/.
7. Unpack the tarball to install Java:
tar zxvf jre-8u73-linux-i586.tar.gz
In this example, Java is installed in the /usr/java/jre1.8.0_73 directory. You can drop the version detail and rename the directory for your convenience.
8. Delete the .tar.gz file if you want to save some disk space.
9. Set up the .bashrc file. Type vi ~/.bashrc and add these two lines to the file:
export JAVA_HOME=/usr/java/jre1.8.0_73
export PATH=$PATH:$JAVA_HOME/bin
10. Now run: source ~/.bashrc
Type java -version to check whether Java is installed successfully. If it does not run, find the bin directory where you unpacked Java and run: /path_to_your_Java/bin/java -version

Java for RPM-based Linux Platforms

Become root by running su and entering the super-user password. Uninstall any earlier installations of the Java packages:
rpm -e package_name
Change to the directory in which you want to install. Type: cd directory_path_name
For example, to install the software in the /usr/java/ directory, type: cd /usr/java
Install the package:
rpm -ivh jre-8u73-linux-i586.rpm
To upgrade a package, type:
rpm -Uvh jre-8u73-linux-i586.rpm
Exit the root shell. No need to reboot. Delete the .rpm file if you want to save disk space.

If you have any questions, please write them in the comments section below. Thank you!

#Javainstallation #OracleLinux #OEL #OL

  • What is Big Data Architecture? Ingest, Transform/Enrich and Publish

What is Big Data Architecture? How do you define it? What is the need for a Big Data Architecture? What are the various ways to ingest, transform, enrich and publish Big Data? There are several questions around it. Let's explore a few sample big data architectures.

Main menu: Spark Scala Tutorial

With glittering new tools in the market and myriad buzzwords surrounding data operations, consumers of information often overlook the building process, believing insights gleaned from data are instantaneous and automated. We live in the "pre-AI" age, where clear answers to qualitative questions from quantitative analysis require human intervention. Yes, advanced data science gives us extensive means to visualize and cross-section, but human beings are still needed to ask questions in a logical fashion and find the significance of the resulting insights. Please note that the architecture above is just a sample; it varies depending upon the nature of the data and client requirements. We will discuss it in detail shortly.

What is the need for a Big Data Architecture?

A big data architecture is designed to handle the ingestion, enrichment and processing of raw structured, semi-structured and unstructured data that is too large or complex for traditional database or data warehousing systems. The three V's (volume, velocity and variety) are the most common properties of Big Data architecture. Whether we end up choosing the well-known Kappa or Lambda architecture, an understanding of the three V's and of the nature of the data plays a crucial role in our big data architecture. For instance, if the velocity or volume of the data is very low, why not go with a traditional database system? Instead, I have seen organizations rushing to transform their traditional data warehouse systems into big data architectures just because it's shining in the market.

Let's categorize Big Data Architecture Workloads

Real-time processing, with data sources like IoT devices. I would rather say it's "near" real-time processing (ingestion and enrichment take a few seconds). If the solution includes real-time sources, the architecture must include a way to capture and store real-time messages for stream processing. Usually these streams are handled using an Apache Kafka & ZooKeeper pair, Amazon Simple Queue Service (SQS), JBoss, RabbitMQ, IBM WebSphere MQ, Microsoft Message Queuing etc. The Kappa architecture is well known for this type of workload.

Batch processing. Because the data sets are so large, a big data architecture often must process data files using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Usually these jobs involve reading source files, processing them, and writing the output to new files. Options include running U-SQL jobs in Azure Data Lake Analytics, using Hive, Pig or custom MapReduce jobs in an HDInsight Hadoop cluster, or using Java, Scala or Python programs in an HDInsight Spark cluster. The Lambda architecture is generally used for this.

Machine learning & predictive analytics. A common misconception is that predictive analytics and machine learning are the same thing. This is not the case. At its core, predictive analytics encompasses a variety of statistical techniques (including machine learning, predictive modeling and data mining) and uses statistics (both historical and current) to estimate, or 'predict', future outcomes.
Machine learning, on the other hand, is a sub-field of computer science that gives 'computers the ability to learn without being explicitly programmed'. Machine learning evolved from the study of pattern recognition and explores the notion that algorithms can learn from and make predictions on data. And, as they become more 'intelligent', these algorithms can go beyond their program instructions to make highly accurate, data-driven decisions. R, Python and Scala are popular languages for these workloads.

Big Data Architecture backbone: data refinement is the key!

Serious data scientists need to make data refinement their first priority and break the data work down into three steps:

Data Ingestion, or call it the Data Collection layer. People use different terminologies for the first layer, but its main focus is to choose the right technology depending upon the Big Data architecture workload and project requirements. If the requirement demands real-time processing, we can use Kafka or any of the other real-time messaging systems mentioned earlier (a minimal Kafka producer sketch is given at the end of this post). If the source is just a flat file generated a few times a day, go with a simple file transfer protocol. In the end, don't forget that cost and third-party vendor support matter as well.

Data Enrichment, Transformation, Processing & Refinement. To be instrumentally useful, data must be converted into "answers" to questions. In other words, Big Data must get smaller after passing through the second layer. Don't pile up raw data that isn't relevant to those questions, as it will dramatically slow down your processing over time.

Data Publish, or Delivery, also called the Presentation layer. Deliver the answers through optimized channels in the proper format and frequency. This layer includes reporting, visualization, data exploration, ad-hoc querying and exporting datasets: visualization through Tableau, QlikView etc., reporting through BOBJ, SSRS etc., ad-hoc querying using Hive, Impala, Spark SQL etc. Further, the choice of technology depends upon the end users; different users such as administrators, business users, vendors and partners demand data in different formats.

Data Storage: last but not least! The Hadoop Distributed File System is the most commonly used storage framework in Big Data architecture; others are the NoSQL data stores such as MongoDB, HBase and Cassandra. One of the salient features of Hadoop storage is its capability to scale, self-manage and self-heal. Things to consider while planning a storage methodology:

Type of data (historical or incremental)
Format of data (structured, semi-structured and unstructured)
Analytical requirements that the storage can support (synchronous & asynchronous)
Compression requirements
Frequency of incoming data
Query patterns on the data
Consumers of the data

Thank you! If you have any questions, please don't forget to mention them in the comments section below.

Next: What's Artificial Intelligence, Machine Learning, Deep Learning, Predictive Analytics, Data Science?
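As mentioned above, here is a minimal sketch of the ingestion layer using Kafka. It assumes a broker running on localhost:9092 and the kafka-python package; the topic name raw_events and the sample IoT record are placeholders, not part of any specific architecture described in this post.

import json
from kafka import KafkaProducer

# Connect to the (assumed) local broker and serialize records as JSON
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A sample "raw" event as it might arrive from an IoT source
event = {"source": "iot-sensor-42", "temperature": 21.7, "unit": "C"}

producer.send("raw_events", event)  # publish to the ingestion topic
producer.flush()                    # make sure the message leaves the client

The enrichment layer would then consume from this topic, for example with Spark Streaming, refine the data and publish the result downstream to the presentation layer.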
