
  • J1 Visa Waiver Application Process and Timeline (Stage 3)

    Stage 3

    After receiving clearance from all three (or two) offices mentioned in Stage 2, the applicant needs to complete the online DS-3035 application with the Department of State. Please see the link; the site will guide you through the information needed. When you finish the online DS-3035 application, a case number is generated at the end along with a PDF package (the PDF consists of two sets). The applicant then needs to prepare two separate packages.

    Package 1:
    1. Set 1 generated with the DS-3035 package, along with the supporting documents listed below.
    2. Statement of reason (page generated with the DS-3035 package).
    3. Passport copy of the exchange visitor (first and last page).
    4. Copy of the visa and the latest I-94.
    5. Copies of all DS-2019 forms issued.
    6. Two self-addressed stamped envelopes.
    7. Cashier's check for $120 (the amount is mentioned in the DS-3035 PDF package).
    8. Proof of current address, such as a copy of a driving license or state ID.
    9. Cover letter: this is entirely optional. The applicant can write a brief cover letter mentioning the case number, the purpose of the mailing, and the list of documents included.

    Send the package to the following address.

    Postal Service:
    Department of State J-1 Waiver
    P.O. Box 979037
    St. Louis, MO 63197-9000

    Courier Service:
    Department of State J-1 Waiver
    P.O. Box 979037
    1005 Convention Plaza
    St. Louis, MO 63101-1200

    Package 2:
    1. Set 2, i.e. the third copy barcode page.
    2. Copies of all NOCs.
    3. Cashier's check or money order for $25 + $2 (for using the miscellaneous service). You can make one for the total amount or two separate ones.
    4. One self-addressed envelope, to receive the notice that your clearance has been sent to the embassy.
    5. Copies of the passport front and back pages, I-94, visa, driving license (proof of my current address), and the cover letter received during Stage 1. The cover letter was not listed for CGI-SF, but I included it in my package.
    6. Bio-data and affidavit from Stage 1: I don't think this is mandatory, and I did not include it in my package, but applicants can include it if they want.
    7. Cover letter: this is entirely optional. The applicant can write a brief cover letter mentioning the case number, the purpose of the mailing, and the list of documents included.

    Send the package to your respective consulate; in my case it was CGI-SF.

    Consulate General Of India
    540 Arguello Blvd, San Francisco, CA 94118

    Please write the case number on the packages.

    Note: Please check your respective consulate's page for the Stage 3 documents, since the list varies between consulates.

    What happens after the applicant has posted both packages:
    1. The applicant will receive a copy of the waiver recommendation addressed to the Indian embassy in Washington (only if you provided a self-addressed stamped envelope), along with a copy of the third copy barcode page. The Indian embassy in Washington will forward your no objection statement to the Waiver Review Division. In my experience, you will also receive a copy of the NOS (no objection statement) from CGI Washington DC.
    2. After this you can check your status on the DOS website using your case number. Your case number will become active and show up online. However, it might take 1 month or more for a case number to become active after the DOS has received the documents.
    3. The status mostly shows pending; the review process starts when the following documents are received. The site shows step by step which documents have been received, so please wait patiently.
       a. No objection from Indian embassy
       b. Fee
       c. DS-3035 form
       d. Copies of DS-2019
       e. Statement of reason
       f. Passport data pages
    4. It takes about 2-6 months, after which the status changes to favorable recommendation, unless the waiver is denied.
    5. Once a recommendation has been made, your case will be electronically forwarded to the U.S. Citizenship & Immigration Services (USCIS) Vermont Service Center (VSC), where your J-1 waiver eligibility will be determined.
    6. Probably within 14-20 days you will receive an I-797 notice of action from USCIS. It contains your receipt number, which you can use to check your case status online.
    7. Usually within a week or two after this, your case will be approved and form I-612 (waiver) will be sent to the applicant by USCIS. It might take up to 2 weeks to receive the I-612 waiver.

    Congratulations! You have received your waiver. Hurdle crossed :)

    I hope this blog will be helpful for people going through the painful waiver process. Please drop me a message in the comment section if you have any queries. I will get back to you ASAP.

  • J1 Visa Waiver Application Process and Timeline (Stage 2)

    I am a post-doctoral research scientist at the University of California, Riverside, and I have recently gone through the J1 visa waiver process. I am sharing my experience in the hope that it helps others obtain a J1 waiver more easily. I referred to several blogs and videos, which was time-consuming. I have divided the whole process into 4 sections: Introduction, Stage 1, Stage 2, Stage 3.

    Previous Page: Stage 1

    Stage 2

    The cover letter obtained in Stage 1 will say that the NORI certificate will be issued after you obtain clearance from the following authorities:
    1. The Ministry of Human Resource and Development, New Delhi (MHRD): this is an online process now; please follow this link and upload all required documents.
    2. Department of Home State in India: the state you belong to (address on passport).
    3. Regional Passport Office (RPO): the office from which your first passport was issued. These days RPO clearance is not required by some consulates, and your cover letter should say so. In my case it was not required. Please confirm this with your consulate.

    The MHRD process is easy and straightforward. Follow the link, create your login ID, and upload all the necessary documents; the site will guide you through. Once you submit everything, it takes around 3-6 weeks to get the clearance. You will receive the certificate by e-mail, so please enter the correct e-mail address. Please remember that applications are processed in batches (for example, 1-20 people in one batch), so it might take some time. You can always check your status by logging in to the same site.

    The Department of Home State is for the state where you live in India; in my case it was Uttar Pradesh. This step is the tedious one.

    Documents required:
    1. Cover letter (i.e. I wrote a letter briefly stating why I was sending them documents)
    2. Attested statement and affidavit (bio-data and affidavit)
    3. Self-attested copies of passport and visa
    4. Self-attested copies of certificates (PhD, M.Sc., B.Sc., and High School)
    5. DS-2019
    6. Resume

    The document list might change depending on the state's home department, but in general it should be more or less the same, so have all your supporting documents ready. You can either mail your documents to the secretariat (where your home department is) or go there in person. If you have relatives or parents in India, you can ask them to go there with your documents and submit them. It will take some time, because the secretariat has several departments and you need to find the right person who handles NORI. I would recommend going there in person or asking someone to visit on your behalf.

    Once your documents are accepted by the secretariat, the administration will forward them for police verification and District Magistrate (DM) office clearance. A police inquiry will be held to verify your address and documents. The police might visit your home address, call the phone number you provided in the bio-data/affidavit, or ask your parents to visit the police station. In my case it was my dad who got the call and went to the police station. Your parents then have to submit an affidavit on your behalf stating that there are no loans or criminal cases against you and that they (your parents) have no objection to your staying in the US, along with some other documents (ID proof etc.; the police officer will tell you which documents are needed, and this might vary by state). A separate file will also be sent to your District Magistrate (DM), who needs to sign it and send it back to the secretariat. Please follow up with both the police office and the DM office. It took me 3-4 months end-to-end to get this clearance, and it was slower because of the COVID-19 lockdown.

    Once the secretariat receives the no objection from the DM and the police, they will e-mail you your clearance certificate and simultaneously post it to the Indian government in New Delhi. The Indian government will send one physical copy of the certificate to your address in the USA and another copy to the Indian consulate office in the US. Receiving the physical copy might take 1-2 months, so be patient. If you don't receive the physical copy, contact your consulate in the USA (via e-mail) to ask whether the e-mailed copy will work. One of my friends had the same issue (she never received the physical copy), and CGI-SF confirmed that the e-mailed (scanned) copy would also work.

    Follow up regularly at this step, otherwise it will be much slower or they might forget your case. If possible, get someone's phone number at the NORI office so that you can contact them frequently for updates. In my case a gentleman at the secretariat office was kind enough to give me his number, and he guided me through the whole process.

    Sample NOC from home state:

    Please drop me a message in the comment section if you have any queries. I will get back to you ASAP.

    Next Page: Stage 3

  • J1 Visa Waiver Application Process and Timeline (Stage 1)

    I am a post-doctoral research scientist at the University of California, Riverside, and I have recently gone through the J1 visa waiver process. I am sharing my experience in the hope that it helps others obtain a J1 waiver more easily. I referred to several blogs and videos, which was time-consuming. I have divided the whole process into 4 sections: Introduction, Stage 1, Stage 2, Stage 3.

    Previous Page: Introduction

    Stage 1

    The following are the steps for Stage 1. It takes at most 3-5 weeks, depending on your location and consulate.

    1. Download the miscellaneous service form from your respective Consulate General of India website (links given above). You need one copy of this form; it should be hand-filled, with a recent photograph pasted on it. I have attached the miscellaneous form which I used for CGI-SF.
    2. Download the Waiver/NORI (No Obligation to Return to India) certificate form from your respective CGI website. It contains two sections: bio-data and affidavit. You need four copies of this form, and they should be hand-filled. Make sure to use the latest forms, as they might change in the future. You can use this link to download the form for CGI-SF. I have also attached it below.
    3. Once hand-filled, the NORI form (bio-data and affidavit) should be notarized. You can get it notarized at a bank or a UPS store. Check with your bank; they might do it for free. If you go to a UPS store you have to pay $25 per document. I was not aware of this at the time and ended up paying a few hundred bucks to the UPS store.
    4. Next, make two more Xerox copies of the notarized NORI form. You will now have 6 notarized NORI forms in total (4 originals and 2 Xerox copies).
    5. A non-refundable fee of $66 is required for Stage 1. It should be paid by money order or cashier's check payable to Consulate General of India, San Francisco (in my case). Some CGIs accept cash too, but that was not the case with CGI-SF. Additionally, an ICWF charge ($2) applies for using the miscellaneous service. So you can prepare one money order/cashier's check for $68, or two separate ones for $66 and $2. In my case I prepared two cashier's checks for $66 and $2, both payable to Consulate General of India, San Francisco. The fee might change depending on the CGI location, so please double check before proceeding.
    6. Next, you need all the supporting documents (listed below) along with these forms, and you have the following two options:
       1. Mail all the documents to your respective Consulate General of India along with your original passport. The mailing address is available on the CGI websites.
       2. Visit the CGI office in person. In my case I went to CGI-SF, since I was planning to travel to India and could not send my original passport by mail. You do not need an appointment to visit CGI-SF.

    With both options (1 and 2) you have to provide a return envelope with your address on it. It takes at least 5-6 weeks to get the documents signed by the Indian consular officer, after which they are posted to you in the return envelope you provided. However, this varies from location to location. One of my friends went to CGI-NY, and the Indian consular officer signed the bio-data and affidavit on the same day, so she came home with all the documents. I expected the same at CGI-SF, but they took 5 weeks.

    Supporting documents:
    1. Current Indian passport in original, and photocopies of the first five pages and last two pages of the current passport.
    2. Proof of your US visa status (copy of any one of the following):
       - Photocopy of the passport page containing the visa (H1-B, H-4 etc.), a copy of the I-94, and photocopies of all DS-2019 forms
       - Clear photocopy of Green Card
       - Employment Authorization Document (Work Permit)
       - I-797, I-140 or I-20 (if approval copies of these notices are pending, also attach a handwritten note detailing the efforts being taken to regularize status)
    3. Proof of current US residence address (copy of any one of the following):
       - U.S. driving license
       - PG&E, water or landline telephone bill displaying the applicant’s address
       - House lease agreement
       - State identification card
       Note: Bank, credit card, and mobile phone statements are not accepted as residence proof.

    For the updated forms and supporting documents list, please check your respective CGI website. The forms and guidelines might change with time.

    Congratulations, Stage 1 is done! Along with all the signed documents, you will also receive a cover letter from the Indian consulate officer, which indicates the offices in India from which you need to obtain clearance certificates. If you don’t get a cover letter, please e-mail your respective consulate and ask.

    Cover letter which I received from CGI-SF

    Please drop me a message in the comment section if you have any queries. I will get back to you ASAP.

    Next Page: Stage 2

  • J1 Visa Waiver Application Process and Timeline (Intro)

    I am a post-doctoral research scientist at the University of California, Riverside, and I have recently gone through the J1 visa waiver process. I am sharing my experience in the hope that it helps others obtain a J1 waiver more easily. I referred to several blogs and videos, which was time-consuming. The whole process is tedious, and a lot of people don’t know how it works. To reduce the complexity, I have divided the whole process into 4 sections: Introduction, Stage 1, Stage 2, Stage 3.

    Before I begin, I would like to clarify that these steps might change in the future or differ slightly for you depending on your location within the United States. I applied through the Consulate General of India, San Francisco, California (CGI-SF), as I live in Riverside, CA. However, the overall process should not vary much.

    What is J1 waiver?

    The J1 visa is a non-immigrant visa granted to individuals who want to participate in an exchange visitor program in the United States. Some individuals are subject to the 2 year home country physical presence requirement, under which the applicant has to return to their home country and serve there for a minimum of 2 years. To waive this two year rule, you need to obtain a J1 waiver.

    How to know if you are subject to the 2 year home residency 212(e) rule?

    Please check your J1 visa or DS-2019; either of them could state the 212(e) rule.
    1. Look at the bottom of your visa; it would say "Bearer is subject to section 212(e), Two year home residency rule does apply".
    2. If you don't have it on your visa, please check your DS-2019. I don't have it on my DS-2019, since it was on my visa. Refer to this screenshot; you might see a check mark on option 2.

    When should you apply for J1 visa waiver?

    You can apply for the waiver at any time; there is no restriction as such. But it's recommended to apply once you have received a DS-2019 extension for the entire possible term (i.e. 5 years). There are basically 3 stages to getting a J1 waiver, and end-to-end it takes 1-2 years to complete. It may be less for some, depending on how fast you can obtain the no objection statement from your home country. I would suggest completing the first 2 stages when you have completed 3 years on your J1 visa, and applying for the third/final stage once you have received the full 5 years of extension on your J1 visa. I suggest this because once you apply for the third/final stage of the waiver and it gets approved, you will not be able to extend your J1/DS-2019 anymore. So apply for the third stage only when you have received the full 5 years of extension on your visa. A few universities/institutes issue the first DS-2019 for the complete 5 year period, a few provide yearly extensions, and a few provide two or three year extensions, so you have to decide based on your situation. If you still have doubts, you can post your query in the comment section below and I will help you with it.

    Where should you apply?

    Please refer to this link to see which Consulate General of India (CGI) office covers your state.
    - Washington DC (details)
    - Chicago (details)
    - New York (details)
    - San Francisco (details)
    - Houston (details)
    - Atlanta (details)

    I went in person to the Consulate General of India, San Francisco. However, you can apply in person or by mail.

    Next Page: Stage 1

  • How Families Can Keep Their Home Ready For Showings

    How Families Can Keep Their Home Ready For Showings Getting your house ready to go up on the market is always a big job. Add kids to the picture, however, and that job gets much bigger. You want to make a great presentation, which means keeping up with chaos that can go hand-in-hand with kids. Here’s a look at how families can keep their home looking great until the right buyers come along: Have A Plan It’s important to get everyone in the house on the same page when it comes to staging your home. This includes any children old enough to take on some tidying tasks. For the time your house is up for sale, upgrade everyone’s chore charts to reflect a few items off your staging checklist. This way you’re constantly keeping your home ready for buyers. This is super useful, since it allows you to host agents and house hunters at the drop of a hat. For practical purposes, keep the checklist handy and make sure everyone knows where it is. You can even share your list electronically with other adults in the household and kids who are old enough to use phones or tablets. If your agent lets you know they’re swinging by soon, anyone old enough can ensure that surfaces are wiped, shades are open, and personal items are safely stashed away. Use The Best Tools Keeping your house tidy all the time is a big task, but it can be made substantially easier with the right tools. For example, a good set of microfiber cloths can make quickly wiping up surfaces a breeze. A stick vacuum is another tool you’ll want at your disposal. Since these are more versatile and lighter than traditional vacuums, they make spot cleaning fast and easy. Go through all of your cleaning supplies and try to identify which are most useful for a quick, efficient clean up, and assemble a cleaning caddy so you can grab everything at once when you’re on the run. 
Remove Personal Items One of the most important things your family can do when it comes to staging your home is taking down décor that makes it look too lived in. Per Creative Home Stagers, this includes family photos, bold color schemes, and especially stylistic wall art or furniture. These personal touches may make you feel at home, but they’ll make potential buyers feel like they’re in someone else’s home. On one hand, this is true, but on the other, it can be a problem. Even if buyers aren’t thinking of it consciously, they’re trying to picture themselves in the space. Pictures of your family holiday party or child’s first steps will make it harder for them to imagine their life in the house. Plan Fun Outings – And A Backup – For Open Houses Although it may be tempting to try and scope out interested buyers, sellers should never be at an open house. In addition to being an even starker reminder that the home belongs to someone else, The Balance points out that your presence will put uncomfortable pressure on the buyers and make it harder for them to pay attention to the property. Instead, plan a fun outing with your family during the scheduled open houses. Head to a park, playground, or museum to pass a little time. If you have younger children, it might also be wise to find a friend or family member who will be willing to host you if your plans go south. You don’t want to show up to an open house at all, much less with a screaming toddler. Keeping a house market-ready with kids can be a challenge, but don’t be intimidated. Prep your home and family, and make arrangements for showings and open house events. With a plan under your belt, there’s nothing stopping you from keeping your house buyer-ready until that magic day it’s sold! Photo Credit: Pexels

  • 6 Budget-Friendly Ways to Prepare for Your Pregnancy (checklist)

    Every pregnancy is different, and that is true even in the same person. Your first pregnancy might have been plagued with morning sickness, high blood pressure and lower back pain, while in your second pregnancy you hardly felt a thing. That can make pregnancy preparation tricky — not knowing what to expect can be hard on your mood and your finances. Many pregnant women enjoy feeling their new child growing and developing, but in those times of discomfort, it’s important to have a plan to manage physical and mental stress. Here are a few budget-friendly tips to help you with sound and solid pregnancy prep. 1. Before and after clothes When you think about buying maternity clothes do you just cringe at the cost, knowing you’ll only have to wear this size for a short period of time? There are actually ways to cut costs when it comes to pregnancy wear. First, consider buying a belly band so you can transform the pants you currently wear into pregnancy and postpartum pants. Second, look into comfortable nursing pajamas (you can find a pair for $33.99) that you can fit into now and after the baby comes. The more cozy and flowy they are, the more comfortable you’ll be during some of those long, late night nursing marathons. 2. Amazon’s “Subscribe and Save” You should have bought stock in antacids with the kind of heartburn you are experiencing. Now it’s 3 am and you can’t sleep and you are out of Tums. You can save time and money by subscribing to items you use a lot. Not only will these be automatically delivered to your home so you never have to experience late night heartburn unaided again, but the cost per item is often reduced when you subscribe. You can do this with other items, like foods you have been craving, shea butter to help reduce stretch marks or hemorrhoid cream for sore bottoms. 3. Putting together a nursery Putting together a warm and comfortable nursery is important for mother and baby. 
Since you and your newborn will spend a lot of time there, you want it to be as nurturing as possible. And while you might be tempted to go overboard with the decor, it’s important to focus only on the basics so you can stay within budget. Also, while you might be tempted to do everything yourself, don’t tackle any projects that you feel are out of your wheelhouse. Fortunately, in Minneapolis, you can hire a handyman for an assortment of small jobs for an average of $403 per project, depending on the size of the project. And although that might sound like a lot of money, you’ll rest assured knowing that the tasks were completed by a professional. 4. Children’s consignment stores While primarily an ideal spot to find good deals on gently used clothes, toys, furniture and bedding, you can also find steep discounts on used maternity and postpartum accessories. You can find breast pumps and parts, breastfeeding pillows and other nursing items. And the e-commerce boom has also helped increase access to quality used pre- and postpartum clothes. You can even rent high-end used maternity and nursing clothes. Browse online and have them delivered right to your door. 5. Explore Coupons and Groupons The big box retailers love a pregnant woman; families are very profitable to stores that sell food, clothing, home goods and furniture. They will be looking to entice you into the store by offering coupons and discounts on maternity and baby items. Take advantage of these discounts! And don’t just look there; websites that offer discounts, like Groupon, also often have a section with items to help you plan and prepare for a baby. And don’t forget about stores like Sam’s Club and Costco. After you pay their membership fee, you get access to bulk and wholesale items with steep discounts. In fact, consider adding a membership to one of those stores to your baby registry. 6. Facebook groups for new moms Social media is a place where we can build community.
Of course, anyone watching the news knows social media has a dark side, but there are also opportunities to find and make real connections. Look for mom groups out there in your area. There are often breastfeeding groups, buy-sale-trade groups, baby-wearing groups and other mom-themed groups in many cities. More importantly than being able to purchase used items, you are able to ask questions, get advice and provide — and receive— support. Pregnancy is going to be a time of discovery, even for those on their second child or beyond. Give yourself space to breathe easier by setting a budget and staying within that budget. And don’t forget to lean on your community as much as you can for support.

  • Installing Spark on Windows (pyspark)

    Prerequisite: Follow these steps to install Apache Spark on a Windows machine.

    Nowadays Python is used by many applications, so it is quite possible that Python is already available on your machine. To check, run this command at the command prompt:

        C:\Users\rajar> python --version
        'python' is not recognized as an internal or external command, operable program or batch file.

    The message above means that Python is not installed or is not on the PATH. If Python is present on your computer, the command prints the Python version like this:

        Python x.x.x

    To check that Java is properly installed, run java -version; you should see the Java version running on your computer.

    Download & Install Python

    1. Go to the Python download page and download the latest version (don't download Python 2). Download the 64-bit or 32-bit installer depending on your system configuration.
    2. Double-click the downloaded executable file.
    3. Don't forget to check the box "Add Python 3.7 to PATH", then click Install Now.

    That's all; the installation takes a couple of minutes to complete. Now test it: run the previous command again and you should see the Python version this time.

        C:\Users\rajar> python --version
        Python 3.7.4

    Run pyspark

    Now run the command pyspark and you should see the Spark version.

    If you have any questions, please mention them in the comments section below and I will help you with the installation process. Thank you.

    Next: Just enough Scala for Spark

  • Kafka Consumer Advance (Java example)

    Prerequisite
    Kafka Overview
    Kafka Producer & Consumer

    Commits and Offset in Kafka Consumer

    Once the client commits an offset, Kafka treats the messages up to that offset as consumed by that consumer group (they are not actually deleted from the log), so those messages will not be returned again in the next poll by the client.

    Properties used in the below example

        bootstrap.servers=localhost:9092
        ProducerConfig.RETRIES_CONFIG=0
        value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
        key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
        retries=0
        group.id=group1
        HQ_TOPIC_NAME=EK.TWEETS.TOPIC
        CONSUMER_TIMEOUT=1000
        worker.thread.count=5
        counsumer.count=3
        auto.offset.reset=earliest
        enable.auto.commit=false

    Configuration Level Setting

    This can be done at configuration level in the properties file.

    enable.auto.commit=false - This is the setting used in this example (Kafka's own default is true). It means the client code decides when to commit an offset and when to hold it back.
    enable.auto.commit=true - The consumer commits the offsets of consumed messages automatically in the background, without any commit decision in client code.

    Consumer API Level Setting

    Synchronous Commit

    The offset is committed as soon as the consumer API confirms it. The latest offset returned by the poll is committed. The example below commits after processing all messages of the current poll. A synchronous commit blocks until the broker responds to the commit request.
    Sample Code

        // Assumes imports from java.util, org.apache.kafka.clients.consumer and org.apache.kafka.common;
        // consumer, getKafkaConnection() and getTopicName() are members of the enclosing class.
        public synchronized void subscribeMessage(String configPropsFile) throws Exception {
            try {
                if (consumer == null) {
                    consumer = (KafkaConsumer<String, String>) getKafkaConnection(configPropsFile);
                    System.out.println("Kafka Connection created...on TOPIC : " + getTopicName());
                }
                consumer.subscribe(Collections.singletonList(getTopicName()));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(10000L);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                                record.topic(), record.partition(), record.offset(), record.key(), record.value());
                    }
                    // Blocks until the broker acknowledges the offsets returned by the last poll.
                    consumer.commitSync();
                }
            } catch (Exception e) {
                e.printStackTrace();
                consumer.close();
            }
        }

    Asynchronous Commit

    The consumer does not wait for the response from the broker. It sends the commit request to the broker and continues processing. Throughput is higher compared to synchronous commit. There is a chance of duplicate reads, which the application needs to handle on its own.

    Sample code

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(10000L);
            System.out.println("Number of messages polled by consumer " + records.count());
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                        record.topic(), record.partition(), record.offset(), record.key(), record.value());
            }
            // The callback runs once the broker has answered the commit request.
            consumer.commitAsync(new OffsetCommitCallback() {
                public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                    if (exception != null) {
                        System.out.printf("Commit failed for offsets %s: %s%n", offsets, exception);
                    } else {
                        System.out.println("Messages are Committed Asynchronously...");
                    }
                }
            });
        }

    Offset Level Commit

    Sometimes an application may need to commit the offset when a particular offset has been read.
    Sample Code

        Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000L);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                        record.topic(), record.partition(), record.offset(), record.key(), record.value());
                // The committed offset is the offset of the next message to read, hence offset + 1.
                currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1, "no metadata"));
                if (record.offset() == 18098) {
                    consumer.commitAsync(currentOffsets, null);
                }
            }
        }

    Retention of Message

    Kafka retains a message until the retention period defined in the configuration expires. Retention can be defined at broker level or at topic level, and it can be time-based or size-based for the topic. Retention defined at topic level overrides the retention defined at broker level.

    retention.bytes - The amount of messages, in bytes, to retain for this topic.
    retention.ms - How long messages should be retained for this topic, in milliseconds.

    1. Defining retention at topic level

    Set the retention for the topic named “test-topic” to 1 hour (3,600,000 ms):

        # kafka-configs.sh --zookeeper localhost:2181/kafka-cluster --alter --entity-type topics --entity-name test-topic --add-config retention.ms=3600000

    Newer Kafka releases take --bootstrap-server in place of --zookeeper for this command.

    2. Defining retention at broker level

    Define one of the below properties in server.properties. The three values shown are separate examples and are not equivalent to each other: 168 hours is 10,080 minutes or 604,800,000 ms. If more than one is set, log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours.

        # Configures retention time in milliseconds => log.retention.ms=1680000
        # Configures retention time in minutes => log.retention.minutes=1680
        # Configures retention time in hours => log.retention.hours=168

    Fetching Message From A Specific Offset

    A consumer can go down before committing a message, and subsequently there can be message loss. Since the Kafka broker can retain messages for a long time, the consumer can point to a specific offset to get a message again. The consumer can go back from the current offset to a particular offset, or it can start polling messages from the beginning.
Sample Code

// Assumes String keys and values; adjust the type parameters to match your deserializers.
Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();

public synchronized void subscribeMessage(String configPropsFile) throws Exception {
    try {
        if (consumer == null) {
            consumer = (KafkaConsumer<String, String>) getKafkaConnection(configPropsFile);
            System.out.println("Kafka Connection created...on TOPIC : " + getTopicName());
        }
        // Assign partition 0 explicitly instead of subscribing, so the position can be set by hand.
        TopicPartition topicPartition = new TopicPartition(getTopicName(), 0);
        List<TopicPartition> topics = Arrays.asList(topicPartition);
        consumer.assign(topics);
        consumer.seekToEnd(topics);
        long current = consumer.position(topicPartition);
        // Step back 10 records from the end; seeking to a negative offset is not allowed.
        consumer.seek(topicPartition, Math.max(0L, current - 10));
        System.out.println("Topic partitions are " + consumer.assignment());
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(10000L);
            System.out.println("Number of records polled " + records.count());
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("Received Message topic =%s, partition =%s, offset = %d, key = %s, value = %s\n",
                        record.topic(), record.partition(), record.offset(), record.key(), record.value());
                currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1, "no metadata"));
            }
            consumer.commitAsync(currentOffsets, null);
        }
    } catch (Exception e) {
        e.printStackTrace();
        if (consumer != null) {
            consumer.close();
        }
    }
}

Thank you. If you have any doubts, please feel free to post your questions in the comments section below. [23/09/2019 04:38 PM CST - Reviewed by: PriSin]

  • Apache Spark Interview Questions

    This post includes Big Data Spark interview questions and answers for both experienced candidates and beginners. If you are a beginner, don't worry: the answers are explained in detail. These are frequently asked Data Engineer interview questions that will help you prepare for a big data job interview.

    What is Apache Spark?

    According to the Spark documentation, Apache Spark is a fast and general-purpose in-memory cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

    In simple terms, Spark is a distributed data processing engine that supports programming languages such as Java, Scala, Python and R. On top of its core engine, Spark has four built-in libraries: Spark SQL, MLlib (machine learning), Spark Streaming and GraphX.

    What is Apache Spark used for?

    Apache Spark is used for real-time data processing, for implementing Extract, Transform, Load (ETL) processes, for implementing machine learning algorithms, and for building interactive dashboards for data analytics. It is also used to process petabytes of data distributed over clusters with thousands of nodes.

    How does Apache Spark work?

    Spark uses a master-slave architecture to distribute data across worker nodes and process it in parallel. Like MapReduce, Spark has a central coordinator, called the driver, and executors that run on the worker nodes. The driver communicates with the executors to process the data.

    Why is Spark faster than Hadoop MapReduce?

    One of the drawbacks of Hadoop MapReduce is that it writes the full data back to disk after each map and reduce job. This is very expensive because it consumes a lot of disk I/O and network I/O. Spark, by contrast, has two kinds of operations: transformations and actions.
    Spark doesn't compute the data or hold it in memory until an action is called, which reduces disk I/O and network I/O. Another innovation is in-memory caching: you can instruct Spark to hold input data in memory so that the program doesn't have to read it from disk again, further reducing disk I/O.

    Is Hadoop required for Spark?

    No, the Hadoop file system is not required for Spark. However, for better performance, Spark can use HDFS and YARN if required.

    Is Spark built on top of Hadoop?

    No. Spark is totally independent of Hadoop.

    What is Spark API?

    Apache Spark has three main sets of APIs (Application Programming Interfaces): RDDs, Datasets and DataFrames. They allow developers to access the data and run various functions across four different languages: Java, Scala, Python and R.

    What is Spark RDD?

    Resilient Distributed Datasets (RDDs) are immutable collections of elements and the fundamental data structure in Apache Spark. They are logically partitioned across the nodes of your cluster, possibly thousands of them, and can be accessed and computed in parallel. The RDD has been the primary Spark API since Spark's inception.

    Which are the methods to create RDD in Spark?

    There are mainly two methods to create an RDD:
    Parallelizing - sc.parallelize()
    Referencing an external dataset - sc.textFile()
    Read: Spark context parallelize and reference external dataset example.

    When would you use Spark RDD?

    RDDs are used for unstructured data such as media streams or streams of text, when a schema and columnar format are not mandatory, for example when you don't need to access data by column name or other tabular attributes. Secondly, RDDs are used when you want full control over the physical distribution of data.

    What is SparkContext, SQLContext, SparkSession and SparkConf?

    SparkContext tells the Spark driver application whether to access the cluster through a resource manager or to run locally in standalone mode. The resource manager can be YARN or Spark's own cluster manager.
    SparkConf stores the configuration parameters that the Spark driver application passes to SparkContext. These parameters define properties of the driver application that Spark uses to allocate resources on the cluster, such as the number, memory size and cores of the executors running on the worker nodes.

    SQLContext is a class used to implement Spark SQL. You need to create SparkConf and SparkContext first in order to create a SQLContext. It is used for structured data, when you want to apply a schema and run SQL.

    All three (SparkContext, SparkConf and SQLContext) are encapsulated within SparkSession. In newer versions (Spark 2.0 and later) you can use Spark SQL directly through SparkSession.

    What is Spark checkpointing?

    Spark checkpointing is a process that saves the actual intermediate RDD data to a reliable distributed file system. It saves an intermediate stage of an RDD lineage. You can do it by calling RDD.checkpoint() while developing the Spark driver application. Before that, you need to set up the checkpoint directory where Spark stores these intermediate RDDs by calling SparkContext.setCheckpointDir().

    What is an action in Spark and what happens when it's executed?

    An action triggers execution of the RDD lineage graph: it loads the original data from disk to create intermediate RDDs, performs all transformations, and returns the final output to the Spark driver program or writes the data to the file system, depending on the type of action. The Spark documentation lists the available actions, including reduce, collect, count, first, take, takeSample, takeOrdered, saveAsTextFile, countByKey and foreach.

    What is Spark Streaming?

    Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards.
    In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams. Reference: Apache Spark documentation

    More questions:
    What is the difference between RDDs, DataFrames and Datasets?
    Why is a Spark RDD immutable?
    Are Spark DataFrames immutable?
    Are Spark DataFrames distributed?
    What is a Spark stage?
    How does Spark SQL work?
    What is a Spark executor and how does it work?
    How will you improve Apache Spark performance?
    What is the Spark SQL warehouse dir?
    What is the Spark shell? How would you open and close it?
    How will you clear the screen on the Spark shell?
    What is parallelize in Spark?
    Does Spark SQL require Hive?
    What is the Metastore in Hive?
    What do repartition and coalesce do in Spark?
    What is Spark mapPartitions?
    What is the difference between map and flatMap in Spark?
    What is Spark reduceByKey?
    What is lazy evaluation in Spark?
    What is an accumulator in Spark?
    Can an RDD be shared between SparkContexts?
    What happens if an RDD partition is lost due to worker node failure?
    Which serialization libraries are supported in Spark?
    What is a cluster manager in Spark?

    Questions? Feel free to write in the comments section below. Thank you.
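As a footnote to the answer on transformations and actions: the idea that nothing runs until an action is called can be tried out with plain Java streams, which are lazy in a similar way. This is only an analogy and not Spark code. Intermediate operations such as `filter` and `map` stand in for transformations, and the terminal operation `collect` stands in for an action. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Analogy only: Java streams, like Spark transformations, do no work until a terminal step. */
final class LazyPipelineDemo {

    /** Builds the pipeline; every function call is recorded in trace when it actually runs. */
    static Stream<Integer> evenTimesTen(List<Integer> input, List<String> trace) {
        return input.stream()
                .filter(n -> { trace.add("filter " + n); return n % 2 == 0; })
                .map(n -> { trace.add("map " + n); return n * 10; });
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Stream<Integer> pipeline = evenTimesTen(Arrays.asList(1, 2, 3, 4), trace);

        // The pipeline is only a description so far: no filter or map call has happened.
        System.out.println("calls before the terminal operation: " + trace.size());

        // The terminal operation plays the role of a Spark action and triggers the work.
        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println("result: " + result);
        System.out.println("calls after the terminal operation: " + trace.size());
    }
}
```

Spark goes further than this sketch: it distributes the work across executors and can recompute lost partitions from the lineage, which Java streams do not do.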

  • Apache Avro Schema Example (in Java)

    Introduction

    Avro provides data serialization based on schemas defined in JSON. It is a language-neutral data serialization system, meaning that data serialized by a program in one language can be deserialized and used by a program in another. Avro supports both dynamic and static types, as required. It supports many languages, including Java, C, C++, C#, Python and Ruby.

    Benefits

    Producers and consumers are decoupled from changes in each other's applications.
    Schemas help future-proof your data and make it more robust.
    It is supported and used across streaming use cases, especially with Kafka.
    Avro data is compact and fast for streaming.
    It works with a schema registry when used with Kafka.

    Steps to Serialize Object

    1. Create a JSON schema.
    2. Compile the schema in the application.
    3. Populate the schema with data.
    4. Serialize the data using the Avro serializer.

    Steps to Deserialize Object

    1. Use the Apache Avro API to read the serialized file.
    2. Populate the schema from the file.
    3. Use the object in the application.

    Sample Example for Avro (in Java)

    Step-1: Create a Java project and add the dependencies as below.

    Step-2: Create a schema file as below: Customer_v0.avsc

    {
      "namespace": "com.demo.avro",
      "type": "record",
      "name": "Customer",
      "fields": [
        { "name": "id", "type": "int" },
        { "name": "name", "type": "string" },
        { "name": "faxNumber", "type": [ "null", "string" ], "default": null }
      ]
    }

    Note that the default for the optional faxNumber field is the JSON value null, not the string "null": the default of a union must match the first type listed in the union.

    Step-3: Compile the schema.

    java -jar lib\avro-tools-1.8.1.jar compile schema schema\Customer_v0.avsc schema

    Step-4: Put the generated Java file into the source directory of the project, as shown in the project structure.
    Step-5: Create the Producer.java

    package com.demo.producer;

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.io.DatumWriter;
    import org.apache.avro.specific.SpecificDatumWriter;

    import com.demo.avro.Customer;

    public class Producer {

        public static void main(String[] args) throws IOException {
            serializeMessage();
        }

        public static void serializeMessage() throws IOException {
            DatumWriter<Customer> datumWriter = new SpecificDatumWriter<>(Customer.class);
            File file = new File("customer.avro");
            // try-with-resources closes the writer even if an append fails
            try (DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(datumWriter)) {
                Customer customer = new Customer();
                // Write the schema header, then append one record per customer
                dataFileWriter.create(customer.getSchema(), file);
                customer.setId(1001);
                customer.setName("Customer -1");
                customer.setFaxNumber("284747384343333".subSequence(0, 10));
                dataFileWriter.append(customer);

                customer = new Customer();
                customer.setId(1002);
                customer.setName("Customer -2");
                customer.setFaxNumber("45454747384343333".subSequence(0, 10));
                dataFileWriter.append(customer);
            }
        }
    }

    Step-6: Create the Consumer.java

    package com.demo.consumer;

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.io.DatumReader;
    import org.apache.avro.specific.SpecificDatumReader;

    import com.demo.avro.Customer;

    public class Consumer {

        public static void main(String[] args) throws IOException {
            deserializeMessage();
        }

        public static void deserializeMessage() throws IOException {
            File file = new File("customer.avro");
            DatumReader<Customer> datumReader = new SpecificDatumReader<>(Customer.class);
            // try-with-resources closes the reader when all records have been read
            try (DataFileReader<Customer> dataFileReader = new DataFileReader<>(file, datumReader)) {
                Customer customer = null;
                while (dataFileReader.hasNext()) {
                    // Passing the previous object lets Avro reuse it instead of allocating a new one
                    customer = dataFileReader.next(customer);
                    System.out.println(customer);
                }
            }
        }
    }

    Step-7: Run Producer.java. It creates the customer.avro file and writes the customers to it in Avro format.

    Step-8: Run Consumer.java. It reads the customer.avro file and prints the customer records.

    Thank you!
    If you have any questions, please mention them in the comments section below. [12/09/2019 10:38 PM CST - Reviewed by: PriSin]