This is the second installment of our Big Data Interview Questions and Answers webinar series. It's always fun to host one of these webinars, and this one was especially fun because the questions came from the Hadoop In Real World community. These were real interview questions asked in real interviews.
Spark stole the webinar in 2018
We hosted this webinar in Nov 2018. Our first webinar on interview questions was in Nov 2017. Back in 2017, our community sent us a lot of Hadoop-related questions to answer. In 2018, the focus shifted to Spark.
We host webinars like these quite often. Sign up to get invitations to our upcoming webinars.
List of Big Data interview questions that we answered in the webinar
How do you handle scenarios when Spark runs out of memory? (12:40)
How does Spark perform operations and generate results when the dataset doesn't fit in memory? (12:40)
What do you do when one of your Spark jobs fails with OOM error? (12:40)
How do you handle slow running jobs in Spark? (28:40)
What do you do when one task in your Spark job takes a long time while the others complete on time? (28:40)
Tell us some of the Spark optimization techniques you used in your current project. (28:40)
How do you handle Spark Streaming failures? (40:30)
What happens to Spark Streaming when there is a network failure during processing? (40:30)
How do you recover from Spark Streaming failures? (40:33)
What is the difference between DataFrame and Dataset? (47:10)
When do you use DataFrame and when do you use Dataset? (47:10)
How do you properly remove DataNodes from your cluster? (52:30)