Latest

Confluent Hub: A Central Repo For Kafka Connect

Tim Berglund announces Confluent Hub: Connect has been an integral part of Apache Kafka since version 0.9, released late 2015. It has proved to be an effective framework for streaming data in and out of Kafka from nearby systems like relational databases, Amazon S3, HDFS clusters, and even nonstandard legacy systems that typically show themselves in ...

Tuning Spark Jobs Running On YARN

Anushree Subramaniam gives us a primer on Apache YARN, the resource manager which drives Hadoop: In Hadoop version 1.0, which is also referred to as MRV1 (MapReduce Version 1), MapReduce performed both processing and resource management functions. It consisted of a Job Tracker, which was the single master. The Job Tracker allocated the resources, performed scheduling ...

YARN Fundamentals

Anushree Subramaniam gives us a primer on Apache YARN, the resource manager which drives Hadoop: In Hadoop version 1.0, which is also referred to as MRV1 (MapReduce Version 1), MapReduce performed both processing and resource management functions. It consisted of a Job Tracker, which was the single master. The Job Tracker allocated the resources, performed scheduling and ...

What happens if an HDFS block is deleted directly from a DataNode?

Lately I wondered what happens if I log in to one of the DataNodes and delete an HDFS block directly from the filesystem rather than through the hdfs interface. With a replication factor of 3, two other copies of this block are still available, so Hadoop can: keep serving this block to requestors; recover the missing ...

Honored to Receive the SIGMOD Systems Award for Apache Hive

Qubole co-founders Ashish Thusoo and Joydeep Sen Sharma were recently awarded the SIGMOD Software Systems Award for developing a seminal software system, Apache Hive, that brought relational-style declarative programming to the Hadoop ecosystem. A decade back, while at Facebook, we conceived the idea of Apache Hive (Hive), an SQL-like interface for querying data that sits atop Hadoop. Turning ...

YARN FairScheduler Preemption Deep Dive

The multi-part blog post Untangling Apache Hadoop YARN provided an overview of how the YARN scheduler works. In this post we discuss technical details around how FairScheduler Preemption works and best practices to consider when configuring it. We also present a recent overhaul of FairScheduler Preemption in CDH 5.11 which attempts to address a number of ...