
Latest

Accelerating Big Data Analytics in the Cloud – Now!

Hortonworks is a true hybrid data company. For almost a decade, the company has been helping customers build hybrid data warehouses by leveraging the cost-efficiency, storage scalability, and extensive compute capacity of Hadoop to optimize their enterprise data warehouses (EDWs). With Hortonworks Data Platform (HDP) for data-at-rest, Hortonworks DataFlow (HDF) for data-in-motion, and Hortonworks DataPlane ...

This Big Data Business Strategy Is Your Formula for Success

Your company likely has a data strategy, a cloud strategy, and a general business strategy. You might view these as separate, but in the most successful businesses, all three are aligned. By embracing an integrated and comprehensive approach—a cloud and big data business strategy—you can set your company up for success. Data Strategy: Data is the lifeblood ...

Confluent Hub: A Central Repo For Kafka Connect

Tim Berglund announces Confluent Hub: Connect has been an integral part of Apache Kafka since version 0.9, released in late 2015. It has proved to be an effective framework for streaming data in and out of Kafka from nearby systems like relational databases, Amazon S3, HDFS clusters, and even nonstandard legacy systems that typically show themselves in ...
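
As a minimal sketch of how a connector might be registered with a running Connect worker, here is a Python snippet that posts a FileStreamSource configuration to the Connect REST API. It assumes a worker listening on localhost:8083 and the FileStreamSource connector that ships with Kafka; the file path and topic name are illustrative placeholders, not values from the article.

```python
import requests  # assumes the requests library is installed

# Hypothetical example: register a FileStreamSource connector via the
# Kafka Connect REST API (default port 8083). The file path and topic
# name below are placeholders.
connector = {
    "name": "demo-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo-input.txt",
        "topic": "demo-topic",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())  # the worker echoes back the connector name, config, and task assignments
```

Connectors downloaded from Confluent Hub are registered the same way once their jars are on the worker's plugin path.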

Tuning Spark Jobs Running On YARN

Anushree Subramaniam gives us a primer on Apache YARN, the resource manager that drives Hadoop: In Hadoop version 1.0, also referred to as MRv1 (MapReduce version 1), MapReduce performed both processing and resource management functions. It consisted of a Job Tracker, which was the single master. The Job Tracker allocated the resources, performed scheduling ...
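
As a minimal sketch of where these knobs surface when tuning Spark on YARN, here is a PySpark session that hands resource management to YARN and sets executor counts and sizes. It assumes PySpark is installed and HADOOP_CONF_DIR points at a YARN cluster; the specific values are illustrative placeholders, not tuning recommendations from the article.

```python
from pyspark.sql import SparkSession

# Hypothetical settings: executor counts and sizes are placeholders,
# not advice for any particular cluster.
spark = (
    SparkSession.builder
    .appName("spark-on-yarn-tuning-sketch")
    .master("yarn")                                   # let YARN allocate the containers
    .config("spark.executor.instances", "4")          # number of executor containers to request
    .config("spark.executor.cores", "2")              # cores per executor container
    .config("spark.executor.memory", "4g")            # JVM heap per executor
    .config("spark.executor.memoryOverhead", "512m")  # off-heap headroom YARN adds to each container
    .getOrCreate()
)

spark.range(1_000_000).selectExpr("sum(id)").show()
spark.stop()
```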

What happens if an HDFS block is deleted directly from a DataNode?

Lately I wondered what happens if I log in to one of the data nodes and delete an HDFS block directly from the filesystem, not via the HDFS interface. If we have a replication factor of 3, then 2 other copies of this block are still available, so Hadoop can keep serving this block to requestors and recover the missing ...
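
A minimal sketch of how one might watch the NameNode notice and repair a missing replica, assuming the hdfs CLI is on the PATH; the file path below is hypothetical:

```python
import subprocess

# Hypothetical path; replace with a real file on your cluster.
HDFS_PATH = "/data/example.csv"

def block_report(path: str) -> str:
    """Run `hdfs fsck` and return the block/replica report for the given file."""
    result = subprocess.run(
        ["hdfs", "fsck", path, "-files", "-blocks", "-locations"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Run this before and after deleting a block file on one DataNode: the report
# lists which DataNodes hold each replica and flags under-replicated blocks
# until re-replication restores the target replication factor.
print(block_report(HDFS_PATH))
```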

Honored to Receive the SIGMOD Systems Award for Apache Hive

Qubole co-founders Ashish Thusoo and Joydeep Sen Sarma were recently awarded the SIGMOD Systems Award for developing a seminal software system—Apache Hive—that brought relational-style declarative programming to the Hadoop ecosystem. A decade back, while at Facebook, we conceived the idea of Apache Hive (Hive), an SQL-like interface for querying data that sits atop Hadoop. Turning ...