PROVE IT !!If you know it then PROVE IT !! Skill Proficiency Test

Latest

Impala CookBook

impala

Impala is an open source massively parallel processing query engine on top of clustered systems like Apache Hadoop. It was created based on Google’s Dremel paper.  It is an interactive SQL like query engine that runs on top of Hadoop Distributed File System (HDFS). Impala uses HDFS as its underlying storage. 

How to Make an Installation File

If you have an .exe file (or really any file) you've made or not, and you want to make an installation to it. The process is easy and quick, just the tutorial is very detailed. This works on Windows Only. 1) Go to your "Run" menu and write iexpress.exe. The run menu can be found in the ...

Installing Hadoop on Windows

Apache Hadoop is an open-source framework which is used for distributed processing ,performing computations of large data sets on clusters by distributing computations to each of the node.This framework mainly comes with a hadoop kernel , ability to run distributed MapReduce jobs and a filesystem-HDFS. There are many tutorials which help you install hadoop on windows but most ...

Building a Single Node Hadoop cluster on UBUNTU

Big Data is a big world and so is its use , Considering its complexity and educate everyone on its simplicity this blog post gives a step by step process on installing & Building a Single node hadoop cluster which can be used for Learning and Operational purpose. The post guides a idle professional with ...

Important Hive Command & HQL examples

What is HQL? Hive defines a simple SQL-like query language to querying and managing large datasets called Hive-QL ( HQL ). It’s easy to use if you’re familiar with SQL Language. Hive allows programmers who are familiar with the language to write the custom MapReduce framework to perform more sophisticated analysis. Uses of Hive: 1. The Apache Hive distributed storage. 2. Hive ...

10 Best Practices for Apache Hive

Apache Hive is an SQL-like software used with Hadoop to give users the capability of performing SQL-like queries on it’s own language, HiveQL, quickly and efficiently . It also gives users additional query and analytical abilities not available on traditional SQL structures. With Apache Hive, users can use HiveQL or traditional Mapreduce systems, depending on individual ...

Hadoop SQL Engines

Pre-requisites for Hadoop to make inroads into the Enterprise Data Warehouse space is to have the following three items in place: Subsecond response times for SQL queries (often refered to as interactive or real time queries). Performance similar to existing MPP RDBMS such as Teradata. Support for a rich SQL feature set Support for Update ...