PROVE IT !!If you know it then PROVE IT !! Skill Proficiency Test

Hive vs Pig

Pig and Hive are the two key components of the Hadoop ecosystem. But why pig and why hive is always a question which strikes back many hadoop newbies,

Pig hadoop and Hive hadoop have a similar goal- they are tools that ease the complexity of writing complex java MapReduce programs. However, when to use Pig Latin and when to use HiveQL is the question most of the have developers have. Apache HIVE and Apache PIG components of the Hadoop ecosystem are briefed. If we take a look at diagrammatic representation of the Hadoop ecosystem, HIVE and PIG components cover the same verticals and this certainly raises the question, which one is better? It’s Pig vs Hive (Yahoo vs Facebook).




Procedural Data Flow Language Declarative SQLish Language
For Programming For creating reports
Mainly used by Researchers and Programmers Mainly used by Data Analysts
Operates on the client side of a cluster. Operates on the server side of a cluster.
Does not have a dedicated metadata database. Makes use of exact variation of dedicated SQL DDL language by defining tables beforehand.
Pig is SQL like but varies to a great extent. Directly leverages SQL and is easy to learn for database experts.
Pig supports Avro file format. Hive does not support it.
Apache Pig is usually used for semi structured data. Used for Structured Data
Schema is optional. Hive requires a well-defined Schema.
Coding style Verbose Coding style More like SQL


Add a Comment

Your email address will not be published. Required fields are marked *