Bigalitic: HIVE

WHAT IS HIVE?

Hive is one of the important components of the Hadoop ecosystem, with which you can process and analyze the data of HDFS files. Hive is also called the data warehouse environment of the Hadoop framework.

The language used in Hive is HQL (Hive Query Language), which is similar to the SQL of an RDBMS, but there are many differences between Hive and an RDBMS.

Hive supports only batch processing (bulk data processing) and is not designed for row-level operations such as reading a single row at random (ex: select * from sales where prid='909', which Hive answers by scanning the data rather than by a quick lookup) or inserting a single row (ex: insert into sales values(......) ). Classic HQL does not have DML statements to delete and update rows (Hive 0.14 and later add them only for transactional tables), but by using indirect methods we can update or delete the data of Hive tables. Hive runs on top of HDFS and MapReduce.
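The indirect method usually means rewriting the whole table with the unwanted rows filtered out or changed, in the spirit of an insert overwrite ... select statement. The Python sketch below only imitates that idea on an in-memory list of rows. The sample rows, the amount column and both helper functions are hypothetical and are not part of Hive.

```python
# Sketch only: imitates "rewrite the whole table" on plain Python data.
# The rows, the column names and the helper names are hypothetical, not Hive APIs.

def overwrite_without(rows, predicate):
    """Indirect delete: build a new table that keeps only rows NOT matching predicate."""
    return [row for row in rows if not predicate(row)]


def overwrite_with_change(rows, predicate, change):
    """Indirect update: build a new table where matching rows are replaced by change(row)."""
    return [change(row) if predicate(row) else row for row in rows]


sales = [
    {"prid": "909", "amount": 100},
    {"prid": "910", "amount": 250},
    {"prid": "909", "amount": 75},
]

# "Delete" product 910 by rewriting the table without it.
after_delete = overwrite_without(sales, lambda r: r["prid"] == "910")

# "Update" product 909 by rewriting every row and changing only the matches.
after_update = overwrite_with_change(
    sales,
    lambda r: r["prid"] == "909",
    lambda r: {**r, "amount": r["amount"] * 2},
)
```

In both cases the original rows are left untouched and a complete new copy is produced, which is why this approach suits bulk processing rather than frequent single-row changes.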

Hive storage is HDFS: this means that when you create a table in Hive, one table directory is created in HDFS. If you load a file into a Hive table, the file is copied (or moved, if it is already in HDFS) into the table's backend HDFS directory.
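As a rough illustration of the table-to-directory mapping, the Python sketch below computes where a managed table and a loaded file would typically end up. It assumes the common default warehouse location /user/hive/warehouse (the hive.metastore.warehouse.dir setting); the function names, the retail database and the file name are made up for the example, and a real cluster may be configured differently.

```python
import posixpath

# Assumption: the common default value of hive.metastore.warehouse.dir.
WAREHOUSE_DIR = "/user/hive/warehouse"


def table_directory(table, database="default"):
    """Return the HDFS directory a managed table would typically use."""
    if database == "default":
        return posixpath.join(WAREHOUSE_DIR, table)
    # Tables of other databases live under a "<database>.db" directory.
    return posixpath.join(WAREHOUSE_DIR, database + ".db", table)


def loaded_file_path(table, filename, database="default"):
    """Return where a file loaded into the table would be placed."""
    return posixpath.join(table_directory(table, database), filename)


print(table_directory("sales"))
print(loaded_file_path("sales", "sales.txt"))
print(table_directory("sales", database="retail"))
```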

Hive's execution model is MapReduce: this means that when you submit an HQL statement, it is converted into MapReduce code, and the converted code is submitted to the Hadoop cluster, where it runs in JVMs. So Hadoop can execute the HQL statement in MapReduce style.
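To picture what "MapReduce style" means, the Python sketch below imitates how an aggregate query such as select prid, sum(amount) from sales group by prid could be split into a map phase, a shuffle and a reduce phase. It is an illustration only and not the code Hive actually generates; the amount column, the sample rows and all function names are assumptions.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (key, value) pair per input row, keyed by the GROUP BY column."""
    for prid, amount in rows:
        yield prid, amount


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: apply the aggregate (here SUM) to each group of values."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical sales rows: (prid, amount)
sales = [("909", 100), ("910", 250), ("909", 75)]

totals = reduce_phase(shuffle(map_phase(sales)))
print(totals)
```

On a real cluster the map and reduce steps run in parallel on many machines, each over its own part of the data.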

So a developer or analyst can easily process or analyze data using HQL statements without writing complex Java programs. Hive is especially good for ad hoc reporting and analytics. But SQL or HQL is not the solution for every analytics situation, because your analytics may require custom functionality that is not available in Hive's built-in functions. Such custom functionality can be developed as Hive UDFs (user-defined functions).

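Hive UDFs can be developed in the following languages (natively in Java; code in the other languages is usually plugged in as a script through Hive's TRANSFORM clause):

--> Java

--> Python

--> C++

--> Ruby

--> R (statistical programming)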

These UDFs have to be registered in Hive, and can then be called any number of times.
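As a minimal sketch of custom logic written in Python, the script below follows the streaming convention Hive uses for scripts: each row arrives as one line of tab-separated columns and each output line becomes one result row. The two-column layout (prid, name), the upper-casing logic and the function names are assumptions chosen only for illustration.

```python
import io


def transform_line(line):
    """Upper-case the name in one tab-separated 'prid<TAB>name' row."""
    prid, name = line.rstrip("\n").split("\t")
    return prid + "\t" + name.upper()


def run(stream_in, stream_out):
    """Read rows from stream_in and write the transformed rows to stream_out."""
    for line in stream_in:
        stream_out.write(transform_line(line) + "\n")


# Demonstration: an in-memory stream stands in for the rows Hive would pipe in.
sample = io.StringIO("909\tsoap\n910\tbrush\n")
result = io.StringIO()
run(sample, result)
print(result.getvalue(), end="")

# In a real script, import sys and call run(sys.stdin, sys.stdout) instead.
```

Such a script is typically made available with add file upper_name.py; and then called with select transform(prid, name) using 'python upper_name.py' as (prid, name) from products; where the file name and the products table are hypothetical.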

Author:

Bharat Ram.

halitics.blogspot.in
