WHAT IS HIVE?
Hive is one of the important components of the Hadoop ecosystem, with which you
can process and analyze data stored in HDFS files. Hive is also called the data
warehouse environment of the Hadoop framework.
The language used in Hive is HQL (Hive Query Language), which is similar to
the SQL of an RDBMS, but there are many differences between Hive and an RDBMS.
Hive is built for batch processing (bulk data processing) and is not designed for
row-level operations such as reading a single row at random (ex: select * from sales where
prid='909'), inserting a single row (ex: insert into sales values(......)), etc.
Classic HQL does not have DML statements to delete and update rows (later Hive
versions add them only for transactional tables), but by using indirect methods
we can update or delete the data of Hive tables. Hive runs on top of HDFS and MapReduce.
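The indirect update/delete idea can be sketched in a few lines of Python. This is only a toy model, not Hive code: the table is a plain list of rows, the column 'amount' is a hypothetical addition to the sales example, and the function names are made up for illustration. In HQL the same idea is commonly written with INSERT OVERWRITE, selecting only the rows you want to keep or the changed values.

```python
# Toy model: a Hive table is a list of rows, and a "delete" or "update"
# is done by writing a whole new copy of the table (bulk rewrite).

def delete_rows(table, matches):
    """Indirect delete: keep every row that does NOT match."""
    return [row for row in table if not matches(row)]


def update_rows(table, matches, change):
    """Indirect update: rewrite all rows, changing only the matching ones."""
    return [change(row) if matches(row) else row for row in table]


# 'amount' is a hypothetical column added for this example.
sales = [
    {"prid": "909", "amount": 100},
    {"prid": "910", "amount": 250},
]

# "Delete" product 909 by rewriting the table without it.
without_909 = delete_rows(sales, lambda row: row["prid"] == "909")

# "Update" product 909 by rewriting the table with a new amount.
repriced = update_rows(
    sales,
    lambda row: row["prid"] == "909",
    lambda row: {**row, "amount": 120},
)
```

In both cases the original rows are never touched in place; a full new copy of the data is produced, which is why these operations are bulk operations in Hive.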
Hive storage is HDFS: this means that when you create a table in Hive, a table
directory is created in HDFS. If you load a file into a Hive table, the file
is placed in the table's backend HDFS directory (copied when it is loaded from the
local file system, moved when it is already in HDFS).
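The storage model can be imitated on a local disk to make it concrete. The sketch below is an assumption-heavy stand-in: a temporary local folder plays the role of HDFS, the path user/hive/warehouse mirrors Hive's usual default warehouse location, and create_table, load_file and demo are hypothetical helper names, not Hive APIs.

```python
import shutil
import tempfile
from pathlib import Path


def create_table(warehouse: Path, table: str) -> Path:
    """Creating a table only creates a directory for it."""
    table_dir = warehouse / table
    table_dir.mkdir(parents=True, exist_ok=True)
    return table_dir


def load_file(source: Path, table_dir: Path) -> Path:
    """Loading a local file copies it into the table directory."""
    return Path(shutil.copy(source, table_dir))


def demo():
    """Create a 'sales' table, load one file, and list the table directory."""
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        warehouse = root / "user" / "hive" / "warehouse"  # stand-in for HDFS
        source = root / "sales.txt"
        source.write_text("909\t100\n910\t250\n")
        table_dir = create_table(warehouse, "sales")
        load_file(source, table_dir)
        return sorted(p.name for p in table_dir.iterdir())
```

Calling demo() returns the names of the files sitting in the table directory, which is all the "table" really is at the storage level.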
Hive's execution model is MapReduce: this means that when you submit an HQL statement, it is converted into one or more MapReduce jobs, and these jobs are submitted to the Hadoop cluster, where they run in JVMs. So Hadoop executes the HQL statement in MapReduce style.
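To see what "MapReduce style" means, here is a rough Python sketch of how a query such as select prid, sum(amount) from sales group by prid could be split into map, shuffle and reduce steps. It is a simplified illustration, not the plan Hive actually generates; the column 'amount' and the function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (key, value) pair per input row."""
    for row in rows:
        yield row["prid"], row["amount"]


def shuffle_phase(pairs):
    """Shuffle: bring all values with the same key together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate the values of each key (here: SUM)."""
    return {key: sum(values) for key, values in groups.items()}


# 'amount' is a hypothetical column added for this example.
sales = [
    {"prid": "909", "amount": 100},
    {"prid": "910", "amount": 50},
    {"prid": "909", "amount": 200},
]

totals = reduce_phase(shuffle_phase(map_phase(sales)))
```

On a real cluster the map and reduce steps run in parallel on many machines, and the shuffle moves data between them over the network.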
So a developer or analyst can easily process or analyze data using HQL
statements without writing complex Java programs. Hive is especially
good for ad hoc reporting and analytics. But SQL or HQL is not the solution for every
analytics situation, because your analytics may require custom
functionality that is not available in Hive's built-in
functions. Such custom functionality can be developed as Hive
UDFs (user-defined functions).
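Custom logic for Hive can be developed in the following languages (native UDFs are
written in Java; the other languages are usually plugged in as scripts through the
TRANSFORM clause):
--> Java
--> Python
--> C++
--> Ruby
--> R (statistical programming)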
These UDFs have to be registered in Hive, and can then be called any number of times.
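As an illustration of custom logic written in Python, here is a minimal sketch of a script that Hive could call through its TRANSFORM clause. Hive sends each row to the script as a tab-separated line on standard input and reads tab-separated lines back from standard output. The banding rule, the threshold of 1000, the column 'amount' and the function names are all assumptions made for this example.

```python
import sys


def band_for(amount):
    """Hypothetical business rule: label an amount as HIGH or LOW."""
    return "HIGH" if amount >= 1000 else "LOW"


def transform_line(line):
    """Turn one tab-separated input row (prid, amount) into (prid, band)."""
    prid, amount = line.rstrip("\n").split("\t")
    return prid + "\t" + band_for(float(amount))


def main():
    """Read rows from stdin and write transformed rows to stdout."""
    for line in sys.stdin:
        print(transform_line(line))
```

If this were saved as, say, band.py with a call to main() at the bottom, it could be made available with ADD FILE and used in a query of the form select transform(prid, amount) using 'python band.py' as (prid, band) from sales. The file name and column names here are hypothetical.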
Author:
Bharat Ram.