Bigalitic
Wednesday, July 17, 2013
Tuesday, July 16, 2013
What Is Hive?
Hive is one of the important components of the Hadoop ecosystem. With it, you can process and analyze data stored in HDFS files.
Hive is also called the data warehouse environment of the Hadoop framework. The language used in Hive is HQL (Hive Query Language), which is similar to the SQL of an RDBMS, but there are many differences between Hive and an RDBMS.
Hive supports only batch processing (bulk data processing) and is not designed for row-level operations, such as reading a single row at random (e.g. select * from sales where prid='909', which Hive answers by scanning the table's data rather than by jumping straight to the row) or inserting a single row (e.g. insert into sales values(......)). HQL does not have DML statements to delete or update rows, but by using indirect methods, such as rewriting a table with INSERT OVERWRITE, we can update or delete the data of Hive tables. Hive runs on top of HDFS and MapReduce.
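Because there is no DELETE or UPDATE, the usual indirect method is to rewrite the whole table, for example with INSERT OVERWRITE TABLE sales SELECT * FROM sales WHERE prid <> '909'. The sketch below models that idea in plain Python, assuming a table is just a list of rows standing in for the files in its HDFS directory: a "delete" or "update" produces a complete new copy of the table instead of changing one row in place. The `sales` columns (`prid`, `amount`) and the function names are hypothetical.

```python
from typing import Callable, Dict, List

# A "table" is modelled as a list of rows; in Hive the rows live in files
# under the table's HDFS directory. Column names here are hypothetical.
Row = Dict[str, object]


def overwrite_delete(table: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Indirect delete: build a new full copy of the table that omits
    the rows matching `predicate`.

    Roughly what this HQL pattern does:
      INSERT OVERWRITE TABLE sales SELECT * FROM sales WHERE prid <> '909';
    """
    return [row for row in table if not predicate(row)]


def overwrite_update(table: List[Row], predicate: Callable[[Row], bool],
                     change: Callable[[Row], Row]) -> List[Row]:
    """Indirect update: rewrite every row, applying `change` only to the
    matching rows (in HQL this is typically a CASE expression in the SELECT)."""
    return [change(row) if predicate(row) else row for row in table]


sales = [
    {"prid": "101", "amount": 250},
    {"prid": "909", "amount": 100},
    {"prid": "305", "amount": 75},
]

# "Delete" product 909 by producing a new version of the whole table.
after_delete = overwrite_delete(sales, lambda r: r["prid"] == "909")
print(after_delete)

# "Update" the amount of product 305 the same way; the original list is untouched.
after_update = overwrite_update(sales, lambda r: r["prid"] == "305",
                                lambda r: {**r, "amount": 80})
print(after_update)
```

The cost of this approach is that even a one-row change rewrites the entire table (or partition), which is acceptable for Hive's batch workloads but not for transactional ones.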
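Hive storage is HDFS: this means that when you create a table in Hive, a directory for that table is created in HDFS. If you load a file into a Hive table, the file is placed into the table's backing HDFS directory: copied when it is loaded from the local file system (LOAD DATA LOCAL INPATH), or moved when it already lives in HDFS (LOAD DATA INPATH).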
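A small simulation of that storage model, assuming a local temporary directory stands in for HDFS and the table is a managed table. `/user/hive/warehouse` is the default value of `hive.metastore.warehouse.dir` and a cluster may configure a different path; the helper names `table_dir`, `create_table` and `load_data` are hypothetical.

```python
import os
import shutil
import tempfile

# Default value of hive.metastore.warehouse.dir; a real cluster may override it.
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def table_dir(table: str, db: str = "default", warehouse: str = DEFAULT_WAREHOUSE) -> str:
    """HDFS directory used for a managed table under the default layout."""
    table = table.lower()  # Hive identifiers are case-insensitive
    if db.lower() == "default":
        return f"{warehouse}/{table}"
    return f"{warehouse}/{db.lower()}.db/{table}"


# --- Local simulation: a temp directory stands in for HDFS. ---
def create_table(root: str, table: str) -> str:
    """CREATE TABLE: an (empty) directory is created for the table."""
    path = os.path.join(root, table.lower())
    os.makedirs(path, exist_ok=True)
    return path


def load_data(src: str, table_path: str, local: bool) -> str:
    """LOAD DATA [LOCAL] INPATH: copy a local file, or move an HDFS file."""
    dest = os.path.join(table_path, os.path.basename(src))
    if local:
        shutil.copy(src, dest)  # LOAD DATA LOCAL INPATH -> copy
    else:
        shutil.move(src, dest)  # LOAD DATA INPATH -> move within HDFS
    return dest


print(table_dir("sales"))               # /user/hive/warehouse/sales
print(table_dir("Sales", db="retail"))  # /user/hive/warehouse/retail.db/sales

with tempfile.TemporaryDirectory() as root:
    tpath = create_table(os.path.join(root, "warehouse"), "sales")

    local_file = os.path.join(root, "local_sales.txt")
    hdfs_file = os.path.join(root, "staged_sales.txt")
    for p in (local_file, hdfs_file):
        with open(p, "w") as f:
            f.write("909\t100\n")

    load_data(local_file, tpath, local=True)
    load_data(hdfs_file, tpath, local=False)

    # The local source survives the load; the HDFS source has been moved away.
    source_state = (os.path.exists(local_file), os.path.exists(hdfs_file))
    table_files = sorted(os.listdir(tpath))

print(source_state)  # (True, False)
print(table_files)   # ['local_sales.txt', 'staged_sales.txt']
```

Because the table is only a directory, anything placed in it (by Hive or by a plain HDFS copy) becomes part of the table's data the next time it is queried.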
Hive execution model is MapReduce: this means that when you submit an HQL statement, Hive compiles it into one or more MapReduce jobs and submits them to the Hadoop cluster, so Hadoop executes the HQL statement in MapReduce style.
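To make "MapReduce style" concrete, here is a toy, single-process sketch of the three phases a simple aggregation such as SELECT prid, SUM(amount) FROM sales GROUP BY prid conceptually goes through. It is only an illustration: Hive builds its own execution plan (and may, for instance, pre-aggregate on the map side), and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tab-separated rows of a `sales` table: prid \t amount
lines = ["101\t250", "909\t100", "101\t50", "305\t75", "909\t20"]


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: parse each row and emit (group-by key, value)."""
    for line in records:
        prid, amount = line.split("\t")
        yield prid, int(amount)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: bring all values for the same key together, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: apply the aggregate (SUM) to each key's values."""
    return {key: sum(values) for key, values in groups.items()}


# SELECT prid, SUM(amount) FROM sales GROUP BY prid;
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'101': 300, '305': 75, '909': 120}
```

On a real cluster, the map and reduce functions run in parallel across many nodes and the shuffle moves data over the network, which is why even a small Hive query has noticeable start-up latency.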
So developers and analysts can easily process or analyze data with HQL statements, without writing complex Java programs. Hive is especially good for ad hoc reporting and analytics. But SQL or HQL is not the solution for every analytics situation, because your analysis may require custom functionality that Hive's built-in functions do not provide. Such custom functionality can be developed as Hive UDFs (user-defined functions).
Hive UDFs can be developed in the following languages (Java natively; the others through Hive's streaming TRANSFORM interface):
--> Java
--> Python
--> C++
--> Ruby
--> R (statistical programming)
These UDFs must be registered in Hive, after which they can be called any number of times.
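As an illustration of custom logic in one of the non-Java languages, below is a minimal sketch of a Python script of the kind used with Hive's TRANSFORM clause. Native Hive UDFs are Java classes registered with ADD JAR and CREATE TEMPORARY FUNCTION; a streaming script is instead shipped with ADD FILE, receives rows on stdin as tab-separated text and writes tab-separated rows to stdout. The banding rule, its thresholds and the column names are hypothetical, and `\N` is assumed to be the NULL marker of Hive's default text format.

```python
import io
import sys
from typing import Iterable, Iterator, TextIO

# Hypothetical business rule: thresholds for "amount bands".
LOW_LIMIT = 100
HIGH_LIMIT = 1000
NULL = r"\N"  # NULL marker in Hive's default streaming text format (assumed)


def band(amount: str) -> str:
    """Custom logic: classify an amount (received as text) into a band."""
    if amount == NULL:
        return NULL
    value = float(amount)
    if value < LOW_LIMIT:
        return "low"
    if value < HIGH_LIMIT:
        return "medium"
    return "high"


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Read tab-separated rows (prid, amount) and emit (prid, band)."""
    for line in lines:
        prid, amount = line.rstrip("\n").split("\t")
        yield f"{prid}\t{band(amount)}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Entry point for Hive: rows arrive on stdin, results go to stdout."""
    for out_line in transform(stdin):
        stdout.write(out_line + "\n")


# Local dry run with an in-memory "stdin" instead of Hive.
# (In the deployed script file you would simply call main().)
fake_in = io.StringIO("101\t250\n909\t50\n305\t1200\n777\t\\N\n")
fake_out = io.StringIO()
main(fake_in, fake_out)
print(fake_out.getvalue(), end="")
```

On the Hive side, such a script (file name hypothetical) would be added with ADD FILE band_amount.py; and then called with something like SELECT TRANSFORM (prid, amount) USING 'python band_amount.py' AS (prid, amount_band) FROM sales; once added, it can be reused in any number of queries.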
Author: Bharat Ram.
Thursday, July 11, 2013