iDatamining.org

I am looking for projects to work on
Please contact with me at yiyu.jia@BostonInfoPro.com!

Tuesday, April 3, 2012

load flat file into Hive table step by step.

This is a very simple example to show how load a local flat file into hive table.
  1. prepare a local table file named sample.csv having content as below.
    35, Amy 
    17, Ben 
    5,Chris  
    10,Don 
    
  2. create table. -- creating a managed table that "knows" how the flat file are formated.
    hive> use testdb;
    hive> create table test_hive(age int, name string) row format delimited fields terminated by ',';
    

    Note: explicitly specifying the delimited character like "row format delimited fields terminated by ','" is the key part here as the default delimited character identified by hive is ^A (ASCII code 1).
  3. Loading file. -- load file from local file system into Hive.
    hive>use testDB;
    hive> load data local inpath '/home/hadoopuser/hiveTest/sample.csv' overwrite into table test_hive;
    

    Note: this example is about loading file from linux local file system into Hive that is on HDFS file system. To load file from HDFS file system into Hive, we probably create an external table for it.
  4. Check data. -- select from table.
    hive> select * from test_hive;                                                                       
    OK
    35  Amy 
    17  Ben 
    5 Chris  
    10 Don 
    

    Note: You can see that "Amy" and "Ben" are not aligned with rest two rows. It is because hive takes all white spaces at the front of string and behind string as part of column values. So, we need to trim white spaces before and behind strings if white spaces are not expected in hive table.
    To determine it. Run below queries to see different results.
    select * from test_hive where name='Amy'; 
    select * from test_hive where name=' Amy'; 
    select * from test_hive where name=' Amy ';
    


The above example is an extremely simple one. To check all possible options when a table is created, please check Hive DDL document from Apache and Cloudera.

No comments:

Post a Comment