Wednesday, April 18, 2012

Steps to set up Apache Pig in a Hadoop cluster environment

  1. wget http://www.linuxtourist.com/apache/pig/stable/pig-x.y.z.tar.gz
  2. cd /home/hadoopuser/app/
  3. mv ~/Download/pig-x.y.z.tar.gz ./
  4. tar -xvf pig-x.y.z.tar.gz
  5. ln -s /home/hadoopuser/app/pig-x.y.z /pig
  6. Edit ~/.bash_profile to add PIG_HOME and append its bin directory to PATH.
    #java home
    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_17-sun
    
    #hadoop home
    export HADOOP_HOME=/hadoop
    
    #hive home
    export HIVE_HOME=/hive
    
    #pig home
    export PIG_HOME=/pig
    
    PATH=$PATH:$HOME/bin:$HIVE_HOME/bin:$PIG_HOME/bin
    
    export PATH
    
    
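    The PATH change above only takes effect in a new login shell (or after running `source ~/.bash_profile`). The snippet below is a self-contained sketch of the same check; the `/pig` path assumes the symlink created in step 5:

    ```shell
    # Self-contained sketch: reproduce the PATH change and confirm pig's bin dir is on it.
    export PIG_HOME=/pig            # matches the symlink created in step 5
    PATH=$PATH:$PIG_HOME/bin
    case ":$PATH:" in
      *":$PIG_HOME/bin:"*) echo "pig bin is on PATH" ;;
      *)                   echo "pig bin missing from PATH" ;;
    esac
    ```

    On the real cluster, `echo $PIG_HOME` after re-logging in gives the same confirmation.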
  7. Run the pig command and check that it works.
    2013-01-04 03:14:30,310 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://cent63VM01:9000
    2013-01-04 03:14:30,586 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: cent63VM01:9001
    grunt> 
    
  8. Create a directory on HDFS
    [hadoopuser@cent63VM01 pigTest]$ /hadoop/bin/hadoop fs -mkdir pig
    
  9. Upload a file to HDFS
    [hadoopuser@cent63VM01 pigTest]$ /hadoop/bin/hadoop fs -put /home/hadoopuser/pigTest/passwd /user/hadoopuser/pig
    
  10. Run an extremely simple Pig example.
    grunt> A = load '/user/hadoopuser/pig/passwd' using PigStorage(':');
    grunt> B = foreach A generate $0 as id;
    grunt> dump B;
    
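    For reference, the three statements above load the passwd file split on ':', keep only the first field ($0, aliased as id), and print the result. What the script computes can be sketched locally with cut; the /tmp/passwd_sample path is just a stand-in for the HDFS file:

    ```shell
    # Sketch of what the Pig script computes: the first colon-delimited field of each line.
    printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/bin/sh\n' > /tmp/passwd_sample
    cut -d: -f1 /tmp/passwd_sample
    # prints:
    # root
    # daemon
    ```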
