Sunday, November 18, 2012

whoami: why Hadoop always sends the Linux account name as the Hadoop user name

When we use the Hadoop Eclipse plugin, we find that it always sends the local Linux account name to the Hadoop cluster as the Hadoop account. Who am I? The trick is that Hadoop uses the Linux command whoami to get the account info. The relevant code is excerpted here:
  /** a Unix command to get the current user's name */
  public final static String USER_NAME_COMMAND = "whoami";
  /** a Unix command to get the current user's groups list */
  public static String[] getGroupsCommand() {
    return new String[]{"bash", "-c", "groups"};
  }
  /** a Unix command to get a given user's groups list */
  public static String[] getGroupsForUserCommand(final String user) {
    //'groups username' command return is non-consistent across different unixes
    return new String [] {"bash", "-c", "id -Gn " + user};
  }
  /** a Unix command to get a given netgroup's user list */
  public static String[] getUsersForNetgroupCommand(final String netgroup) {
    //'groups username' command return is non-consistent across different unixes
    return new String [] {"bash", "-c", "getent netgroup " + netgroup};
  }
  /** a Unix command to set permission */
  public static final String SET_PERMISSION_COMMAND = "chmod";
  /** a Unix command to set owner */
  public static final String SET_OWNER_COMMAND = "chown";
  public static final String SET_GROUP_COMMAND = "chgrp";
  /** Return a Unix command to get permission information. */
  public static String[] getGET_PERMISSION_COMMAND() {
    //force /bin/ls, except on windows.
    return new String[] {(WINDOWS ? "ls" : "/bin/ls"), "-ld"};
  }

So, here is a simple way to work around this.
  1. Create a shell script file with content as simple as below,
    echo "hadoopuser"
  2. Edit .bash_profile,
     vi ~/.bash_profile
  3. Add a line as below,
     alias whoami="~/local/bin/"
  4. Apply the modification in the current shell console,
     source ~/.bash_profile
  5. Run the command to check that whoami has been overridden.
    [yiyujia@localhost bin]$ whoami
    [yiyujia@localhost bin]$
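
To double check the result from the Java side, here is a minimal sketch that runs whoami as a child process and prints it next to the JVM's own user.name property. The class WhoAmICheck and the method firstLineOf are hypothetical names made up for this post, not part of Hadoop. One thing worth keeping in mind: a shell alias is expanded by the interactive shell itself, so a program that starts whoami directly as a child process looks the command up on PATH and may not see the alias.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Hadoop): runs a command and returns the first line it prints. */
final class WhoAmICheck {

    static String firstLineOf(String... command) {
        try {
            // Start the command directly, without going through an interactive shell.
            Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
            String line;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                line = reader.readLine();
                // Drain any remaining output so the child process can exit cleanly.
                while (reader.readLine() != null) {
                    // ignore
                }
            }
            process.waitFor();
            return line == null ? "" : line.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the command", e);
        }
    }

    public static void main(String[] args) {
        // What a child process started from Java reports.
        System.out.println("whoami (child process): " + firstLineOf("whoami"));
        // What the JVM itself believes the user name is.
        System.out.println("user.name (JVM):        " + System.getProperty("user.name"));
    }
}
```

If the two lines differ from what the shell prints for whoami, the override is probably only visible inside the interactive shell.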

Sunday, November 11, 2012

Build the Hadoop Eclipse plugin from the source code

1) Install Eclipse.

2) Build Hadoop from the source code.

3) Edit build-contrib.xml to enable building the Eclipse plugin.

vi $Hadoop_sr_home/src/contrib/build-contrib.xml

Check the version number of the Hadoop you built and add two lines to the file. For example


4) Go to the directory $Hadoop_sr_home/src/contrib/eclipse-plugin/

5) Run the ant command.

6) Find hadoop-eclipse-plugin-1.1.3-SNAPSHOT.jar under $Hadoop_sr_home/build/contrib/eclipse-plugin

Thursday, November 8, 2012

List all Configuration entries in a Hadoop instance

This is an extremely simple piece of code that gives a more direct view of what the Hadoop Configuration object is. The few lines below list the entries of your Hadoop instance's Configuration object. This tiny program helps me figure out the configuration of a new Hadoop instance. I should probably expand it into something similar to phpinfo.php.
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;

public class HadoopInfo {
 public static void main(String[] args) throws Exception {
     // Loads the default resources found on the classpath.
     Configuration conf = new Configuration();
     Iterator<Entry<String, String>> entries = conf.iterator();
     System.out.println("<table border=\"1\" width=\"760\" style=\"word-break:break-all;\">" +
       "<caption>Hadoop default Configuration keys and values</caption>" +
       "<tr><th>Key</th><th>Value</th></tr>");
     // Emit one table row per key/value pair.
     while (entries.hasNext()) {
      Map.Entry<String, String> en = entries.next();
      System.out.println("<tr><td width=\"350\">" + en.getKey() + "</td><td>" + en.getValue() + "</td></tr>");
     }
     System.out.println("</table>");
 }
}

Sample output for a fresh standalone Hadoop instance.

Hadoop default Configuration keys and values
io.seqfile.compress.blocksize 1000000
hadoop.http.authentication.signature.secret.file ${user.home}/hadoop-http-auth-signature-secret
io.skip.checksum.errors false
fs.checkpoint.size 67108864
hadoop.http.authentication.kerberos.principal HTTP/localhost@LOCALHOST
fs.s3n.impl org.apache.hadoop.fs.s3native.NativeS3FileSystem
fs.s3.maxRetries 4
webinterface.private.actions false
hadoop.http.authentication.simple.anonymous.allowed true
fs.s3.impl org.apache.hadoop.fs.s3.S3FileSystem
hadoop.native.lib true
fs.checkpoint.edits.dir ${fs.checkpoint.dir}
ipc.server.listen.queue.size 128 file:///
hadoop.http.authentication.kerberos.keytab ${user.home}/hadoop.keytab
ipc.client.idlethreshold 4000
hadoop.tmp.dir /tmp/hadoop-${}
fs.hsftp.impl org.apache.hadoop.hdfs.HsftpFileSystem
fs.checkpoint.dir ${hadoop.tmp.dir}/dfs/namesecondary
fs.s3.block.size 67108864 false
hadoop.util.hash.type murmur
io.seqfile.lazydecompress true
io.file.buffer.size 4096
io.mapfile.bloom.size 1048576
fs.s3.buffer.dir ${hadoop.tmp.dir}/s3
hadoop.logfile.size 10000000
fs.webhdfs.impl org.apache.hadoop.hdfs.web.WebHdfsFileSystem
ipc.client.kill.max 10
topology.script.number.args 100
fs.har.impl org.apache.hadoop.fs.HarFileSystem
io.seqfile.sorter.recordlimit 1000000
fs.trash.interval 0 simple
local.cache.size 10737418240
ipc.server.tcpnodelay false true
fs.ramfs.impl org.apache.hadoop.fs.InMemoryFileSystem
ipc.client.connect.max.retries 10
fs.kfs.impl org.apache.hadoop.fs.kfs.KosmosFileSystem
fs.checkpoint.period 3600
hadoop.http.authentication.token.validity 36000 false
hadoop.logfile.count 10 14400
fs.ftp.impl org.apache.hadoop.fs.ftp.FTPFileSystem
fs.file.impl org.apache.hadoop.fs.LocalFileSystem
fs.hdfs.impl org.apache.hadoop.hdfs.DistributedFileSystem
ipc.client.connection.maxidletime 10000
io.mapfile.bloom.error.rate 0.005
io.bytes.per.checksum 512
fs.har.impl.disable.cache true
ipc.client.tcpnodelay false
fs.hftp.impl org.apache.hadoop.hdfs.HftpFileSystem
hadoop.relaxed.worker.version.check false
fs.s3.sleepTimeSeconds 10
hadoop.http.authentication.type simple
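
The HadoopInfo program above prints the keys in iteration order and writes the values straight into the HTML. As a possible next step toward a phpinfo-style page, here is a rough sketch that sorts the keys and escapes the values before building the table. ConfigTable, escapeHtml and toHtmlTable are hypothetical names made up for this post. The sketch has no Hadoop dependency so that it runs on its own; since Configuration implements Iterable<Map.Entry<String, String>>, a Configuration object could be passed to toHtmlTable in place of the sample map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Hadoop): renders key/value pairs as a sorted HTML table. */
final class ConfigTable {

    /** Escapes the characters that would otherwise break the generated HTML. */
    static String escapeHtml(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Builds one table row per entry, ordered by key. */
    static String toHtmlTable(Iterable<Map.Entry<String, String>> entries) {
        // TreeMap keeps the keys in natural (alphabetical) order.
        Map<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> entry : entries) {
            sorted.put(entry.getKey(), entry.getValue());
        }
        StringBuilder html = new StringBuilder();
        html.append("<table border=\"1\">\n");
        html.append("<tr><th>Key</th><th>Value</th></tr>\n");
        for (Map.Entry<String, String> entry : sorted.entrySet()) {
            html.append("<tr><td>").append(escapeHtml(entry.getKey()))
                .append("</td><td>").append(escapeHtml(entry.getValue()))
                .append("</td></tr>\n");
        }
        html.append("</table>");
        return html.toString();
    }

    public static void main(String[] args) {
        // Sample data taken from the output listed above; a real run would pass a Configuration.
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("io.file.buffer.size", "4096");
        sample.put("fs.trash.interval", "0");
        sample.put("hadoop.util.hash.type", "murmur");
        System.out.println(toHtmlTable(sample.entrySet()));
    }
}
```

Sorting makes two instances easier to compare side by side, and escaping keeps values that contain characters such as < or & from breaking the page.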