iDatamining.org

I am looking for projects to work on
Please contact me at yiyu.jia@BostonInfoPro.com!

Sunday, November 18, 2012

whoami: Hadoop always sends the Linux account name as the Hadoop user name

When we use the Hadoop Eclipse plugin, we find that it always sends the local Linux account name to the Hadoop cluster as the Hadoop user name. Who am I? The trick is that Hadoop uses the Linux command whoami to get the account info. The relevant code can be found in the class org.apache.hadoop.util.Shell:
  /** a Unix command to get the current user's name */
  public final static String USER_NAME_COMMAND = "whoami";
  /** a Unix command to get the current user's groups list */
  public static String[] getGroupsCommand() {
    return new String[]{"bash", "-c", "groups"};
  }
  /** a Unix command to get a given user's groups list */
  public static String[] getGroupsForUserCommand(final String user) {
    //'groups username' command return is non-consistent across different unixes
    return new String [] {"bash", "-c", "id -Gn " + user};
  }
  /** a Unix command to get a given netgroup's user list */
  public static String[] getUsersForNetgroupCommand(final String netgroup) {
    //'groups username' command return is non-consistent across different unixes
    return new String [] {"bash", "-c", "getent netgroup " + netgroup};
  }
  /** a Unix command to set permission */
  public static final String SET_PERMISSION_COMMAND = "chmod";
  /** a Unix command to set owner */
  public static final String SET_OWNER_COMMAND = "chown";
  public static final String SET_GROUP_COMMAND = "chgrp";
  /** Return a Unix command to get permission information. */
  public static String[] getGET_PERMISSION_COMMAND() {
    //force /bin/ls, except on windows.
    return new String[] {(WINDOWS ? "ls" : "/bin/ls"), "-ld"};
  }
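
To make the mechanism concrete, here is a minimal sketch of what "asking whoami" looks like from a Java process. It is not Hadoop's own code: the class WhoAmIDemo and the method firstLineOf are hypothetical names made up for this illustration, and only standard JDK calls are used. It starts the command as a child process and returns the first line of its output, which is roughly the kind of call a client has to make before it can tell the cluster who it is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Hadoop): runs a command and returns its first output line. */
final class WhoAmIDemo {

    static String firstLineOf(String... command) {
        try {
            // Start the command as a child process; stderr is merged into stdout.
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .start();
            String firstLine;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                firstLine = reader.readLine();
                // Drain any remaining output so the child process can exit cleanly.
                while (reader.readLine() != null) {
                    // ignore
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("command exited with code " + exitCode);
            }
            return firstLine == null ? "" : firstLine.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the command", e);
        }
    }

    public static void main(String[] args) {
        // The same command name that Shell.USER_NAME_COMMAND holds.
        System.out.println("whoami    : " + firstLineOf("whoami"));
        // For comparison: the account name as the JVM itself reports it.
        System.out.println("user.name : " + System.getProperty("user.name"));
    }
}
```

Running main on the client machine prints the name that the whoami command reports next to the JVM's own user.name property, which is a quick way to see which account name a Java client on that machine would pick up.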


So, here is a simple way to work around this.
  1. Create a shell script named iamwho.sh in ~/local/bin, make it executable (chmod +x), and give it content as simple as this:
    echo "hadoopuser"
    
  2. Edit .bash_profile:
     vi ~/.bash_profile 
  3. Add the line below:
     alias whoami="~/local/bin/iamwho.sh" 
  4. Apply the modification in the current shell console:
     source ~/.bash_profile 
  5. Run the command to check that whoami has been overridden:
     
    [yiyujia@localhost bin]$ whoami
    hadoopuser
    [yiyujia@localhost bin]$ 
     
http://stackoverflow.com/questions/11041253/set-hadoop-system-user-for-client-embedded-in-java-webapp

Sunday, November 11, 2012

Build the Hadoop Eclipse plugin from the source code


1) Install Eclipse.

2) Build Hadoop from the source code.

3) Edit build-contrib.xml to enable building the Eclipse plugin:

vi $Hadoop_sr_home/src/contrib/build-contrib.xml

Check the version number of the Hadoop build and add two lines to the file. For example:

  
  

4) Go to the directory $Hadoop_sr_home/src/contrib/eclipse-plugin/

5) Run the ant command.

6) Find hadoop-eclipse-plugin-1.1.3-SNAPSHOT.jar under $Hadoop_sr_home/build/contrib/eclipse-plugin

Thursday, November 8, 2012

List all Configuration entries in the Hadoop instance

This is an extremely simple piece of code that gives a more direct view of what the Hadoop Configuration object is. The few lines below list the entries of your Hadoop instance's Configuration object. This tiny program helps me figure out the configuration of a new Hadoop instance. I should probably expand it into a real HadoopInfo.java, similar to phpinfo.php.

import java.util.Iterator;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;


public class HadoopInfo {

  public static void main(String[] args) throws Exception {
    // Loads the default resources (core-default.xml, core-site.xml) from the classpath.
    Configuration conf = new Configuration();
    // Configuration is iterable over its key/value pairs.
    Iterator<Entry<String, String>> entries = conf.iterator();
    // Print the table header.
    System.out.println("<table border=\"1\" width=\"760\" style=\"word-break:break-all;\">" +
        "<caption>Hadoop default Configuration keys and values</caption>" +
        "<tr><th>Key</th><th>Value</th></tr>");
    // Print one table row per configuration entry.
    while (entries.hasNext()) {
      Entry<String, String> en = entries.next();
      System.out.println("<tr><td width=\"350\">" + en.getKey() + "</td><td>" + en.getValue() + "</td></tr>");
    }
    System.out.println("</table>");
  }
}

Sample output for a fresh standalone Hadoop instance.

Hadoop default Configuration keys and values
Key Value
io.seqfile.compress.blocksize 1000000
hadoop.http.authentication.signature.secret.file ${user.home}/hadoop-http-auth-signature-secret
io.skip.checksum.errors false
fs.checkpoint.size 67108864
hadoop.http.authentication.kerberos.principal HTTP/localhost@LOCALHOST
fs.s3n.impl org.apache.hadoop.fs.s3native.NativeS3FileSystem
fs.s3.maxRetries 4
webinterface.private.actions false
hadoop.http.authentication.simple.anonymous.allowed true
fs.s3.impl org.apache.hadoop.fs.s3.S3FileSystem
hadoop.native.lib true
fs.checkpoint.edits.dir ${fs.checkpoint.dir}
ipc.server.listen.queue.size 128
fs.default.name file:///
hadoop.http.authentication.kerberos.keytab ${user.home}/hadoop.keytab
ipc.client.idlethreshold 4000
hadoop.tmp.dir /tmp/hadoop-${user.name}
fs.hsftp.impl org.apache.hadoop.hdfs.HsftpFileSystem
fs.checkpoint.dir ${hadoop.tmp.dir}/dfs/namesecondary
fs.s3.block.size 67108864
hadoop.security.authorization false
io.serializations org.apache.hadoop.io.serializer.WritableSerialization
hadoop.util.hash.type murmur
io.seqfile.lazydecompress true
io.file.buffer.size 4096
io.mapfile.bloom.size 1048576
fs.s3.buffer.dir ${hadoop.tmp.dir}/s3
hadoop.logfile.size 10000000
fs.webhdfs.impl org.apache.hadoop.hdfs.web.WebHdfsFileSystem
ipc.client.kill.max 10
io.compression.codecs org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec
topology.script.number.args 100
fs.har.impl org.apache.hadoop.fs.HarFileSystem
io.seqfile.sorter.recordlimit 1000000
fs.trash.interval 0
hadoop.security.authentication simple
local.cache.size 10737418240
hadoop.security.group.mapping org.apache.hadoop.security.ShellBasedUnixGroupsMapping
ipc.server.tcpnodelay false
hadoop.security.token.service.use_ip true
fs.ramfs.impl org.apache.hadoop.fs.InMemoryFileSystem
ipc.client.connect.max.retries 10
hadoop.rpc.socket.factory.class.default org.apache.hadoop.net.StandardSocketFactory
fs.kfs.impl org.apache.hadoop.fs.kfs.KosmosFileSystem
fs.checkpoint.period 3600
topology.node.switch.mapping.impl org.apache.hadoop.net.ScriptBasedMapping
hadoop.http.authentication.token.validity 36000
hadoop.security.use-weak-http-crypto false
hadoop.logfile.count 10
hadoop.security.uid.cache.secs 14400
fs.ftp.impl org.apache.hadoop.fs.ftp.FTPFileSystem
fs.file.impl org.apache.hadoop.fs.LocalFileSystem
fs.hdfs.impl org.apache.hadoop.hdfs.DistributedFileSystem
ipc.client.connection.maxidletime 10000
io.mapfile.bloom.error.rate 0.005
io.bytes.per.checksum 512
fs.har.impl.disable.cache true
ipc.client.tcpnodelay false
fs.hftp.impl org.apache.hadoop.hdfs.HftpFileSystem
hadoop.relaxed.worker.version.check false
fs.s3.sleepTimeSeconds 10
hadoop.http.authentication.type simple
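
The listing above comes out in no particular order, which makes it hard to scan. As a small step towards the HadoopInfo idea, the entries could be sorted by key and HTML-escaped before printing. The sketch below is only an illustration under stated assumptions: the class SortedConfigTable and its methods are hypothetical names, and it uses only the JDK. It accepts any Iterable of String key/value entries, which is the same shape the HadoopInfo code reads from conf.iterator(), so a plain HashMap with three keys copied from the sample output stands in for the Configuration object here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Hadoop): renders key/value entries as a sorted HTML table. */
final class SortedConfigTable {

    /** Escapes the characters that are special in HTML text and attribute values. */
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Copies the entries into a TreeMap (sorted by key) and renders one table row per entry. */
    static String toHtmlTable(Iterable<Map.Entry<String, String>> entries) {
        Map<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> entry : entries) {
            sorted.put(entry.getKey(), entry.getValue());
        }
        StringBuilder html = new StringBuilder();
        html.append("<table border=\"1\">\n");
        html.append("<tr><th>Key</th><th>Value</th></tr>\n");
        for (Map.Entry<String, String> entry : sorted.entrySet()) {
            html.append("<tr><td>").append(escapeHtml(entry.getKey()))
                .append("</td><td>").append(escapeHtml(entry.getValue()))
                .append("</td></tr>\n");
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) {
        // Stand-in data: three entries taken from the sample output above.
        Map<String, String> sample = new HashMap<>();
        sample.put("io.file.buffer.size", "4096");
        sample.put("fs.default.name", "file:///");
        sample.put("hadoop.tmp.dir", "/tmp/hadoop-${user.name}");
        System.out.println(toHtmlTable(sample.entrySet()));
    }
}
```

Inside HadoopInfo, the same method could presumably be called as toHtmlTable(conf), since the code above already iterates the Configuration object as String key/value entries.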