iDatamining.org

I am looking for projects to work on.
Please contact me at yiyu.jia@iDataMining.org!

Thursday, September 19, 2013

time series analysis using R



  1. download and install MRO from https://mran.revolutionanalytics.com/download/ 
  2. install.packages("TTR")
  3. library("TTR")
  4. usindices <- read.table("http://www2.stat.duke.edu/~mw/data-sets/ts_data/industrial_production", skip = 20, header = FALSE)
  5. colnames(usindices) <- c("YR", "MN", "IP", "MFG", "MFGD", "MFGN", "MIN", "UTIL", "P", "MAT")
  6. head(usindices)
  7. install.packages("dplyr") 
  8. library(dplyr)
  9. avg1 = summarise(group_by(usindices,YR),avg = mean(MFG))
    
  10. plot(avg1)
  11. install.packages("reshape")
  12. library("reshape")
    
tutorial: http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html

Thursday, September 12, 2013

Result of Cloudera "Inspect hosts" for my experiment environment in the basement




Cluster Installation

Inspect hosts for correctness

Validations

Inspector failed on the following hosts...
Individual hosts resolved their own hostnames correctly.
No errors were found while looking for conflicting init scripts.
No errors were found while checking /etc/hosts.
All hosts resolved localhost to 127.0.0.1.
All hosts checked resolved each other's hostnames correctly.
Host clocks are approximately in sync (within ten minutes).
Host time zones are consistent across the cluster.
No users or groups are missing.
No kernel versions that are known to be bad are running.
No performance concerns with Transparent Huge Pages settings.
0 hosts are running CDH3 and 4 hosts are running CDH4.
All checked hosts are running the same version of components.
All managed hosts have consistent versions of Java.
All checked Cloudera Management Daemons versions are consistent with the server.
All checked Cloudera Management Agents versions are consistent with the server.

Version Summary

Group 1 (CDH4)
Hosts: cent63VM01, cent63VM02, cent63VM03, cent63VM04

Component | Version | CDH Version
Impala | 1.1.1 | Not applicable
Lily HBase Indexer (CDH4 only) | 1.2+2 | Not applicable
Solr (CDH4 only) | 4.4.0+69 | Not applicable
Flume NG | 1.4.0+23 | CDH4
MapReduce 1 (CDH4 only) | 2.0.0+1475 | CDH4
HDFS (CDH4 only) | 2.0.0+1475 | CDH4
HttpFS (CDH4 only) | 2.0.0+1475 | CDH4
MapReduce 2 (CDH4 only) | 2.0.0+1475 | CDH4
Yarn (CDH4 only) | 2.0.0+1475 | CDH4
Hadoop | 2.0.0+1475 | CDH4
HBase | 0.94.6+132 | CDH4
HCatalog (CDH4 only) | 0.5.0+13 | CDH4
Hive | 0.10.0+198 | CDH4
Mahout | 0.7+21 | CDH4
Oozie | 3.3.2+92 | CDH4
Pig | 0.11.0+33 | CDH4
Sqoop | 1.4.3+62 | CDH4
Sqoop2 (CDH4 only) | 1.99.2+85 | CDH4
Whirr | 0.8.2+15 | CDH4
Zookeeper | 3.4.5+23 | CDH4
Hue | 2.5.0+139 | CDH4
Java | java version "1.6.0_31", Java(TM) SE Runtime Environment (build 1.6.0_31-b04), Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) | Not applicable
Cloudera Manager Agent | 4.7.1 | Not applicable

Wednesday, September 11, 2013

Collection of lectures on Markov Chains & Hidden Markov Models

The first set of lectures introduces Markov models and Markov chains, then extends Markov chains to HMMs.

(ML 14.1) Markov models - motivating examples
http://www.youtube.com/watch?v=7KGdE2AK_MQ

(ML 14.2) Markov chains (discrete-time) (part 1)
http://www.youtube.com/watch?v=WUjt98HcHlk

(ML 14.3) Markov chains (discrete-time) (part 2)
http://www.youtube.com/watch?v=j6OUj9tleVM

(ML 14.4) Hidden Markov models (HMMs) (part 1)  
http://www.youtube.com/watch?v=TPRoLreU9lA

===========================================
The following lecture is a more detailed, intuitive explanation of HMMs

Very intuitive
http://www.youtube.com/watch?v=jY2E6ExLxaw

=============================================
 HMM & stock prediction
http://www.slideshare.net/ChiuYW/hidden-markov-model-stock-prediction




  • TOOL KIT
  • R packages: HMM, RHmm
    Java: JHMM
    Python: scikit-learn
  • DEMO
  • GET DATASET
  • library(quantmod)
    getSymbols("^TWII")
    chartSeries(TWII)
    TWII_Subset <- window(TWII, start = as.Date("..."))  # start date lost in the source
    TWII_Train <- cbind(TWII_Subset$TWII.Close - TWII_Subset$TWII.Open, TWII_Subset$TWII.Volume)



  • BUILD HMM MODEL
  • # Include RHMM Library
    library(RHmm)

    # Baum-Welch Algorithm
    hm_model <- HMMFit(obs = TWII_Train, nStates = 5)
    # Viterbi Algorithm
    VitPath <- viterbi(hm_model, TWII_Train)


  • SCATTER PLOT
  • TWII_Predict <- cbind(TWII_Subset$TWII.Close, VitPath$states)
    chartSeries(TWII_Predict[,1])
    addTA(TWII_Predict[TWII_Predict[,2]==1,1],on=1,type="p",col=5,pch=25)
    addTA(TWII_Predict[TWII_Predict[,2]==2,1],on=1,type="p",col=6,pch=24)
    addTA(TWII_Predict[TWII_Predict[,2]==3,1],on=1,type="p",col=7,pch=23)
    addTA(TWII_Predict[TWII_Predict[,2]==4,1],on=1,type="p",col=8,pch=22)
    addTA(TWII_Predict[TWII_Predict[,2]==5,1],on=1,type="p",col=10,pch=21)


    Sunday, September 8, 2013

    HBase install and performance test

    1. download hbase.
      [hadoopuser@cent63VM01 app]$ wget http://apache.osuosl.org/hbase/stable/hbase-0.94.2.tar.gz
      
    2. untar the file
      [hadoopuser@cent63VM01 app]$ tar xvf hbase-0.94.2.tar.gz
      
    3. create a soft link for HBase.
      [hadoopuser@cent63VM01 app]$ ln -s /hadoop/hbase-0.94.2/ /hbase
      
    4. copy sample configuration file.
      [hadoopuser@cent63VM01 app]$ cp /hbase/src/main/resources/hbase-default.xml /hbase/conf/hbase-site.xml
      
    5. Edit hbase-site.xml.
      
    6. Copy the configuration files to all nodes.
      rsync -avz ./hbase-0.94.2 hadoopuser@cent63V4.corp.ybusa.net:/hadoop/
      
    7. ZooKeeper's port numbers are troublesome. To keep things simple, I disabled the firewall.
      service iptables status
      service iptables save
      service iptables stop
      chkconfig iptables off
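
Step 6 above copies HBase to a single node. To repeat it for every node, a small loop may help. This is only a sketch: the host list in NODES is an assumption (borrowed from the cluster hostnames in the Cloudera post) and the helper name sync_hbase_commands is made up for illustration. It prints the rsync commands instead of running them, so they can be checked first.

```shell
#!/bin/sh
# Assumed worker hostnames; replace with the real node names.
NODES="cent63VM02 cent63VM03 cent63VM04"

# Hypothetical helper: print one rsync command per node (dry run only).
sync_hbase_commands() {
    for node in $NODES; do
        echo "rsync -avz ./hbase-0.94.2 hadoopuser@$node:/hadoop/"
    done
}

sync_hbase_commands
```

Once the printed commands look right, piping the output to sh (sync_hbase_commands | sh) would run them.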
      

    Alternatively, use system-config-firewall to open the required ports: 8080 for the HBase REST server; 60000 and 60010 for the master; 60020 and 60030 for the region servers; 2888, 3888, and 2181 for ZooKeeper.
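
The same ports could also be opened from the command line instead of through system-config-firewall. A minimal sketch, assuming plain iptables as on CentOS 6; the helper name hbase_iptables_rules is hypothetical, and it only prints the rules rather than applying them.

```shell
#!/bin/sh
# Ports from the list above: REST, master, region server, ZooKeeper.
HBASE_PORTS="8080 60000 60010 60020 60030 2888 3888 2181"

# Hypothetical helper: print one iptables ACCEPT rule per port (dry run only).
hbase_iptables_rules() {
    for port in $HBASE_PORTS; do
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
}

hbase_iptables_rules
```

After reviewing the output, the printed commands can be run as root, followed by service iptables save to persist them.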

    check file /etc/hosts


    rm -Rf /tmp/hadoop-username  clean data fo

    HBase performance testing with YCSB 0.1.4
    http://johnjianfang.blogspot.com/2012/09/hbase-performance-testing-wity-ycsb-014.html

    HBase errors and how to fix them
    http://blog.csdn.net/kntao/article/details/7642547

    yum install java-1.6.0-openjdk java-1.6.0-openjdk-devel