iDatamining.org

I am looking for projects to work on
Please contact with me at yiyu.jia@iDataMining.org!

Tuesday, June 10, 2014

set up Grail on Groovy environment on CentOS


yiyujia@hadoopDev:~ $ curl -s get.gvmtool.net | bash
yiyujia@hadoopDev:~ $ source "/root/.gvm/bin/gvm-init.sh"
yiyujia@hadoopDev:~ $ gvm help
yiyujia@hadoopDev:~ $ gvm install groovy
yiyujia@hadoopDev:~ $ gvm install grails
yiyujia@hadoopDev:~ $ gvm install grails 2.2.4
yiyujia@hadoopDev:~ $ grails
| Enter a script name to run. Use TAB for completion: 
grails> quit
| Goodbye
yiyujia@hadoopDev:~ $ 
https://github.com/Netflix/asgard/blob/master/application.properties #Grails Metadata file #Wed Jun 05 11:45:00 PST 2014 app.grails.version=2.2.4 app.name=asgard app.servlet.version=2.4 app.version=1.5

Monday, June 2, 2014

Tuning redshift query performance

We can tune Redshift query by the following four steps literately

1. Identify slow queries

select trim(t3.usename) AS username , t1.query AS query_ID, t1.starttime, t1.elapsed, t2.querytxt from svl_qlog t1 join STL_QUERY t2 ON t1.query = t2.query JOIN pg_user t3 ON t1.userid = t3.usesysid  order by t1.elapsed desc limit 50;

2. Table Design

select correct sort key, distribution key, and column compression algorithms (see blog post "Redshift table optimizing with SORTKEY and DISTKEY.")

3. Query Design



4. WLM Design


http://docs.aws.amazon.com/redshift/latest/dg/t_determine_if_query_runs_on_disk.html http://docs.aws.amazon.com/redshift/latest/dg/r_SVL_QLOG.html http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html select query, elapsed, substring from svl_qlog order by elapsed desc limit 5; select query, step, rows, workmem, label, is_diskbased from svl_query_summary where query = 4839858 order by workmem desc; select owner as node, diskno, used, capacity from stv_partitions order by 1, 2, 3, 4; select owner, host, diskno, used, capacity, (used-tossed)/capacity::numeric *100 as pctused from stv_partitions order by owner; SELECT userid, xid, pid, query, segment, locus, datediff(ms, starttime, endtime) as duration, compile from svl_compile where query = 35878 or query = 35879 order by query, segment; Redshift Tips http://oksoft.blogspot.com/2013/03/redshift-tips.html http://www.slideshare.net/AmazonWebServices/amazon-redshift-best-practices