I am looking for projects to work on
Please contact with me at!

Tuesday, June 10, 2014

set up Grail on Groovy environment on CentOS

yiyujia@hadoopDev:~ $ curl -s | bash
yiyujia@hadoopDev:~ $ source "/root/.gvm/bin/"
yiyujia@hadoopDev:~ $ gvm help
yiyujia@hadoopDev:~ $ gvm install groovy
yiyujia@hadoopDev:~ $ gvm install grails
yiyujia@hadoopDev:~ $ gvm install grails 2.2.4
yiyujia@hadoopDev:~ $ grails
| Enter a script name to run. Use TAB for completion: 
grails> quit
| Goodbye
yiyujia@hadoopDev:~ $ #Grails Metadata file #Wed Jun 05 11:45:00 PST 2014 app.grails.version=2.2.4 app.servlet.version=2.4 app.version=1.5

Monday, June 2, 2014

Tuning redshift query performance

We can tune Redshift query by the following four steps literately

1. Identify slow queries

select trim(t3.usename) AS username , t1.query AS query_ID, t1.starttime, t1.elapsed, t2.querytxt from svl_qlog t1 join STL_QUERY t2 ON t1.query = t2.query JOIN pg_user t3 ON t1.userid = t3.usesysid  order by t1.elapsed desc limit 50;

2. Table Design

select correct sort key, distribution key, and column compression algorithms (see blog post "Redshift table optimizing with SORTKEY and DISTKEY.")

3. Query Design

4. WLM Design select query, elapsed, substring from svl_qlog order by elapsed desc limit 5; select query, step, rows, workmem, label, is_diskbased from svl_query_summary where query = 4839858 order by workmem desc; select owner as node, diskno, used, capacity from stv_partitions order by 1, 2, 3, 4; select owner, host, diskno, used, capacity, (used-tossed)/capacity::numeric *100 as pctused from stv_partitions order by owner; SELECT userid, xid, pid, query, segment, locus, datediff(ms, starttime, endtime) as duration, compile from svl_compile where query = 35878 or query = 35879 order by query, segment; Redshift Tips