Yiyu Jia's technical Blog: January 2014

Monday, January 6, 2014

Clustering algorithms types: partitional clustering and hierarchical clustering

1. Flat or Partitional clustering:
(K-means, Gaussian mixture models, etc.)
Partitions are independent of each other

2. Hierarchical clustering:
(e.g., agglomerative clustering, divisive clustering)
- Partitions can be visualized using a tree structure (a dendr
ogram)
- Does not need the number of clusters as input
- Possible to view partitions at different levels of granularities
(i.e., can refine/coarsen clusters) using different K

K-means variants:
-Hartigan’s k-means algorithm
-Lloyd’s k-means algorithm
-Forgy’s k-means algorithm
-McQueen’s k-means algorithm

a good article about cluster analysis in R.

Saturday, January 4, 2014

Redshift convert integer to Timestamp type and convert epoch time to timestamp on the fly

Currently, Redshift does not support to_timestamp function, which is convenient for converting String or Integer into Timestamp format. Below is a quick notes on how to convert String or integer to be timestamp in Redshift.
Simple, using substring function to get parts for Year, Month, Date, Hour, Minute, Second from String or Integer and concatenate them into Timestamp format and cast to timestamp. Below are examples,
Suppose Table t1 has a column (col1) have integer value in pattern YYMMDDhhmm. For example, it value is 0403051030. It means Year 2014, March fifth, and thirty minutes past ten o'clock. We can use below two methods to convert it to be Timestamp value.

SELECT 
cast(('20' || substring(col1,1, 2) || substring(col1,3, 2) || 
      substring(col1,5, 2) || ' ' || substring(col1,7, 2) || ':' 
      || substring(col1,9, 2)) as timestamp) 
FROM t1;

SELECT 
cast(to_date(col1, 'YYMMDD') || ' ' 
    || substring(col1,7,2) || ':' 
|| substring(col1,9,2) as timestamp) 
FROM t1;

Also, get idea from a blog about how to convert epoch time to timestamp on fly in Redshift.

SELECT  
    (TIMESTAMP 'epoch' + myunixtime * INTERVAL '1 Second ') 
AS   mytimestamp 
FROM  
    example_table