Yiyu Jia's technical Blog: HIVE Select count performance

Sunday, January 27, 2013

HIVE Select count performance

I read a discussion about performance of Hive Select, that explains why Select count(*) slower than slect *. I note it here as one of hints for tuning hive query.

A hive query can be a metadata only request.
A hive query can be an hdfs get request.
A hive query can be a Map Reduce job.

Also, we can control number of map/reducer for better performance by rewriting query. For example, writing query in hive like this:

 SELECT COUNT(DISTINCT id) ....

It will always result in using only one reducer that slow down query. We can rewrite query as sub query to better performance:

use this command to set desired number of reducers:
set mapred.reduce.tasks=50
rewrite query as following:
SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM ... ) t;

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)