- A Hive query can be a metadata-only request.
- A Hive query can be an HDFS get request.
- A Hive query can be a MapReduce job.
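As a rough illustration of the three cases above, here is one query of each kind. The table name `events` and its `id` column are hypothetical, and the sketch assumes the classic MapReduce execution engine. Whether a simple scan is served without a job also depends on settings such as `hive.fetch.task.conversion`.

```sql
-- Hypothetical table `events` with an `id` column; not from the original text.

-- Metadata only: answered from the metastore, no data files are read.
SHOW TABLES;
DESCRIBE events;

-- HDFS get: a plain scan with a LIMIT can typically be served by a fetch task
-- that reads the files directly (subject to hive.fetch.task.conversion).
SELECT * FROM events LIMIT 10;

-- MapReduce job: grouping and aggregation need a shuffle, so a job is launched.
SELECT id, COUNT(*) FROM events GROUP BY id;
```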
We can also control the number of mappers and reducers, and so improve performance, by rewriting the query. For example, a Hive query written like this:
SELECT COUNT(DISTINCT id) ....
always ends up using only one reducer, which slows the query down. We can get better performance by rewriting it with a subquery:
- Use this command to set the desired number of reducers:
set mapred.reduce.tasks=50
- Rewrite the query as follows:
SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM ... ) t;
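Put together, the two steps might look like the sketch below. The table name `events` is an assumption, since the original query elides its FROM clause. The `WHERE id IS NOT NULL` filter is also an addition: `COUNT(DISTINCT id)` ignores NULLs, while `SELECT DISTINCT id` returns NULL as a row of its own, so without the filter the rewritten query can come out one higher.

```sql
-- Hypothetical table `events`; the original query does not name its table.

-- Ask for 50 reducers for the jobs that follow (value taken from the text above).
SET mapred.reduce.tasks=50;

-- Inner query: deduplicate ids, work that can be spread across many reducers.
-- Outer query: count the rows that are already distinct.
SELECT COUNT(*)
FROM (
  SELECT DISTINCT id
  FROM events
  WHERE id IS NOT NULL  -- assumption: keeps the result equal to COUNT(DISTINCT id)
) t;
```

On newer Hadoop versions the same property is also available under the name `mapreduce.job.reduces`. Running `EXPLAIN` on both forms of the query is one way to check how the plan changes.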