Understanding the order in which Hive executes the clauses of a query is very helpful for query optimization. Ideally, every intermediate step should produce as small a data set as possible. The clauses of a query are executed in this order:
- FROM & JOINs determine & filter rows
- WHERE applies further filters to the rows
- GROUP BY combines those rows into groups
- HAVING filters groups
- ORDER BY arranges the remaining rows/groups
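
To make the order above concrete, here is a minimal sketch in plain Python that mimics each clause as a separate step and records how many rows or groups survive it. The tables (`orders`, `customers`), their columns, and the function `run_query` are hypothetical and exist only for illustration. This models the logical order only; Hive's optimizer may rearrange the physical plan (for example, by pushing a filter below a join).

```python
from collections import defaultdict

# Hypothetical sample tables (not from the original post).
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 50, "year": 2023},
    {"order_id": 2, "cust_id": 10, "amount": 70, "year": 2024},
    {"order_id": 3, "cust_id": 20, "amount": 30, "year": 2024},
    {"order_id": 4, "cust_id": 20, "amount": 90, "year": 2024},
    {"order_id": 5, "cust_id": 30, "amount": 20, "year": 2024},
    {"order_id": 6, "cust_id": 99, "amount": 40, "year": 2024},  # no matching customer
]
customers = [
    {"cust_id": 10, "name": "Ann"},
    {"cust_id": 20, "name": "Bob"},
    {"cust_id": 30, "name": "Cy"},
]


def run_query(orders, customers):
    """Mimic the logical clause order of this (hypothetical) query:

    SELECT c.name, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    WHERE o.year = 2024
    GROUP BY c.name
    HAVING SUM(o.amount) > 50
    ORDER BY total DESC
    """
    sizes = {}

    # FROM & JOIN: pair each order with its customer; unmatched orders drop out.
    by_id = {c["cust_id"]: c for c in customers}
    joined = [
        {**o, "name": by_id[o["cust_id"]]["name"]}
        for o in orders
        if o["cust_id"] in by_id
    ]
    sizes["join"] = len(joined)

    # WHERE: filter individual rows.
    filtered = [r for r in joined if r["year"] == 2024]
    sizes["where"] = len(filtered)

    # GROUP BY: combine rows into one total per customer name.
    totals = defaultdict(int)
    for r in filtered:
        totals[r["name"]] += r["amount"]
    sizes["group_by"] = len(totals)

    # HAVING: filter whole groups, not rows.
    kept = {name: total for name, total in totals.items() if total > 50}
    sizes["having"] = len(kept)

    # ORDER BY: sort whatever groups remain.
    result = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return result, sizes


result, sizes = run_query(orders, customers)
print(result)
print(sizes)
```

The `sizes` dictionary shows the data set shrinking at each step, which is the point of the advice above: the earlier a step discards rows, the less work every later step has to do.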