Abstract
Real-world SQL queries frequently combine group-by and join operations; such queries are commonly known as GroupBy-Join queries. In some GroupBy-Join queries, performing the group-by before the join yields better performance. This subset is called GroupBy-Before-Join queries. In this paper, we present a study on the parallelization of GroupBy-Before-Join queries, particularly by exploiting cluster architectures. Our study shows that, in parallel query optimization, processing group-by operations as early as possible is not always desirable: in many cases, distributing the data first, before the group-by, offers performance advantages. We also describe our cluster-based scheme.
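
The idea of performing group-by before join, and of distributing data before grouping, can be illustrated with a minimal Python sketch. This is not the paper's scheme: the relations `projects` and `allocations`, the function names, and the hash-partitioning step are all hypothetical, and the sketch assumes the grouping attribute is also the join attribute, so that aggregating before the join does not change the result.

```python
from collections import defaultdict

# Hypothetical toy relations, invented for illustration (not from the paper).
projects = [("p1", "Alpha"), ("p2", "Beta"), ("p3", "Gamma")]  # (pno, name)
allocations = [("p1", 10), ("p1", 5), ("p2", 7), ("p4", 3)]    # (pno, hours)


def join_then_group(projects, allocations):
    """Join the two relations on pno first, then sum hours per project."""
    totals = defaultdict(int)
    for pno, name in projects:
        for a_pno, hours in allocations:
            if pno == a_pno:
                totals[(pno, name)] += hours
    return dict(totals)


def group_then_join(projects, allocations):
    """Sum hours per pno first, then join the (smaller) aggregate with projects."""
    hours_by_pno = defaultdict(int)
    for pno, hours in allocations:
        hours_by_pno[pno] += hours
    return {
        (pno, name): hours_by_pno[pno]
        for pno, name in projects
        if pno in hours_by_pno
    }


def distribute_then_group(projects, allocations, n_workers=2):
    """Hash-partition both relations on pno, then group and join per partition.

    The loop simulates n_workers independent processors sequentially.
    """
    result = {}
    for w in range(n_workers):
        part_p = [r for r in projects if hash(r[0]) % n_workers == w]
        part_a = [r for r in allocations if hash(r[0]) % n_workers == w]
        result.update(group_then_join(part_p, part_a))
    return result
```

All three functions return the same totals on this toy data. The sketch only demonstrates that the strategies are equivalent in result; which one is faster on a cluster depends on data sizes and communication costs, which is the question the paper studies.
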
Original language | English |
---|---|
Title of host publication | Proceedings - 1st IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2001 |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 178-185 |
Number of pages | 8 |
ISBN (Print) | 0769510108, 9780769510101 |
Publication status | Published - 2001 |
Event | 1st IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2001 - Brisbane, QLD, Australia. Duration: 15 May 2001 → 18 May 2001 |
Conference
Conference | 1st IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2001 |
---|---|
Country/Territory | Australia |
City | Brisbane, QLD |
Period | 15/05/01 → 18/05/01 |