Abstract
The join is a fundamental and widely used operation in data analytics, but it is also one of the most expensive. Considerable work has been carried out to improve and evaluate join approaches on popular distributed processing systems such as Spark and Hadoop; however, joins have not been widely studied on MPI. In this paper, we first implement, analyse and compare existing algorithms for the common small-large outer join operation, and then develop a novel approach, the swap-based outer join algorithm (SOJA). SOJA is designed to minimise the expensive communication between the distributed nodes while also reducing the cost of the local join operations. We demonstrate the benefits of SOJA experimentally, showing that its execution time is, at worst, similar to that of its competitors. More importantly, SOJA requires substantially less memory (typically half that of the best competitor), and its memory usage scales very well.
| Original language | English |
|---|---|
| Title of host publication | Advances in Database Technology - EDBT 2021 |
| Subtitle of host publication | 24th International Conference on Extending Database Technology, Proceedings |
| Editors | Yannis Velegrakis, Demetris Zeinalipour, Panos K. Chrysanthis, Francesco Guerra |
| Place of Publication | Konstanz, Germany |
| Publisher | OpenProceedings |
| Pages | 523-528 |
| Number of pages | 6 |
| ISBN (Electronic) | 9783893180844 |
| Publication status | Published - 2021 |
| Event | Extending Database Technology 2021 - Online, Nicosia, Cyprus; Duration: 23 Mar 2021 → 26 Mar 2021; Conference number: 24th; https://openproceedings.org/html/pages/2021_edbt.html (Proceedings) |
Publication series
| Name | Advances in Database Technology - EDBT |
|---|---|
| Publisher | OpenProceedings.org, University of Konstanz, University Library |
| Volume | 2021-March |
| ISSN (Electronic) | 2367-2005 |
Conference
| Conference | Extending Database Technology 2021 |
|---|---|
| Abbreviated title | EDBT 2021 |
| Country/Territory | Cyprus |
| City | Nicosia |
| Period | 23/03/21 → 26/03/21 |
Keywords
- Algorithm
- MPI
- Outer joins
- Parallel processing