SOJA: a memory-efficient small-large outer join for MPI

Liang Liang, Guang Yang, Thomas Heinis, David Taniar

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

The join is a fundamental and widely used operation in data analytics, but it is also one of the most expensive. Considerable work has been carried out to improve and evaluate join approaches on popular distributed processing systems such as Spark and Hadoop; however, joins have not been widely studied on MPI. In this paper, we first implement, analyse and compare existing algorithms for the common small-large outer join operation and then develop a novel approach, the swap-based outer join algorithm (SOJA). SOJA is designed to minimise the expensive communication between the distributed nodes while also reducing the cost of the local join operations. We demonstrate the benefits of SOJA experimentally, showing that it achieves at worst an execution time similar to that of its competitors. More importantly, SOJA requires substantially less memory (typically half the memory of the best competitor), and its memory usage scales very well.

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2021
Subtitle of host publication: 24th International Conference on Extending Database Technology, Proceedings
Editors: Yannis Velegrakis, Demetris Zeinalipour, Panos K. Chrysanthis, Francesco Guerra
Place of Publication: Konstanz, Germany
Publisher: OpenProceedings
Pages: 523-528
Number of pages: 6
ISBN (Electronic): 9783893180844
Publication status: Published - 2021
Event: Extending Database Technology 2021 - Online, Nicosia, Cyprus
Duration: 23 Mar 2021 to 26 Mar 2021
Conference number: 24th
https://openproceedings.org/html/pages/2021_edbt.html (Proceedings)

Publication series

Name: Advances in Database Technology - EDBT
Publisher: OpenProceedings.org, University of Konstanz, University Library
Volume: 2021-March
ISSN (Electronic): 2367-2005

Conference

Conference: Extending Database Technology 2021
Abbreviated title: EDBT 2021
Country/Territory: Cyprus
City: Nicosia
Period: 23/03/21 to 26/03/21

Keywords

  • Algorithm
  • MPI
  • Outer joins
  • Parallel processing
