In-database connected component analysis

Harald Bogeholz, Michael Brand, Radu Alexandru Todor

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

1 Citation (Scopus)


We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database. The algorithm described is a linear-space, randomised algorithm, always terminating with the correct answer but subject to a stochastic running time, such that for any ϵ>0 and any input graph G = (V, E) the algorithm terminates after O(log|V |) SQL queries with probability of at least, which we show empirically to translate to a quasi-linear runtime in practice.

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
EditorsShu-Ching Chen, Karuna Joshi
Place of PublicationPiscataway NJ USA
PublisherIEEE, Institute of Electrical and Electronics Engineers
Number of pages12
ISBN (Electronic)9781728129037
ISBN (Print)9781728129044
Publication statusPublished - 2020
EventIEEE International Conference on Data Engineering 2020 - Online, Dallas, United States of America
Duration: 20 Apr 202024 Apr 2020
Conference number: 36th (Proceedings) (Website)

Publication series

NameProceedings - International Conference on Data Engineering
PublisherThe Institute of Electrical and Electronics Engineers, Inc.
ISSN (Print)1084-4627
ISSN (Electronic)2375-026X


ConferenceIEEE International Conference on Data Engineering 2020
Abbreviated titleICDE 2020
Country/TerritoryUnited States of America
Internet address


  • Big Data
  • Blockchain
  • Data science
  • Distributed algorithms
  • Distributed databases
  • Graph theory
  • Relational databases
  • SQL

Cite this