IoT-enabled directed acyclic graph in spark cluster

Jahwan Koo, Nawab Muhammad Faseeh Qureshi, Isma Farah Siddiqui, Asad Abbas, Ali Kashif Bashir

Research output: Contribution to journalArticleResearchpeer-review

11 Citations (Scopus)

Abstract

Real-time data streaming fetches live sensory segments of the dataset in the heterogeneous distributed computing environment. This process assembles data chunks at a rapid encapsulation rate through a streaming technique that bundles sensor segments into multiple micro-batches and extracts into a repository, respectively. Recently, the acquisition process is enhanced with an additional feature of exchanging IoT devices’ dataset comprised of two components: (i) sensory data and (ii) metadata. The body of sensory data includes record information, and the metadata part consists of logs, heterogeneous events, and routing path tables to transmit micro-batch streams into the repository. Real-time acquisition procedure uses the Directed Acyclic Graph (DAG) to extract live query outcomes from in-place micro-batches through MapReduce stages and returns a result set. However, few bottlenecks affect the performance during the execution process, such as (i) homogeneous micro-batches formation only, (ii) complexity of dataset diversification, (iii) heterogeneous data tuples processing, and (iv) linear DAG workflow only. As a result, it produces huge processing latency and the additional cost of extracting event-enabled IoT datasets. Thus, the Spark cluster that processes Resilient Distributed Dataset (RDD) in a fast-pace using Random access memory (RAM) defies expected robustness in processing IoT streams in the distributed computing environment. This paper presents an IoT-enabled Directed Acyclic Graph (I-DAG) technique that labels micro-batches at the stage of building a stream event and arranges stream elements with event labels. In the next step, heterogeneous stream events are processed through the I-DAG workflow, which has non-linear DAG operation for extracting queries’ results in a Spark cluster. The performance evaluation shows that I-DAG resolves homogeneous IoT-enabled stream event issues and provides an effective stream event heterogeneous solution for IoT-enabled datasets in spark clusters.

Original languageEnglish
Article number50
Number of pages15
JournalJournal of Cloud Computing
Volume9
Issue number1
DOIs
Publication statusPublished - 14 Sept 2020
Externally publishedYes

Keywords

  • Apache spark
  • Directed acyclic graph
  • Internet of Things (IoT)
  • MapReduce
  • Micro-batch stream

Cite this