A frequency-aware grouping strategy for stateful operators in distributed stream processing systems

Dawei Sun, Zhe Chen, Weilong Lv, Shang Gao, Jia Rong

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Current optimizations for stateful stream processing tend to focus on load balancing without considering the resource utilization of downstream instances. To address this issue, we propose Fa-Stream, a data stream grouping method designed for stateful operators that is aware of the frequency of field values. Fa-Stream is implemented in three main aspects: (1) A data stream grouping model is built using Count-Min Sketch and a Gated Recurrent Unit (GRU) to predict and analyze the frequency of field values. High-frequency field values are selected, and a communication distance model and an instance resource constraint model are designed to adjust the weights of downstream instances for these values. (2) A cyclic access routing table is generated, and weights are dynamically adjusted by a rebalancing scheme to avoid load skew. (3) Consistent hash grouping is implemented for low-frequency field values, and dual mapping is used to prevent large-scale migration caused by scaling. To validate the effectiveness of Fa-Stream, comparative experiments between Partial Key Grouping (PKG) and Fa-Stream are conducted on the Storm platform. Results demonstrate that Fa-Stream improves tuple throughput by 12.3%, reduces system delay by 14.2%, and increases load balancing degree by 42.8%. Furthermore, Fa-Stream exhibits efficiency and stability across different data skews and tuple input rates.
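
The routing idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the GRU predictor, the communication distance and resource constraint models, and dual mapping, and it replaces the weighted cyclic routing table with plain round-robin. All names here (`CountMinSketch`, `ConsistentHashRing`, `route`, `threshold`) are hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def _hash(value: str, seed: int = 0) -> int:
    """Deterministic 64-bit hash of a string; `seed` selects the hash function."""
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class CountMinSketch:
    """Approximate frequency counter; estimates never fall below the true count."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][_hash(key, row) % self.width] += count

    def estimate(self, key: str) -> int:
        return min(
            self.table[row][_hash(key, row) % self.width]
            for row in range(self.depth)
        )


class ConsistentHashRing:
    """Consistent hashing with virtual nodes for low-frequency keys."""

    def __init__(self, instances, replicas: int = 64):
        self._ring = sorted(
            (_hash(f"{inst}#{i}"), inst)
            for inst in instances
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


def route(key, sketch, ring, hot_instances, threshold, counters):
    """Pick a downstream instance for `key`.

    Keys whose estimated frequency reaches `threshold` are cycled over
    `hot_instances` (simplified, unweighted round-robin); all other keys
    go through the consistent hash ring.
    """
    sketch.add(key)
    if sketch.estimate(key) >= threshold:
        n = counters.get(key, 0)
        counters[key] = n + 1
        return hot_instances[n % len(hot_instances)]
    return ring.lookup(key)
```

In this simplified form, a key is treated as high-frequency once its estimated count reaches the threshold, after which its tuples are spread over several instances; spreading a key's tuples this way means a stateful operator would need to merge the partial state, which the sketch does not address.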

Original language: English
Title of host publication: Proceedings - 2023 IEEE 29th International Conference on Parallel and Distributed Systems, ICPADS 2023
Editors: Jieming Yang, Shaojun Zou, Zhicai Zhang
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 126-131
Number of pages: 6
ISBN (Electronic): 9798350330717
ISBN (Print): 9798350308136
Publication status: Published - 2023
Event: International Conference on Parallel and Distributed Systems 2023 - Ocean Flower Island, Hainan, China
Duration: 17 Dec 2023 → 21 Dec 2023
Conference number: 29th
https://ieeexplore.ieee.org/xpl/conhome/10475889/proceeding (Proceedings)
https://ieee-cybermatics.org/2023/icpads/ (Website)

Conference

Conference: International Conference on Parallel and Distributed Systems 2023
Abbreviated title: ICPADS 2023
Country/Territory: China
City: Hainan
Period: 17/12/23 → 21/12/23

Keywords

  • Big Data
  • Data Stream Grouping
  • Load Balance
  • Stateful Operator
  • Stream computing
