Abstract
Current optimizations for stateful stream processing tend to focus on load balancing without considering how the resources of downstream instances are utilized. To address this issue, we propose Fa-Stream, a data stream grouping method designed specifically for stateful operators that is aware of field-value frequencies. Fa-Stream is implemented in three main aspects: (1) A data stream grouping model built with Count-Min Sketch and a Gated Recurrent Unit (GRU) predicts and analyzes the frequency of field values and selects the high-frequency ones. (2) A communication distance model and an instance resource constraint model adjust the weights of downstream instances for high-frequency field values. (3) A cyclic access routing table is generated and its weights are dynamically adjusted by a rebalancing scheme to avoid load skew, while low-frequency field values are grouped by consistent hashing, with dual mapping used to prevent large-scale migration caused by scaling. To validate its effectiveness, Fa-Stream is compared with Partial Key Grouping (PKG) on the Storm platform. The results show that, compared with PKG, Fa-Stream improves tuple throughput by 12.3%, reduces system latency by 14.2%, and increases the degree of load balancing by 42.8%. Furthermore, Fa-Stream remains efficient and stable across different degrees of data skew and tuple input rates.
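The frequency-tracking step can be illustrated with a plain Count-Min Sketch. This is a minimal Python sketch under our own assumptions, not the authors' implementation: the width, depth, row-salted BLAKE2 hashing, the candidate-key set and the hypothetical `hot_keys` rule (a key is "high-frequency" when its estimated share of all tuples reaches `threshold`) are illustrative choices, and the GRU forecasting stage is omitted entirely.

```python
import hashlib
from typing import Iterable, Set


class CountMinSketch:
    """Count-Min Sketch: `depth` rows of `width` counters.

    Hash collisions can only add to a counter, so estimate() never
    under-counts a key; it may over-count.
    """

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # number of tuples seen so far

    def _index(self, key: str, row: int) -> int:
        # Derive one hash per row by salting the key with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count
        self.total += count

    def estimate(self, key: str) -> int:
        # The smallest counter is the tightest upper bound on the true count.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


def hot_keys(sketch: CountMinSketch, candidates: Iterable[str], threshold: float) -> Set[str]:
    """Hypothetical selection rule: keys whose estimated share of all tuples >= threshold."""
    return {k for k in candidates if sketch.estimate(k) >= threshold * sketch.total}


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 30 + [f"k{i}" for i in range(20)]
    cms = CountMinSketch(width=256, depth=4)
    for key in stream:
        cms.add(key)
    # "a" and "b" are always reported, because estimates never under-count.
    print(sorted(hot_keys(cms, set(stream), threshold=0.1)))
```

Because the sketch only over-estimates, a threshold rule like this can promote a low-frequency key by mistake but never misses a genuinely frequent one; a larger `width` reduces such false positives at the cost of memory.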
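The split between high- and low-frequency field values can be sketched as follows. This is not the authors' method: smooth weighted round-robin (the scheme used by nginx) stands in for the cyclic access routing table, the per-instance weights are supplied by hand rather than derived from the communication distance and instance resource constraint models, and the dual-mapping step is not modelled. The class names `WeightedCycle`, `ConsistentHashRing` and `FrequencyAwareGrouper` are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash64(text: str) -> int:
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


class WeightedCycle:
    """Smooth weighted round-robin over the instances serving one hot key."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = dict(weights)
        self.current = {inst: 0 for inst in self.weights}

    def reweight(self, weights: Dict[str, int]) -> None:
        # A rebalancing step (not modelled here) would call this with new weights.
        self.weights = dict(weights)
        self.current = {inst: 0 for inst in self.weights}

    def pick(self) -> str:
        total = sum(self.weights.values())
        for inst, w in self.weights.items():
            self.current[inst] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best


class ConsistentHashRing:
    """Cold-key mapping; a new instance only takes keys, it never reshuffles others."""

    def __init__(self, instances: Iterable[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._hashes: List[int] = []
        self._owners: List[str] = []
        for inst in instances:
            self.add(inst)

    def add(self, instance: str) -> None:
        # Place `vnodes` virtual nodes per instance, keeping the ring sorted.
        for v in range(self.vnodes):
            h = _hash64(f"{instance}#{v}")
            i = bisect.bisect(self._hashes, h)
            self._hashes.insert(i, h)
            self._owners.insert(i, instance)

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._hashes, _hash64(key)) % len(self._hashes)
        return self._owners[i]


class FrequencyAwareGrouper:
    """Hypothetical grouper: hot keys use weighted cycles, cold keys use the ring."""

    def __init__(self, instances: Iterable[str], hot_weights: Dict[str, Dict[str, int]]) -> None:
        self.ring = ConsistentHashRing(instances)
        self.tables = {key: WeightedCycle(w) for key, w in hot_weights.items()}

    def route(self, key: str) -> str:
        table = self.tables.get(key)
        return table.pick() if table is not None else self.ring.lookup(key)


if __name__ == "__main__":
    g = FrequencyAwareGrouper(["w1", "w2", "w3"], {"hot": {"w1": 3, "w2": 1}})
    print([g.route("hot") for _ in range(4)])  # ['w1', 'w1', 'w2', 'w1']
    print(g.route("cold") == g.route("cold"))  # True
```

Adding an instance to the ring only moves keys onto the new instance, which is the property that limits migration when the operator scales out. In this sketch a hot key's state ends up split across several instances, so, as with PKG, a downstream merge step would be needed for stateful aggregation.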
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 IEEE 29th International Conference on Parallel and Distributed Systems, ICPADS 2023 |
Editors | Jieming Yang, Shaojun Zou, Zhicai Zhang |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 126-131 |
Number of pages | 6 |
ISBN (Electronic) | 9798350330717 |
ISBN (Print) | 9798350308136 |
DOIs | |
Publication status | Published - 2023 |
Event | International Conference on Parallel and Distributed Systems 2023 - Ocean Flower Island, Hainan, China; Duration: 17 Dec 2023 → 21 Dec 2023; Conference number: 29th; https://ieeexplore.ieee.org/xpl/conhome/10475889/proceeding (Proceedings); https://ieee-cybermatics.org/2023/icpads/ (Website) |
Conference
Conference | International Conference on Parallel and Distributed Systems 2023 |
---|---|
Abbreviated title | ICPADS 2023 |
Country/Territory | China |
City | Hainan |
Period | 17/12/23 → 21/12/23 |
Keywords
- Big Data
- Data Stream Grouping
- Load Balance
- Stateful Operator
- Stream computing