Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning

Joarder Mohammad Mustafa Kamal, Manzur Murshed, Mohamed Medhat Gaber

    Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

    Abstract

    Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. With dynamic workloads and rapid growth in data volume, the underlying database requires incremental repartitioning to maintain an acceptable level of DTs and data load balance with minimum physical data migration. In a workload-aware repartitioning scheme, the transactional workload is modelled as a graph or hypergraph; performing k-way min-cut clustering that guarantees minimum edge cuts can then reduce the impact of DTs significantly by mapping the workload clusters onto logical database partitions. However, without exploring the inherent workload characteristics, the overall processing and computing times for large-scale workload networks increase in polynomial order. In this paper, a workload-aware incremental database repartitioning technique is proposed, which exploits proactive transaction classification and workload stream mining techniques. Workload batches are modelled as graphs, hypergraphs, and compressed hypergraphs, then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted to manage rapid data growth and dynamic workloads, thus progressively reducing the overall processing time required to operate over the workload networks.
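
The graph modelling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the batch, the tuple ids, the two placements, and all function names below are hypothetical, and no min-cut clustering is performed. The sketch only builds a co-access graph from a workload batch and shows how the number of distributed transactions and the edge cut depend on the tuple-to-partition mapping.

```python
from collections import Counter
from itertools import combinations


def build_workload_graph(transactions):
    """Return edge weights: how often each pair of tuples is co-accessed."""
    edges = Counter()
    for tuples in transactions:
        for a, b in combinations(sorted(set(tuples)), 2):
            edges[(a, b)] += 1
    return edges


def count_distributed(transactions, placement):
    """Count transactions whose tuples live on more than one partition."""
    return sum(1 for tuples in transactions
               if len({placement[t] for t in tuples}) > 1)


def edge_cut(edges, placement):
    """Total weight of edges whose endpoints sit on different partitions."""
    return sum(w for (a, b), w in edges.items()
               if placement[a] != placement[b])


# Hypothetical workload batch: each transaction is the set of tuple ids it touches.
batch = [{"t1", "t2"}, {"t1", "t2", "t3"}, {"t3", "t4"}, {"t4", "t5"}, {"t1", "t2"}]
edges = build_workload_graph(batch)

# Two hypothetical tuple-to-partition mappings over two partitions (0 and 1).
naive = {"t1": 0, "t2": 1, "t3": 0, "t4": 1, "t5": 0}
workload_aware = {"t1": 0, "t2": 0, "t3": 0, "t4": 1, "t5": 1}

for name, placement in [("naive", naive), ("workload-aware", workload_aware)]:
    print(name, count_distributed(batch, placement), edge_cut(edges, placement))
```

In this toy batch, the placement that keeps frequently co-accessed tuples together yields a smaller edge cut and fewer distributed transactions, which is the quantity a k-way min-cut partitioner would try to minimise.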

    Original language: English
    Title of host publication: Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014
    Editors: Ioan Raicu, Ilkay Altinas
    Place of Publication: USA
    Publisher: IEEE, Institute of Electrical and Electronics Engineers
    Pages: 8-15
    Number of pages: 8
    ISBN (Electronic): 9781479918973
    DOI: https://doi.org/10.1109/BDC.2014.8
    Publication status: Published - 5 Nov 2015
    Event: IEEE/ACM International Symposium on Big Data Computing (BDC 2014) - Hilton London Paddington, London, United Kingdom
    Duration: 8 Dec 2014 to 11 Dec 2014
    http://www.cloudbus.org/bdc2014/

    Conference

    Conference: IEEE/ACM International Symposium on Big Data Computing (BDC 2014)
    Abbreviated title: BDC 2014
    Country: United Kingdom
    City: London
    Period: 8/12/14 to 11/12/14
    Other: In conjunction with the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014)
    Internet address: http://www.cloudbus.org/bdc2014/

    Keywords

    • classification
    • Cloud databases
    • data migration
    • data stream mining
    • distributed transactions
    • incremental repartitioning
    • load-balance
    • workload

    Cite this

    Kamal, J. M. M., Murshed, M., & Gaber, M. M. (2015). Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning. In I. Raicu, & I. Altinas (Eds.), Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014 (pp. 8-15). [7321724] USA: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/BDC.2014.8
    Kamal, Joarder Mohammad Mustafa ; Murshed, Manzur ; Gaber, Mohamed Medhat. / Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning. Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014. editor / Ioan Raicu ; Ilkay Altinas. USA : IEEE, Institute of Electrical and Electronics Engineers, 2015. pp. 8-15
    @inproceedings{e006e20bf90c411cabce5e812588306c,
    title = "Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning",
    keywords = "classification, Cloud databases, data migration, data stream mining, distributed transactions, incremental repartitioning, load-balance, workload",
    author = "Kamal, {Joarder Mohammad Mustafa} and Manzur Murshed and Gaber, {Mohamed Medhat}",
    year = "2015",
    month = "11",
    day = "5",
    doi = "10.1109/BDC.2014.8",
    language = "English",
    pages = "8--15",
    editor = "Ioan Raicu and Ilkay Altinas",
    booktitle = "Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014",
    publisher = "IEEE, Institute of Electrical and Electronics Engineers",
    address = "United States of America",

    }

    Kamal, JMM, Murshed, M & Gaber, MM 2015, Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning. in I Raicu & I Altinas (eds), Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014., 7321724, IEEE, Institute of Electrical and Electronics Engineers, USA, pp. 8-15, IEEE/ACM International Symposium on Big Data Computing (BDC 2014), London, United Kingdom, 8/12/14. https://doi.org/10.1109/BDC.2014.8

    Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning. / Kamal, Joarder Mohammad Mustafa; Murshed, Manzur; Gaber, Mohamed Medhat.

    Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014. ed. / Ioan Raicu; Ilkay Altinas. USA : IEEE, Institute of Electrical and Electronics Engineers, 2015. p. 8-15 7321724.

    TY - GEN

    T1 - Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning

    AU - Kamal, Joarder Mohammad Mustafa

    AU - Murshed, Manzur

    AU - Gaber, Mohamed Medhat

    PY - 2015/11/5

    Y1 - 2015/11/5

    KW - classification

    KW - Cloud databases

    KW - data migration

    KW - data stream mining

    KW - distributed transactions

    KW - incremental repartitioning

    KW - load-balance

    KW - workload

    UR - http://www.scopus.com/inward/record.url?scp=84962910762&partnerID=8YFLogxK

    U2 - 10.1109/BDC.2014.8

    DO - 10.1109/BDC.2014.8

    M3 - Conference Paper

    AN - SCOPUS:84962910762

    SP - 8

    EP - 15

    BT - Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014

    A2 - Raicu, Ioan

    A2 - Altinas, Ilkay

    PB - IEEE, Institute of Electrical and Electronics Engineers

    CY - USA

    ER -

    Kamal JMM, Murshed M, Gaber MM. Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning. In Raicu I, Altinas I, editors, Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014. USA: IEEE, Institute of Electrical and Electronics Engineers. 2015. p. 8-15. 7321724 https://doi.org/10.1109/BDC.2014.8