Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning

Joarder Mohammad Mustafa Kamal, Manzur Murshed, Mohamed Medhat Gaber

    Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research


    Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. Given the dynamic nature of workloads and rapid growth in data volume, the underlying database requires incremental repartitioning to maintain an acceptable level of DTs and data load balance with minimum physical data migrations. In a workload-aware repartitioning scheme, the transactional workload is modelled as a graph or hypergraph; subsequently performing k-way min-cut clustering, which guarantees minimum edge cuts, can reduce the impact of DTs significantly by mapping the workload clusters into logical database partitions. However, without exploring the inherent workload characteristics, the overall processing and computing times for large-scale workload networks increase in polynomial order. In this paper, a workload-aware incremental database repartitioning technique is proposed, which effectively exploits proactive transaction classification and workload stream mining techniques. Workload batches are modelled as graphs, hypergraphs, and compressed hypergraphs, and then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted in managing rapid data growth and dynamic workloads, thus progressively reducing the overall processing time required to operate over the workload networks.
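
    The graph model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy transactions, and the two tuple-to-partition placements are all hypothetical, and no min-cut partitioner is included. The sketch only builds a co-access graph from a batch of transactions and shows how the edge cut and the number of distributed transactions depend on the placement.

```python
from collections import Counter
from itertools import combinations


def build_workload_graph(transactions):
    """Return {(u, v): weight}; weight counts transactions touching both tuples."""
    edges = Counter()
    for txn in transactions:
        # Every pair of tuples accessed by the same transaction shares an edge.
        for u, v in combinations(sorted(set(txn)), 2):
            edges[(u, v)] += 1
    return edges


def edge_cut(edges, placement):
    """Total weight of edges whose endpoints sit in different partitions."""
    return sum(w for (u, v), w in edges.items() if placement[u] != placement[v])


def count_distributed(transactions, placement):
    """Number of transactions that span more than one partition."""
    return sum(1 for txn in transactions if len({placement[t] for t in txn}) > 1)


# Hypothetical workload batch: each transaction is the set of tuple ids it accesses.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"d", "e"}, {"b", "c"}]
edges = build_workload_graph(transactions)

# Two hypothetical tuple-to-partition placements over two partitions.
clustered = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
scattered = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}

print(edge_cut(edges, clustered), count_distributed(transactions, clustered))
print(edge_cut(edges, scattered), count_distributed(transactions, scattered))
```

    In this toy batch the clustered placement cuts a single edge and leaves one distributed transaction, while the scattered placement makes every transaction distributed, which is why a smaller edge cut is used as a proxy for fewer DTs.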

    Original language: English
    Title of host publication: Proceedings - 2014 International Symposium on Big Data Computing, BDC 2014
    Editors: Ioan Raicu, Ilkay Altinas
    Place of Publication: USA
    Publisher: IEEE, Institute of Electrical and Electronics Engineers
    Number of pages: 8
    ISBN (Electronic): 9781479918973
    Publication status: Published - 5 Nov 2015
    Event: IEEE/ACM International Symposium on Big Data Computing (BDC 2014) - Hilton London Paddington, London, United Kingdom
    Duration: 8 Dec 2014 - 11 Dec 2014


    Conference: IEEE/ACM International Symposium on Big Data Computing (BDC 2014)
    Abbreviated title: BDC 2015
    Country: United Kingdom
    Other: In conjunction with the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014)


    • classification
    • Cloud databases
    • data migration
    • data stream mining
    • distributed transactions
    • incremental repartitioning
    • load-balance
    • workload
