Identification of critical parameters for MapReduce energy efficiency using statistical design of experiments

Nidhi Tiwari, Umesh Bellur, Santonu Sarkar, Maria Indrawan

    Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

    6 Citations (Scopus)


    Energy efficiency is an important concern for data centers today. Most of these data centers use MapReduce frameworks for big data processing. These frameworks and modern hardware provide flexibility in the form of parameters to manage the performance and energy consumption of the system. However, tuning these parameters to reduce energy consumption without impacting performance is challenging because 1) there are a large number of parameters across the layers of the frameworks, 2) the impact of the parameters differs with workload characteristics, 3) the same parameter may have conflicting impacts on performance and energy, and 4) parameters may have interaction effects. To streamline parameter tuning, we present a systematic design of experiments to study the effects of different parameters on performance and energy consumption, with a view to identifying the most influential ones quickly and efficiently. The final goal is to use the identified parameters to build predictive models for tuning the environment. We perform a detailed analysis of the main and interaction effects of rationally selected parameters on performance and energy consumption for typical MapReduce workloads. Based on a relatively small number of experiments, we ascertain that replication-factor has the highest impact and, surprisingly, compression has the least impact on the energy efficiency of MapReduce systems. Furthermore, from the results of the factorial design we infer that the two-way interactions between the Hadoop platform parameters block-size, Map-slots, and CPU-frequency have a high impact on the energy efficiency of all types of workloads, owing to the platform's distributed, parallel, pipelined design.
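To illustrate what "main and interaction effects" means in a two-level factorial design, here is a minimal sketch in Python. It is not the authors' code or data. The factor names are borrowed from the abstract, the levels are coded as -1 (low) and +1 (high), and the energy values are synthetic, generated from an assumed linear model so that the expected effects are known in advance. The function names `full_factorial`, `effect`, and `rank_effects` are hypothetical helpers introduced only for this example.

```python
from itertools import combinations, product

# Factor names borrowed from the abstract; levels are coded -1 (low), +1 (high).
FACTORS = ["block_size", "map_slots", "cpu_frequency"]


def full_factorial(k):
    """Return all 2**k runs of a two-level full factorial design."""
    return list(product((-1, 1), repeat=k))


def effect(runs, responses, columns):
    """Estimate a main effect (one column) or an interaction effect (several).

    Each run's contrast sign is the product of its coded levels in the chosen
    columns. The effect is the mean response where the sign is +1 minus the
    mean response where the sign is -1.
    """
    half = len(runs) / 2
    total = 0.0
    for run, y in zip(runs, responses):
        sign = 1
        for c in columns:
            sign *= run[c]
        total += sign * y
    return total / half


def rank_effects(runs, responses, names):
    """Compute all main and two-way effects, sorted by absolute magnitude."""
    effects = {}
    for i in range(len(names)):
        effects[names[i]] = effect(runs, responses, (i,))
    for i, j in combinations(range(len(names)), 2):
        effects[f"{names[i]}:{names[j]}"] = effect(runs, responses, (i, j))
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


runs = full_factorial(len(FACTORS))

# SYNTHETIC responses (assumption, not measurements from the paper):
# energy = 100 + 5*block_size - 3*map_slots + 2*block_size*cpu_frequency
energy = [100 + 5 * a - 3 * b + 2 * a * c for a, b, c in runs]

ranked = rank_effects(runs, energy, FACTORS)
for name, value in ranked:
    print(f"{name:28s} {value:+.1f}")
```

With coded levels, each estimated effect is twice the matching coefficient of the assumed model, so the ranking recovers which terms drive the synthetic response. In a real study the `energy` list would hold measured values, one per experimental run.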

    Original language: English
    Title of host publication: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
    Subtitle of host publication: 23–27 May 2016, Chicago, Illinois, USA
    Place of Publication: Piscataway, NJ
    Publisher: IEEE, Institute of Electrical and Electronics Engineers
    Number of pages: 10
    ISBN (Electronic): 9781509021406, 9781509036820
    Publication status: Published - 18 Jul 2016
    Event: Workshop on High-Performance, Power-Aware Computing 2016 - Chicago Hyatt Regency, Chicago, United States of America
    Duration: 27 May 2016 → 27 May 2016
    Conference number: 12th


    Workshop: Workshop on High-Performance, Power-Aware Computing 2016
    Abbreviated title: HPPAC 2016
    Country: United States of America
    Other: part of the 30th IEEE International Parallel and Distributed Processing Symposium


    • Energy efficiency
    • MapReduce
    • Statistical Analysis
