
Hit-ratio storage-tier partition strategy of temperature data over Hadoop

Research output: Contribution to journal, Article, Research, peer-review

Abstract

Big data analytics refers to simplified approaches for managing huge datasets in a distributed computing environment. Apache Hadoop is an open-source solution that processes large datasets in parallel. Its ecosystem follows a three-tier architecture: client, Namenode and Datanode. When a client requests task processing, the Namenode assigns resources and schedules the task on a slave node. The Datanode processes the task and writes the output to the heterogeneous storage-tier of the cluster. The datasets used for task processing can be categorized into four types: Hot, Warm, Cold and Frozen. This temperature attribute indicates how frequently a dataset is accessed and determines its storage area, i.e. DISK or ARCHIVE. Since data temperature is identified through the number of blocks used per day and the data age (i.e. 7 days), the storage-tier bears the additional burden of monitoring these changes. Moreover, temperature data is not relocated but merely marked as DISK or ARCHIVE. As a result, the storage-tier suffers from data block contention and I/O access latency. To overcome these issues, we propose the Hit-Ratio Storage-tier Partition Strategy (HRSP), which creates logical partitions, e.g. a Hot partition, over a storage-tier medium. The presented approach provides direct access to temperature data and reduces I/O access latency and block contention. The experimental results show that the proposed approach dispenses with defragmentation and improves the I/O accessibility of the Hadoop cluster.
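The abstract does not give the HRSP algorithm itself, so the following is only a minimal illustrative sketch of the idea of assigning a dataset to a temperature partition. It assumes that a hit ratio is the share of observed block accesses that went to a given dataset; the threshold values, the rule that data older than the 7-day window cannot remain Hot, and the names `hit_ratio`, `classify` and `partition_for` are all hypothetical and are not taken from the paper or from the HDFS API.

```python
AGE_WINDOW_DAYS = 7  # age window mentioned in the abstract

# Assumed hit-ratio cut-offs, highest first (illustrative, not from the paper).
THRESHOLDS = [("HOT", 0.50), ("WARM", 0.20), ("COLD", 0.05)]


def hit_ratio(dataset_block_accesses: int, total_block_accesses: int) -> float:
    """Share of all observed block accesses that went to this dataset."""
    if total_block_accesses <= 0:
        return 0.0
    return dataset_block_accesses / total_block_accesses


def classify(dataset_block_accesses: int, total_block_accesses: int,
             age_days: int) -> str:
    """Return HOT, WARM, COLD or FROZEN for one dataset."""
    ratio = hit_ratio(dataset_block_accesses, total_block_accesses)
    label = "FROZEN"  # default when no threshold is reached
    for name, minimum in THRESHOLDS:
        if ratio >= minimum:
            label = name
            break
    # Assumption: data older than the age window is demoted from HOT to WARM.
    if label == "HOT" and age_days > AGE_WINDOW_DAYS:
        label = "WARM"
    return label


def partition_for(label: str) -> str:
    """Map a temperature label to a hypothetical logical partition name."""
    return f"{label.lower()}-partition"


if __name__ == "__main__":
    samples = [("logs", 60, 100, 2), ("reports", 10, 100, 3), ("backup", 1, 100, 30)]
    for name, hits, total, age in samples:
        label = classify(hits, total, age)
        print(name, label, partition_for(label))
```

In this sketch the classification result selects a logical partition directly, which mirrors the abstract's point that data should be reachable through its partition instead of only being marked as DISK or ARCHIVE.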

Original language: English
Pages (from-to): 2466-2472
Number of pages: 7
Journal: Journal of Theoretical and Applied Information Technology
Volume: 95
Issue number: 11
Publication status: Published - 2017
Externally published: Yes

Keywords

  • Data temperature
  • Hadoop
  • HDFS
  • Network contention
  • Storage-tier
