Abstract
Big data analytics refers to a simplified solution for managing huge datasets in a distributed computing environment. Apache Hadoop is an open-source framework that processes large datasets in parallel. Its ecosystem has a three-tier architecture: client, Namenode and Datanode. When a client requests task processing, the Namenode assigns resources and schedules the task on a slave node. The Datanode processes the task and writes the output to the heterogeneous storage tier of the cluster. The datasets used for task processing fall into four categories: Hot, Warm, Cold and Frozen. This temperature indicates how frequently a dataset is accessed and determines its storage area, i.e. DISK or ARCHIVE. Because data temperature is identified from the number of blocks used per day and the data age (i.e. 7 days), the storage tier bears the additional burden of monitoring these changes. Moreover, temperature data is not relocated but merely marked with a DISK or ARCHIVE label. The storage tier therefore suffers from data block contention and I/O access latency. To overcome these issues, we propose the Hit-Ratio Storage-tier Partition Strategy (HRSP), which creates logical partitions, i.e. a Hot partition, over the storage-tier media. The proposed approach provides direct access to temperature data and reduces I/O access latency and block contention. The experimental results show that the proposed approach removes the need for defragmentation and improves the I/O accessibility of the Hadoop cluster.
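As a rough illustration of the temperature classification described above, the following Python sketch labels a dataset from its daily block usage and age, then maps the label to a storage area. It is not the HRSP implementation: the abstract gives only the two signals and the 7-day age, so the access thresholds, the `DatasetStats` type, the function names, and the mapping of Hot/Warm to DISK and Cold/Frozen to ARCHIVE are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed values: the abstract names the signals (blocks used per day,
# data age of 7 days) but does not state access-frequency cut-offs.
AGE_WINDOW_DAYS = 7
HOT_MIN_BLOCKS_PER_DAY = 100   # hypothetical threshold
WARM_MIN_BLOCKS_PER_DAY = 10   # hypothetical threshold


@dataclass
class DatasetStats:
    """Hypothetical per-dataset counters kept by the storage tier."""
    blocks_used_per_day: float
    age_days: int


def classify_temperature(stats: DatasetStats) -> str:
    """Return HOT, WARM, COLD or FROZEN under the assumed thresholds."""
    if stats.blocks_used_per_day >= HOT_MIN_BLOCKS_PER_DAY:
        return "HOT"
    if stats.blocks_used_per_day >= WARM_MIN_BLOCKS_PER_DAY:
        return "WARM"
    # Rarely used data stays COLD while it is recent or still touched.
    if stats.age_days <= AGE_WINDOW_DAYS or stats.blocks_used_per_day > 0:
        return "COLD"
    # Older than the age window and never accessed.
    return "FROZEN"


def storage_area(temperature: str) -> str:
    """Map a temperature to a storage area (assumed mapping)."""
    return "DISK" if temperature in ("HOT", "WARM") else "ARCHIVE"


if __name__ == "__main__":
    samples = [
        DatasetStats(blocks_used_per_day=150, age_days=2),
        DatasetStats(blocks_used_per_day=20, age_days=5),
        DatasetStats(blocks_used_per_day=1, age_days=30),
        DatasetStats(blocks_used_per_day=0, age_days=30),
    ]
    for s in samples:
        t = classify_temperature(s)
        print(s, t, storage_area(t))
```

Recomputing these labels as usage and age change is the monitoring burden the abstract attributes to the storage tier.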
| Original language | English |
|---|---|
| Pages (from-to) | 2466-2472 |
| Number of pages | 7 |
| Journal | Journal of Theoretical and Applied Information Technology |
| Volume | 95 |
| Issue number | 11 |
| Publication status | Published - 2017 |
| Externally published | Yes |
Keywords
- Data temperature
- Hadoop
- HDFS
- Network contention
- Storage-tier