Abstract
During data deduplication, on-disk fingerprint lookups generate heavy disk traffic and become a bottleneck. In this paper, we propose a "lazy" data deduplication method that buffers incoming fingerprints and performs on-disk lookups in batches in order to reduce this bottleneck. Deduplication systems generally use prefetching to improve the cache hit rate by exploiting locality within the incoming fingerprint stream. For lazy deduplication, we design a buffering strategy that preserves locality in order to facilitate prefetching in the same way. Experimental results indicate that the lazy method improves fingerprint identification performance by over 50% compared with an "eager" method using the same data layout.
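The contrast between eager and lazy lookups described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `OnDiskIndex`, the functions `eager_dedup` and `lazy_dedup`, and the `batch_size` parameter are hypothetical names chosen for illustration. The sketch assumes that one batched lookup costs a single simulated disk access, and it does not model the paper's locality-preserving buffering or prefetching.

```python
from collections import OrderedDict


class OnDiskIndex:
    """Toy stand-in for an on-disk fingerprint index (hypothetical).

    Counts how many lookup calls are made; each call is treated as one
    simulated disk access, which is a simplifying assumption.
    """

    def __init__(self):
        self._known = set()
        self.disk_accesses = 0

    def lookup_batch(self, fingerprints):
        # One simulated disk access serves the whole batch.
        self.disk_accesses += 1
        return {fp for fp in fingerprints if fp in self._known}

    def insert_batch(self, fingerprints):
        self._known.update(fingerprints)


def eager_dedup(stream, index):
    """Look up each fingerprint as soon as it arrives."""
    unique = []
    for fp in stream:
        if not index.lookup_batch([fp]):
            index.insert_batch([fp])
            unique.append(fp)
    return unique


def lazy_dedup(stream, index, batch_size=4):
    """Buffer fingerprints and look them up in batches."""
    unique = []
    # OrderedDict keeps arrival order and collapses repeats in the buffer.
    buffer = OrderedDict()

    def flush():
        if not buffer:
            return
        found = index.lookup_batch(buffer.keys())
        new = [fp for fp in buffer if fp not in found]
        index.insert_batch(new)
        unique.extend(new)
        buffer.clear()

    for fp in stream:
        buffer[fp] = None
        if len(buffer) >= batch_size:
            flush()
    flush()  # handle the final, possibly partial, batch
    return unique


stream = ["a", "b", "a", "c", "b", "d", "e", "a"]

eager_index = OnDiskIndex()
eager_unique = eager_dedup(stream, eager_index)

lazy_index = OnDiskIndex()
lazy_unique = lazy_dedup(stream, lazy_index, batch_size=4)

print(eager_unique, eager_index.disk_accesses)  # ['a', 'b', 'c', 'd', 'e'] 8
print(lazy_unique, lazy_index.disk_accesses)    # ['a', 'b', 'c', 'd', 'e'] 2
```

Both functions identify the same unique fingerprints, but in this toy model the lazy variant issues two simulated disk accesses instead of eight. The access counts reflect only the one-access-per-batch assumption and say nothing about the performance figures reported in the paper.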
Original language | English |
---|---|
Title of host publication | 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST 2016) |
Subtitle of host publication | Santa Clara, California, USA, 2-6 May 2016 |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 93-102 |
Number of pages | 10 |
ISBN (Electronic) | 9781467390552 |
ISBN (Print) | 9781467390569 |
Publication status | Published - 11 Apr 2017 |
Externally published | Yes |
Event | 32nd Symposium on Mass Storage Systems and Technologies, MSST 2016 - Santa Clara, United States of America. Duration: 2 May 2016 → 6 May 2016 |
Conference
Conference | 32nd Symposium on Mass Storage Systems and Technologies, MSST 2016 |
---|---|
Country/Territory | United States of America |
City | Santa Clara |
Period | 2/05/16 → 6/05/16 |