Lazy exact deduplication

Jingwei Ma, Rebecca J. Stones, Yuxiang Ma, Jingui Wang, Junjie Ren, Gang Wang, Xiaoguang Liu

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Other

9 Citations (Scopus)

Abstract

During data deduplication, on-disk fingerprint lookups generate high disk traffic, which becomes a bottleneck. In this paper, we propose a "lazy" data deduplication method that buffers incoming fingerprints and performs on-disk lookups in batches, aiming to reduce this disk bottleneck. In deduplication generally, prefetching is used to improve the cache hit rate by exploiting locality within the incoming fingerprint stream. For lazy deduplication, we design a buffering strategy that preserves locality so that prefetching remains effective. Experimental results indicate that the lazy method improves fingerprint identification performance by over 50% compared with an "eager" method with the same data layout.
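The batching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `SimulatedDiskIndex`, `LazyDeduplicator`, and `deduplicate` are hypothetical, the "disk" is an in-memory set that merely counts lookup round trips, and the locality-preserving buffering and prefetching from the paper are not modelled. It only shows how buffering fingerprints and resolving them in one batched lookup can reduce the number of index accesses while still identifying every duplicate.

```python
from typing import Iterable, List, Set, Tuple


class SimulatedDiskIndex:
    """Hypothetical stand-in for an on-disk fingerprint index."""

    def __init__(self) -> None:
        self._stored: Set[str] = set()
        self.disk_accesses = 0  # simulated lookup round trips (writes not counted)

    def lookup_batch(self, fingerprints: Set[str]) -> Set[str]:
        """Return the fingerprints already stored, using one simulated access."""
        self.disk_accesses += 1
        return fingerprints & self._stored

    def insert_batch(self, fingerprints: Set[str]) -> None:
        """Record newly seen fingerprints in the index."""
        self._stored |= fingerprints


class LazyDeduplicator:
    """Buffers fingerprints and resolves them in batches (assumed design)."""

    def __init__(self, index: SimulatedDiskIndex, batch_size: int) -> None:
        self.index = index
        self.batch_size = batch_size
        self.buffer: List[str] = []
        self.results: List[Tuple[str, bool]] = []  # (fingerprint, is_duplicate)

    def add(self, fingerprint: str) -> None:
        """Buffer a fingerprint; flush once the buffer is full."""
        self.buffer.append(fingerprint)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Resolve all buffered fingerprints with a single batched lookup."""
        if not self.buffer:
            return
        known = self.index.lookup_batch(set(self.buffer))
        new_in_batch: Set[str] = set()
        for fp in self.buffer:
            # A fingerprint is a duplicate if the index already holds it
            # or if it appeared earlier in this same batch.
            is_duplicate = fp in known or fp in new_in_batch
            self.results.append((fp, is_duplicate))
            if not is_duplicate:
                new_in_batch.add(fp)
        self.index.insert_batch(new_in_batch)
        self.buffer.clear()


def deduplicate(stream: Iterable[str], batch_size: int) -> Tuple[List[Tuple[str, bool]], int]:
    """Run a fingerprint stream through the lazy deduplicator."""
    index = SimulatedDiskIndex()
    dedup = LazyDeduplicator(index, batch_size)
    for fp in stream:
        dedup.add(fp)
    dedup.flush()  # resolve any fingerprints still buffered
    return dedup.results, index.disk_accesses
```

In this sketch, setting `batch_size` to 1 behaves like an "eager" scheme with one lookup per fingerprint, while larger batches amortise the lookup cost over many fingerprints. The duplicate decisions are the same in both cases.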

Original language: English
Title of host publication: 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST 2016)
Subtitle of host publication: Santa Clara, California, USA, 2-6 May 2016
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 93-102
Number of pages: 10
ISBN (Electronic): 9781467390552
ISBN (Print): 9781467390569
Publication status: Published - 11 Apr 2017
Externally published: Yes
Event: 32nd Symposium on Mass Storage Systems and Technologies, MSST 2016 - Santa Clara, United States of America
Duration: 2 May 2016 to 6 May 2016

Conference

Conference: 32nd Symposium on Mass Storage Systems and Technologies, MSST 2016
Country/Territory: United States of America
City: Santa Clara
Period: 2/05/16 to 6/05/16
