Cloud Computing Technology Research Center

Prof. Song Jiang (江松) - RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems

Date: 2019-06-20 Type: Academic Exchange

RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems

Abstract: I/O deduplication is a key technique for improving the space and I/O efficiency of storage systems.

Among the various deduplication techniques, content-defined chunking (CDC) based deduplication is the most desirable because of its high deduplication ratio. However, its chunking operation is slow and may become a performance bottleneck. Currently, a choice has to be made between a high deduplication ratio and high speed.

In this paper, we leverage locality in the duplicate chunks to remove almost all of the chunking cost for deduplicatable chunks in CDC-based deduplication systems. The proposed deduplication method, named RapidCDC, has two salient features. One is that its efficiency is positively correlated with the deduplication ratio. The other is that its high efficiency does not depend heavily on the existence of the locality. Our experimental results with synthetic and real-world datasets show that RapidCDC achieves a chunking speedup of up to 33× over regular CDC while maintaining the same deduplication ratio.