Metadata Considered Harmful ... to Deduplication

HotStorage'15: Proceedings of the 7th USENIX Conference on Hot Topics in Storage and File Systems (2015)

Abstract
Deduplication is widely used to improve space efficiency in storage systems. While much attention has been paid to making the process of deduplication fast and scalable, the effectiveness of deduplication can vary dramatically depending on the data stored. We show that many file formats suffer from a fundamental design property that is incompatible with deduplication: they intersperse metadata with data in ways that make otherwise identical data appear different. We examine three models for improving deduplication in the presence of embedded metadata: deduplication-friendly data formats, application-level post-processing, and format-aware deduplication. Working with real-world file formats and datasets, we find that separating metadata from data improves deduplication ratios significantly, in some cases by as much as 5.6×.
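The effect described in the abstract can be illustrated with a toy experiment. The sketch below is not the authors' method or any real file format: it assumes fixed-size chunking with a hypothetical 4096-byte chunk, a made-up 16-byte metadata header, and two made-up layouts (`interleaved` and `separated`). It stores two copies of the same data that differ only in their metadata and compares the resulting deduplication ratios.

```python
import hashlib

BLOCK = 4096  # hypothetical fixed chunk size used by the toy deduplicator


def chunks(blob: bytes, size: int = BLOCK):
    """Split a byte string into fixed-size chunks (the last one may be short)."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def dedup_ratio(blobs) -> float:
    """Logical chunk count divided by unique chunk count across all blobs."""
    total = 0
    unique = set()
    for blob in blobs:
        for chunk in chunks(blob):
            total += 1
            unique.add(hashlib.sha256(chunk).digest())
    return total / len(unique)


def interleaved(records, stamp: bytes) -> bytes:
    """Toy layout: a 16-byte metadata header precedes every data record."""
    header = stamp.ljust(16, b"\0")
    return b"".join(header + record for record in records)


def separated(records, stamp: bytes) -> bytes:
    """Toy layout: all data first (chunk-aligned), all metadata at the end."""
    header = stamp.ljust(16, b"\0")
    return b"".join(records) + header * len(records)


# Eight distinct, deterministic records, each exactly one chunk long.
records = [hashlib.sha256(bytes([i])).digest() * (BLOCK // 32) for i in range(8)]

# Two copies of the same data that differ only in the embedded metadata.
ratio_interleaved = dedup_ratio(
    [interleaved(records, b"mtime=1"), interleaved(records, b"mtime=2")]
)
ratio_separated = dedup_ratio(
    [separated(records, b"mtime=1"), separated(records, b"mtime=2")]
)

print(f"interleaved: {ratio_interleaved:.2f}")
print(f"separated:   {ratio_separated:.2f}")
```

In this toy setup the interleaved layout places a differing header inside nearly every chunk, so the two copies share almost nothing (a ratio of about 1.06), while the separated layout lets all eight data chunks deduplicate (a ratio of 1.8). These numbers come from the toy parameters only and say nothing about the paper's measured 5.6× improvement.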