Height-bounded Lempel-Ziv encodings

arXiv (2024)

Abstract
We introduce height-bounded LZ encodings (LZHB), a new family of compressed representations that is a variant of Lempel-Ziv parsing designed to allow fast access to arbitrary positions of the text directly via the compressed representation. Any LZHB encoding whose referencing height is bounded by $h$ allows access to an arbitrary position of the underlying text using $O(h)$ predecessor queries. We show that there exists a constant $c$ such that, for any string of length $n$, the size $\hat{z}_{\mathrm{HB}}(c \log n)$ of the optimal (smallest) LZHB encoding whose height is bounded by $c \log n$ is $O(\hat{g}_{\mathrm{rl}})$, where $\hat{g}_{\mathrm{rl}}$ is the size of the smallest run-length grammar. Furthermore, we show that there exists a family of strings for which $\hat{z}_{\mathrm{HB}}(c \log n) = o(\hat{g}_{\mathrm{rl}})$. This makes $\hat{z}_{\mathrm{HB}}(c \log n)$ one of the smallest known repetitiveness measures for which $O(\mathrm{polylog}\, n)$-time access is possible using $O(\hat{z}_{\mathrm{HB}}(c \log n))$ space. While computing the optimal LZHB representation for a given height seems difficult, we propose linear and near-linear time greedy algorithms, and we show experimentally that they efficiently find small LZHB representations in practice.
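The access bound can be illustrated with a small Python sketch. It assumes a simple LZ77-style phrase representation: each phrase is either a single literal character or a copy of an earlier text span, and a copy's source may overlap the phrase itself. Height is defined as 0 for a literal position and as 1 plus the height of the copied position otherwise. The names `Phrase`, `greedy_lzhb`, `position_heights`, and `access` are hypothetical. The greedy parser is a naive, roughly cubic-time stand-in that only respects the height bound. It is not the linear or near-linear time algorithm proposed in the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phrase:
    start: int            # first text position covered by this phrase
    length: int           # number of characters covered
    src: Optional[int]    # source start (< start) for copies, None for literals
    char: str = ""        # the character itself, for literal phrases only


def greedy_lzhb(text: str, h: int) -> List[Phrase]:
    """Naive greedy left-to-right parse whose referencing height never exceeds h.

    Hypothetical illustration, not the paper's algorithm. At each position it
    picks the longest earlier match whose copied positions all have height < h.
    """
    n = len(text)
    heights: List[int] = []   # heights[i] for every already-parsed position i
    phrases: List[Phrase] = []
    p = 0
    while p < n:
        best_len, best_src, best_tent = 0, None, []
        for j in range(p):
            tent: List[int] = []  # tentative heights of positions p, p+1, ...
            k = 0
            while p + k < n and text[j + k] == text[p + k]:
                pos = j + k
                # An overlapping source reads heights assigned earlier in this phrase.
                hp = heights[pos] if pos < p else tent[pos - p]
                if hp + 1 > h:
                    break
                tent.append(hp + 1)
                k += 1
            if k > best_len:
                best_len, best_src, best_tent = k, j, tent
        if best_len <= 1:
            # A length-1 copy costs one phrase like a literal but adds height.
            phrases.append(Phrase(p, 1, None, text[p]))
            heights.append(0)
            p += 1
        else:
            phrases.append(Phrase(p, best_len, best_src))
            heights.extend(best_tent)
            p += best_len
    return phrases


def position_heights(phrases: List[Phrase]) -> List[int]:
    """Recompute the referencing height of every position from the phrases alone."""
    hs: List[int] = []
    for ph in phrases:
        if ph.src is None:
            hs.append(0)
        else:
            for k in range(ph.length):
                hs.append(hs[ph.src + k] + 1)  # src + k < start + k, already known
    return hs


def access(phrases: List[Phrase], starts: List[int], i: int) -> str:
    """Return text[i]. Each loop iteration is one predecessor query on phrase starts."""
    while True:
        ph = phrases[bisect_right(starts, i) - 1]  # phrase containing position i
        if ph.src is None:
            return ph.char
        i = ph.src + (i - ph.start)  # jump to the copied position (strictly left of i)


if __name__ == "__main__":
    text = "abababababab"
    for h in (2, len(text)):
        phs = greedy_lzhb(text, h)
        starts = [ph.start for ph in phs]
        decoded = "".join(access(phs, starts, i) for i in range(len(text)))
        print(h, len(phs), max(position_heights(phs)), decoded == text)
```

A position of height $k$ is resolved after exactly $k + 1$ predecessor queries (`bisect_right` over the phrase starts). This matches the $O(h)$ bound for an encoding whose height is at most $h$. On highly repetitive input, a smaller height bound generally forces more phrases. That is the size/access trade-off the measure $\hat{z}_{\mathrm{HB}}$ captures.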