Height-bounded Lempel-Ziv encodings
arXiv (2024)
Abstract
We introduce height-bounded LZ encodings (LZHB), a new family of compressed
representations that is a variant of Lempel-Ziv parsings with a focus on
allowing fast access to arbitrary positions of the text directly via the
compressed representation. Any LZHB encoding whose referencing height is
bounded by h allows access to an arbitrary position of the underlying text
using O(h) predecessor queries. We show that there exists a constant c such
that the size $\hat{z}_{\mathit{HB}(c \log n)}$ of the optimal (smallest) LZHB
encoding whose height is bounded by $c \log n$ for any string of length n is
$O(\hat{g}_{\mathrm{rl}})$, where $\hat{g}_{\mathrm{rl}}$ is the size of the
smallest run-length grammar. Furthermore, we show that there exists a family of
strings such that $\hat{z}_{\mathit{HB}(c \log n)} = o(\hat{g}_{\mathrm{rl}})$,
thus making $\hat{z}_{\mathit{HB}(c \log n)}$ one of the smallest known
repetitiveness measures for which $O(\mathrm{polylog}\, n)$ time access is
possible using $O(\hat{z}_{\mathit{HB}(c \log n)})$ space. While computing the
optimal LZHB representation for any given height seems difficult, we propose
linear-time and near-linear-time greedy algorithms, which we show
experimentally can efficiently find small LZHB representations in practice.
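
To illustrate why a bounded referencing height gives access in O(h) predecessor queries, here is a minimal sketch in Python. It is not the paper's data structure. The phrase format (`("lit", ch)` and `("copy", src, length)`), the helper names, and the use of binary search over phrase start positions as the predecessor query are all assumptions made for illustration. Each hop replaces position i by the earlier position it was copied from, so the number of hops for any position is at most the height of the encoding.

```python
from bisect import bisect_right

# Hypothetical phrase format (an assumption, not taken from the paper):
#   ("lit", ch)            one literal character
#   ("copy", src, length)  copy `length` characters starting at text
#                          position `src`, where src < start of this phrase
#                          (the copy may overlap the phrase itself)

def phrase_starts(phrases):
    """Return the start position of every phrase and the text length."""
    starts, pos = [], 0
    for p in phrases:
        starts.append(pos)
        pos += 1 if p[0] == "lit" else p[2]
    return starts, pos

def access(phrases, starts, i):
    """Return (character at position i, number of reference hops followed)."""
    hops = 0
    while True:
        # Predecessor query: the last phrase whose start is <= i.
        k = bisect_right(starts, i) - 1
        p = phrases[k]
        if p[0] == "lit":
            return p[1], hops
        # Jump to the source position; it is strictly smaller than i.
        i = p[1] + (i - starts[k])
        hops += 1

def decode(phrases):
    """Recover the whole text by accessing every position."""
    starts, n = phrase_starts(phrases)
    return "".join(access(phrases, starts, i)[0] for i in range(n))
```

For example, with `phrases = [("lit", "a"), ("lit", "b"), ("copy", 0, 4), ("copy", 2, 3)]`, `decode(phrases)` yields `"ababababa"`, and accessing position 8 follows three hops (8 to 4 to 2 to 0) before reaching a literal. A height-bounded encoding would cap this hop count at h for every position.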