Efficiently Testing Simon's Congruence

38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

Abstract
Simon's congruence $\sim_k$ is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection with piecewise testable languages, but it has also found many applications in, e.g., learning theory, database theory, and linguistics. The $\sim_k$-relation is defined as follows: two words are $\sim_k$-congruent if they have the same set of subsequences of length at most $k$. A long-standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm that computes, given two words $s$ and $t$, the largest $k$ for which $s \sim_k t$. We propose the first algorithm solving this problem in linear time $O(|s| + |t|)$ when the input words are over the integer alphabet $\{1, \ldots, |s| + |t|\}$ (or other alphabets that can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well.

To achieve these results, we introduce a novel data structure, called Simon-Tree, which allows us to construct a natural representation of the equivalence classes induced by $\sim_k$ on the set of suffixes of a word, for all $k \ge 1$. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words $s$ and $t$, we compute their respective Simon-Trees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time $O(|s| + |t|)$, allows us to retrieve the largest $k$ for which $s \sim_k t$.
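The definition of $\sim_k$ can be checked directly on small inputs. The following Python sketch is a brute-force reference only, not the paper's linear-time Simon-Tree algorithm: it collects all distinct subsequences of length at most $k$ (exponentially many in general) and raises $k$ until the two sets differ. The helper names `subsequences_up_to`, `congruent`, and `largest_k` are hypothetical, chosen for illustration.

```python
from typing import Optional, Set


def subsequences_up_to(word: str, k: int) -> Set[str]:
    """All distinct subsequences of `word` with length at most k.

    Brute force: the result can be exponential in len(word), so this is
    only meant for checking small examples by hand.
    """
    subs = {""}  # the empty word is a subsequence of every word
    for ch in word:
        # Subsequences of w+ch are those of w, plus those of w extended by ch;
        # the comprehension is built before the union, so `subs` is not
        # mutated while it is being iterated.
        subs |= {x + ch for x in subs if len(x) < k}
    return subs


def congruent(s: str, t: str, k: int) -> bool:
    """True iff s ~_k t, i.e. both words share all subsequences of length <= k."""
    return subsequences_up_to(s, k) == subsequences_up_to(t, k)


def largest_k(s: str, t: str) -> Optional[int]:
    """Largest k with s ~_k t, or None if s == t (congruent for every k).

    s ~_0 t always holds, and s ~_k t implies s ~_(k-1) t, so we can
    increase k until the sets first differ. If s != t, they must differ at
    k = max(len(s), len(t)), so the loop terminates.
    """
    if s == t:
        return None
    k = 0
    while congruent(s, t, k + 1):
        k += 1
    return k


if __name__ == "__main__":
    # "ab" and "ba" share {"", "a", "b"} but differ at length 2 ("ab" vs "ba").
    print(largest_k("ab", "ba"))
    # "abab" and "baba" share all length-2 subsequences, but only "abab" has "abb".
    print(largest_k("abab", "baba"))
```

Because the subsequence sets grow quickly with the word length and with $k$, this kind of check is practical only for short words; it is useful mainly as a test oracle against which a faster implementation could be compared.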
Keywords
Simon's congruence, Subsequence, Scattered factor, Efficient algorithms