An Efficient Technique for Mining Approximately Frequent Substring Patterns

ICDMW '07: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops (2007)

Abstract
Sequential patterns are used to discover knowledge in a wide range of applications. However, in many scenarios pattern quality can be low, due to short lengths or low supports. Furthermore, for dense datasets such as proteins, most of the sequential pattern mining algorithms return a tremendously large number of patterns, which are difficult to process and analyze. However, by relaxing the definition of frequency and allowing some mismatches, it is possible to discover higher quality patterns. We call these patterns Frequent Approximate Substrings or FAS-patterns and we introduce an algorithm called FAS-Miner, to handle the mining task efficiently. The experiments on real-world protein and DNA datasets show that FAS-Miner can discover patterns of much longer lengths and higher supports than standard sequential mining approaches.
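
To make the idea of a relaxed frequency definition concrete, here is a minimal brute-force sketch of approximate support counting. It is not FAS-Miner, whose details the abstract does not give. It assumes that a "mismatch" means a substitution (Hamming distance) and that support is the number of sequences containing at least one matching window. The names `hamming`, `approx_support`, and `max_mismatches` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_support(pattern: str, sequences: list[str], max_mismatches: int) -> int:
    """Count sequences with at least one window within max_mismatches of pattern.

    Assumption: mismatches are substitutions only (Hamming distance).
    This is a naive scan over every window, not an efficient miner.
    """
    m = len(pattern)
    count = 0
    for seq in sequences:
        # Slide a window of the pattern's length across the sequence.
        if any(
            hamming(pattern, seq[i:i + m]) <= max_mismatches
            for i in range(len(seq) - m + 1)
        ):
            count += 1
    return count


# Toy DNA-like data, invented purely for illustration.
sequences = ["ACGTACGT", "ACGAACGT", "TTTTTTTT"]
exact = approx_support("ACGTA", sequences, 0)    # exact matching only
relaxed = approx_support("ACGTA", sequences, 1)  # allow one substitution
```

On this toy input the pattern "ACGTA" occurs exactly in one sequence, while allowing one mismatch also admits the window "ACGAA" in the second sequence, so the same pattern reaches a higher support once the frequency definition is relaxed.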
Keywords
sequential pattern, dense datasets, low support, efficient technique, higher support, standard sequential mining approach, pattern quality, mining task, frequent substring patterns, higher quality pattern, DNA datasets, sequential pattern mining algorithm, DNA, knowledge discovery, sequential pattern mining, data mining