Subtopic Segmentation of Scientific Texts: Parameter Optimisation

Communications in Computer and Information Science(2015)

引用 5|浏览1
暂无评分
摘要
Information research within a scientific text needs to deal with the problem of automatic document partition on subtopics by taking text specifics and user purposes into account. This task is important for primary source selection, for working with texts in foreign languages or for getting acquainted with research problems. This paper is focused on the application of subtopic segmentation algorithms to real-life scientific texts. For studying this we use monographs on the same subject written in three languages. The corpus includes several original and professionally trasnlated fragments. The research is based on the TextTiling algorithm that analyses how tightly adjoining parts of the text cohere. We examine how some parameters (the cutoff rate, the size of moving window and of the shift from one block to the next one) influence the segmentation quality and define the optimal combinations of these parameters for several languages. The studies on Russian suggest that external lexical resources notably improve the segmentation quality.
更多
查看译文
关键词
Text tiling,Classification,Parsing,Segmentation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要