Real-Text Dictionary for Topic-Specific Web Searching.

Lecture Notes in Business Information Processing (2013)

Abstract
We present a new type of dictionary intended as a search aid for topic-specific Web searching. The construction method is general and can be applied to any reasonable topic; the first implementation deals with climate change. The dictionary contains real-text phrases (e.g. rising sea levels) in addition to the standard dictionary forms (sea-level rise). The phrases were extracted automatically from pages dealing with climate change, and are thus known to appear in pages discussing climate change issues when used as search terms. Variant forms of the same phrase, such as sea-level rise, sea level rising, and rising sea level, are grouped into the same synonym set using approximate string matching. Each phrase is assigned a frequency-based importance score (IS), which reflects the significance of the phrase in the context of climate change research. We investigate how effectively the IS indicates the best phrase among synonymous phrases, and effective phrases in general, from the viewpoint of search results. The assumptions are that the best phrases have higher ISs than the other phrases of a synonym set, and that the higher the IS, the better the search results. The experimental results confirmed these assumptions. This paper also describes the crawler used to fetch the source data for the climate change dictionary and discusses the benefits of using the dictionary in Web searching.
Keywords
Dictionaries, Focused crawling, Query performance prediction, Searching, Vertical search engines, Web search engines
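
The synonym grouping and importance scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the matching algorithm or the IS formula, so the suffix stripping in `normalize`, the 0.85 similarity threshold, the use of `difflib.SequenceMatcher`, the relative-frequency definition of the IS, and the phrase counts are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(phrase):
    """Lowercase, split hyphens, crudely strip suffixes, and sort the tokens.

    Assumption: a toy stand-in for real stemming, so that word-order and
    inflection variants map to similar (often identical) keys.
    """
    tokens = phrase.lower().replace("-", " ").split()
    stems = []
    for tok in tokens:
        for suffix in ("ing", "es", "s", "e"):
            # Keep at least three characters so short words stay intact.
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return " ".join(sorted(stems))


def group_synonyms(phrases, threshold=0.85):
    """Greedily group phrases whose normalized keys are approximately equal.

    Assumption: the threshold value and the greedy strategy are hypothetical.
    """
    groups = []  # list of (representative key, member phrases)
    for phrase in phrases:
        key = normalize(phrase)
        for rep_key, members in groups:
            if SequenceMatcher(None, key, rep_key).ratio() >= threshold:
                members.append(phrase)
                break
        else:
            groups.append((key, [phrase]))
    return [members for _, members in groups]


def importance_scores(counts):
    """Assumed IS: relative frequency of each phrase in the collected pages."""
    total = sum(counts.values())
    return {phrase: count / total for phrase, count in counts.items()}


# Hypothetical phrase counts; these are not data from the paper.
counts = {
    "sea-level rise": 40,
    "sea level rising": 5,
    "rising sea level": 10,
    "rising sea levels": 25,
    "greenhouse gas emissions": 20,
}

scores = importance_scores(counts)
groups = group_synonyms(counts)
# Under the paper's assumption, the best phrase of a set has the highest IS.
best_phrases = [max(group, key=scores.get) for group in groups]

for group, best in zip(groups, best_phrases):
    print(best, "<-", group)
```

In this sketch the phrase with the highest IS in each synonym set is the one suggested as the search term, which mirrors the assumption tested in the paper that higher-IS phrases yield better search results.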