Web-scale Knowledge Collection.

Colin Lockard,Prashant Shiralkar,Xin Luna Dong,Hannaneh Hajishirzi

WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining Houston TX USA February, 2020（2020）

引用 4|浏览140

暂无评分

摘要

How do we surface the large amount of information present in HTML documents on the Web, from news articles to scientific papers to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for Information Extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision. We cover the key ideas and intuition behind existing approaches to emphasize their applicability and potential in various settings.

查看译文

关键词

Information extraction, Web extraction, Semi-structured data, Web mining

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要