Accurate Product Attribute Extraction on the Field

Martin Rezk, Laura Alonso Alemany, Lasguido Nio, Ted Zhang

2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019

Abstract
In this paper we present a bootstrapping approach for attribute value extraction that minimizes the need for human intervention. Our approach automatically extracts attribute names and values from semi-structured text, generates a small labeled dataset, and bootstraps it by extracting new values from unstructured text. It is domain- and language-independent, relying only on existing semi-structured text to create the initial labeled dataset. We assess the impact of different machine learning approaches on increasing the precision of the core approach without compromising coverage. We perform an extensive evaluation using e-commerce product data across different categories, in two languages, and over hundreds of thousands of product pages. We show that our approach provides high precision and good coverage. In addition, we study the impact of different methods that address specific sources of error. Through error analysis we highlight how these methods complement each other, obtaining insights about the individual methods and the ensemble as a whole.
Keywords
Semantics, Data mining, Business, Cleaning, Data models, Training, Taxonomy
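
Illustrative sketch

The abstract describes seeding a labeled dataset from semi-structured text and then using it on unstructured text. The Python sketch below illustrates only that seeding and labeling idea, and it is not the authors' implementation. The function names (build_seed_dictionary, label_text), the toy product data, and the use of exact dictionary matching are all assumptions made for illustration; the learning step that would let a model extract values it has never seen is omitted.

```python
import re
from collections import defaultdict


def build_seed_dictionary(spec_tables):
    """Collect attribute name -> set of known values from semi-structured tables.

    Each table is assumed to be a dict such as {"Color": "Red"}.
    """
    seeds = defaultdict(set)
    for table in spec_tables:
        for name, value in table.items():
            seeds[name.strip().lower()].add(value.strip().lower())
    return seeds


def label_text(text, seeds):
    """Tag unstructured text with (attribute, value, start, end) spans.

    Uses exact, case-insensitive matching of whole tokens. This is a
    stand-in for the labeling step, chosen for simplicity.
    """
    spans = []
    lowered = text.lower()
    for name, values in seeds.items():
        for value in values:
            # Lookarounds prevent matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
            for match in re.finditer(pattern, lowered):
                spans.append((name, value, match.start(), match.end()))
    # Order spans by position in the text.
    return sorted(spans, key=lambda span: span[2])


# Hypothetical toy data: two specification tables and one free-text description.
spec_tables = [
    {"Color": "Red", "Material": "Cotton"},
    {"Color": "Navy Blue"},
]
description = "A soft cotton shirt in navy blue."

seeds = build_seed_dictionary(spec_tables)
labeled = label_text(description, seeds)
```

Spans produced this way could serve as training examples for a sequence tagger, which is where new, previously unseen values would come from. Which tagger to use, and how its output is filtered, is left open here.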