Minimally-Supervised Extraction of Entities from Text Advertisements.
HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (2010)
Abstract
Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled data, which is expensive, time-consuming, and difficult to procure for ad creatives. A small set of manually derived constraints on feature expectations over unlabeled data can be used to partially and probabilistically label large amounts of data. Utilizing recent work in constraint-based semi-supervised learning, this paper injects lightweight supervision, specified as these "constraints", into a semi-Markov conditional random field model of entity extraction in ad creatives. Relying solely on the constraints, the model is trained on a set of unlabeled ads using an online learning algorithm. We demonstrate significant accuracy improvements on a manually labeled test set compared to a baseline dictionary approach. We also achieve accuracy that approaches that of a fully supervised classifier.
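As a rough illustration of how a handful of constraints can partially and probabilistically label unlabeled ads, the toy Python sketch below attaches a target label distribution to each token on which a constraint feature fires and leaves all other tokens unlabeled. It is not the paper's semi-Markov CRF or its training procedure; the label set, dictionaries, constraint values, and function names are all hypothetical and chosen only for the example.

```python
# Toy sketch: constraints on feature expectations used as soft labels.
# All labels, dictionaries, and probabilities here are hypothetical.

LABELS = ["BRAND", "PRODUCT", "OTHER"]

# Each constraint maps a feature to the label distribution we expect
# to hold, on average, over tokens where that feature fires.
CONSTRAINTS = {
    "in_brand_dict": {"BRAND": 0.8, "PRODUCT": 0.1, "OTHER": 0.1},
    "in_product_dict": {"BRAND": 0.1, "PRODUCT": 0.8, "OTHER": 0.1},
}

BRAND_DICT = {"acme"}
PRODUCT_DICT = {"shoes", "laptop"}


def features(token):
    """Return the names of the constraint features that fire on a token."""
    t = token.lower()
    fired = []
    if t in BRAND_DICT:
        fired.append("in_brand_dict")
    if t in PRODUCT_DICT:
        fired.append("in_product_dict")
    return fired


def soft_label(token):
    """Average the target distributions of all fired constraints.

    Returns None when no constraint applies, so the token stays unlabeled.
    """
    fired = [CONSTRAINTS[name] for name in features(token)]
    if not fired:
        return None
    return {lab: sum(d[lab] for d in fired) / len(fired) for lab in LABELS}


def label_ad(text):
    """Partially label a whitespace-tokenized ad with soft label targets."""
    return [(tok, soft_label(tok)) for tok in text.split()]


if __name__ == "__main__":
    for tok, dist in label_ad("Acme running shoes on sale"):
        print(tok, dist)
```

In this sketch only "Acme" and "shoes" receive a distribution; the remaining tokens carry no supervision, which is the sense in which the labeling is partial. A real system would use such targets as expectation constraints during training rather than as per-token labels.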
Keywords
ad creatives, small set, unlabeled ad, unlabeled data, constraint-based semi-supervised learning, entity extraction, semi-supervised solution, significant accuracy improvement, baseline dictionary approach, computational advertising task, minimally-supervised extraction, text advertisement