Maximum Margin Active Learning for Sequence Labeling with Different Length

ADVANCES IN DATA MINING, PROCEEDINGS: MEDICAL APPLICATIONS, E-COMMERCE, MARKETING, AND THEORETICAL ASPECTS (2008)

Abstract
The sequence labeling problem is commonly encountered in many natural language and query processing tasks. SVMstruct is a supervised learning algorithm that provides a flexible and effective way to solve this problem. However, training SVMstruct often requires a large number of labeled examples, which can be costly for applications that generate long and complex sequence data. This paper proposes an active learning technique that selects the most informative subset of unlabeled sequences for annotation by choosing the sequences with the largest prediction uncertainty. A distinctive aspect of active learning for sequence labeling is that it should account for the effort spent labeling each sequence, which depends on the sequence's length. The proposed technique uses dynamic programming to identify the best subset of sequences to annotate, taking into account both uncertainty and labeling effort. Experimental results show that our SVMstruct active learning technique can significantly reduce the number of sequences to be labeled while outperforming other existing techniques.
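The abstract frames batch selection as a trade-off between prediction uncertainty and labeling effort, solved with dynamic programming, but it does not state the exact objective. The sketch below is one plausible reading, not the paper's method: it assumes a 0/1-knapsack formulation in which each unlabeled sequence has an uncertainty score and an integer labeling cost equal to its length, and the goal is the subset with the largest total uncertainty whose total length fits an annotation budget. The names `margin_uncertainty`, `select_batch`, and `budget`, and the mapping from margin to uncertainty, are hypothetical illustrations.

```python
from typing import List, Sequence, Tuple


def margin_uncertainty(best_score: float, second_score: float) -> float:
    """Hypothetical uncertainty score (an assumption, not from the paper):
    the smaller the gap between the top two structured-prediction scores,
    the less confident the model, so the closer the result is to 1.0."""
    gap = best_score - second_score  # assumed >= 0 (best really is best)
    return 1.0 / (1.0 + gap)


def select_batch(
    uncertainty: Sequence[float], lengths: Sequence[int], budget: int
) -> Tuple[float, List[int]]:
    """0/1-knapsack DP (assumed formulation): maximize total uncertainty
    subject to the total length of chosen sequences being <= budget."""
    n = len(uncertainty)
    # dp[i][b] = best total uncertainty using the first i sequences
    # with a combined length of at most b tokens.
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        u, w = uncertainty[i - 1], lengths[i - 1]
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]  # option 1: skip sequence i-1
            if w <= b and dp[i - 1][b - w] + u > dp[i][b]:
                dp[i][b] = dp[i - 1][b - w] + u  # option 2: label it

    # Backtrack: a row change means sequence i-1 was taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()
    return dp[n][budget], chosen


if __name__ == "__main__":
    scores = [0.9, 0.5, 0.8, 0.3]  # illustrative uncertainties
    lens = [5, 2, 4, 1]            # illustrative lengths in tokens
    total, picked = select_batch(scores, lens, budget=7)
    print(picked)  # [1, 2, 3]: three shorter sequences beat one long one
```

The DP runs in O(n · budget) time, which is practical when costs are token counts and the budget is modest. A greedy rule that ranks sequences by uncertainty per token would be faster but is not guaranteed to find the best subset.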
Keywords
unlabeled sequence,existing technique,best subset,active learning,supervised learning algorithm,different length,sequence length,active learning technique,maximum margin,informative subset,new active learning technique,complex sequence data,natural language,uncertainty,supervised learning,natural language processing,support vector machine,sequence labeling