Chinese Word Segmentation Based on Word Boundary Classification

2022 7th International Conference on Signal and Image Processing (ICSIP), 2022

Abstract
Chinese word segmentation (CWS) is one of the basic tasks in natural language processing (NLP) and a key preprocessing step for many downstream NLP tasks, so its performance directly affects the final performance of those tasks. Traditional word segmentation approaches mainly use a conditional random field (CRF) decoder with BIO tags for sequence labeling, and recent mainstream approaches typically employ the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model as the text encoder. These choices greatly reduce segmentation speed. In contrast to existing approaches, this paper proposes a Chinese word segmentation approach based on word boundary classification, which uses A Lite BERT (ALBERT) model as the text encoder. The approach directly decides whether each position between two adjacent characters is a boundary between two potential words, without any CRF decoding. Experimental results show that the proposed approach achieves significant improvements while also significantly reducing the training and testing time of CWS.
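The boundary formulation described in the abstract can be illustrated with a minimal sketch. It assumes one binary label per gap between adjacent characters (1 for a word boundary, 0 otherwise); the function names `words_to_boundaries` and `boundaries_to_words` are hypothetical and not taken from the paper, and the classifier that would predict the labels (an ALBERT encoder in the paper) is deliberately left out.

```python
from typing import List


def words_to_boundaries(words: List[str]) -> List[int]:
    """Turn a gold segmentation into gap labels (assumed 0/1 encoding).

    A sentence of n characters has n - 1 gaps; label 1 marks a gap that
    separates two words, label 0 a gap inside a word.
    Assumes every word is non-empty.
    """
    labels: List[int] = []
    for i, word in enumerate(words):
        # Gaps inside the current word are not boundaries.
        labels.extend([0] * (len(word) - 1))
        # The gap after every word except the last one is a boundary.
        if i < len(words) - 1:
            labels.append(1)
    return labels


def boundaries_to_words(text: str, labels: List[int]) -> List[str]:
    """Cut `text` at every gap labelled 1; no CRF or other decoder is needed."""
    if len(labels) != max(len(text) - 1, 0):
        raise ValueError("expected exactly one label per gap between characters")
    words: List[str] = []
    start = 0
    for gap, is_boundary in enumerate(labels):
        if is_boundary:
            # Gap `gap` lies between characters `gap` and `gap + 1`.
            words.append(text[start:gap + 1])
            start = gap + 1
    if text:
        words.append(text[start:])
    return words


if __name__ == "__main__":
    gold = ["自然", "语言", "处理"]
    labels = words_to_boundaries(gold)
    print(labels)
    print(boundaries_to_words("".join(gold), labels))
```

Because each gap is labelled independently in this sketch, recovering the words is a single linear pass over the labels, which is one plausible reason a boundary classifier can avoid the cost of CRF decoding.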
Keywords
Natural language processing, Chinese word segmentation, Word boundary classification, ALBERT training model