ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue System Development

Ta Duc Huy, Nguyen Anh Tu, Tran Hoang Vu, Nguyen Phuc Minh, Nguyen Phan, Trung H. Bui, Steven Q. H. Truong

Communications in Computer and Information Science, Neural Information Processing (2021)

Abstract

Existing medical text datasets usually take the form of question and answer pairs that support the task of natural language generation, but they lack composite annotations of medical terms. In this study, we publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations for the Intent Classification and Named Entity Recognition tasks. The tag sets for the two tasks are in the medical domain and can facilitate the development of task-oriented healthcare chatbots with better comprehension of queries from patients. We train baseline models for the two tasks and propose a simple self-supervised training strategy with span-noise modelling that substantially improves the performance. Dataset and code will be published at https://github.com/tadeephuy/ViMQ

Keywords

NER, Intent classification, Medical question dataset, Self-supervised, Learning with noise
