ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue System Development

Communications in Computer and Information ScienceNeural Information Processing(2021)

引用 0|浏览7
暂无评分
摘要
Existing medical text datasets usually take the form of ques- tion and answer pairs that support the task of natural language gener- ation, but lacking the composite annotations of the medical terms. In this study, we publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations for the Intent Classification and Named Entity Recognition tasks. The tag sets for two tasks are in medical domain and can facilitate the development of task- oriented healthcare chatbots with better comprehension of queries from patients. We train baseline models for the two tasks and propose a simple self-supervised training strategy with span-noise modelling that substan- tially improves the performance. Dataset and code will be published at https://github.com/tadeephuy/ViMQ
更多
查看译文
关键词
NER, Intent classification, Medical question dataset, Self-supervised, Learning with noise
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要