NCU-IISR/AS-GIS: Detecting Medication Names in Imbalanced Twitter Data with Pretrained Extractive QA Model and Data-Centric Approach

Semantic Scholar (2021)

Abstract
In this paper, we introduce our system for BioCreative VII Track 3, automatic extraction of medication names in tweets. Automatically extracting medication names from imbalanced data is challenging for deep learning models. In addition, tweets are very short, so the limited context makes medication names hard to recognize. Our system combines classification and extractive question answering to address these problems. We further enhance it with domain-specific and task-specific pre-trained language models and with data-centric approaches. By combining dictionary filtering with an ensemble method, our system achieved a Strict F1 score of 0.804, far above the average of 0.696 across the 16 participating teams. Without the dictionary or the ensemble method, the single model we submitted achieved an Overlapping F1 of 0.797, outperforming the baseline system's 0.773.

Keywords: social media; medication detection; imbalanced data; text classification; data-centric; extractive question answering
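
The abstract reports two span-level scores, Strict F1 and Overlapping F1. The sketch below illustrates how such scores are commonly computed for one tweet. It assumes that a strict match requires identical character offsets and that an overlapping match requires at least one shared character; the official track scorer may differ in detail. The function names and the example spans are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A span is (start, end) in character offsets, end exclusive (assumed convention).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], strict: bool) -> float:
    """F1 over spans; strict=True demands exact offsets, otherwise any overlap counts."""
    if strict:
        def match(g: Span, p: Span) -> bool:
            return g == p
    else:
        match = overlaps

    # Predictions that hit some gold span drive precision;
    # gold spans hit by some prediction drive recall.
    hit_pred = sum(1 for p in pred if any(match(g, p) for g in gold))
    hit_gold = sum(1 for g in gold if any(match(g, p) for p in pred))

    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two gold mentions, three predictions.
gold_spans = [(0, 7), (20, 29)]
pred_spans = [(0, 7), (22, 29), (40, 45)]  # exact hit, partial hit, false positive

strict_score = span_f1(gold_spans, pred_spans, strict=True)
overlap_score = span_f1(gold_spans, pred_spans, strict=False)
```

Under these assumptions, a prediction that clips part of a medication name counts only toward the overlapping score, which is why the two metrics can diverge for the same system output.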