The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping

Frontiers in Neuroinformatics (2016)

Abstract
This work focuses on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model.

The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements found in associated dataset documentation, such as the data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements, and machine-learning classifiers to identify element matches. It further provides an active-learning capability through which the process of training the GEM system is optimized.

Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized.

We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets than other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
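The abstract does not specify which text mining technique, classifier, or active-learning strategy GEM uses, so the following is only an illustrative sketch of the general idea, not GEM's implementation. It assumes TF-IDF cosine similarity over data dictionary descriptions as a stand-in for the unsupervised similarity step, omits the classifier stage, and uses a smallest-margin rule as one common way to pick which element to ask a user about. All element names and descriptions (`PTGENDER`, `MMSCORE`, `AGE_BL`, `sex`, `mmse`, `age`) and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def suggest_mappings(source, target, top_k=2):
    """Rank target elements for each source element by description similarity.

    source, target: dicts mapping element name -> free-text description.
    Returns a dict: source name -> list of (target name, score), best first.
    """
    s_names, t_names = list(source), list(target)
    vecs = tfidf_vectors([source[n] for n in s_names] + [target[n] for n in t_names])
    s_vecs, t_vecs = vecs[: len(s_names)], vecs[len(s_names):]
    suggestions = {}
    for name, vec in zip(s_names, s_vecs):
        ranked = sorted(
            ((cosine(vec, tv), tn) for tn, tv in zip(t_names, t_vecs)), reverse=True
        )
        suggestions[name] = [(tn, score) for score, tn in ranked[:top_k]]
    return suggestions


def most_uncertain(suggestions, n=1):
    """Assumed query rule: pick elements whose top two scores are closest."""
    def margin(item):
        scores = [score for _, score in item[1]]
        return scores[0] - scores[1] if len(scores) > 1 else scores[0]
    return [name for name, _ in sorted(suggestions.items(), key=margin)[:n]]


# Hypothetical data dictionary entries (not taken from any real dataset).
source = {
    "PTGENDER": "gender of the participant",
    "MMSCORE": "mini mental state examination total score",
    "AGE_BL": "age of participant at baseline visit",
}
target = {
    "sex": "participant gender or sex",
    "mmse": "mini mental state examination score",
    "age": "participant age in years at baseline",
}

suggestions = suggest_mappings(source, target)
for name, ranked in suggestions.items():
    print(name, "->", ranked[0][0])
print("ask the user about:", most_uncertain(suggestions))
```

In a fuller pipeline of the kind the abstract describes, similarity scores like these would more plausibly serve as features for a trained classifier, with user-confirmed mappings fed back as training labels.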
Keywords
machine learning, active learning, data mapping, data harmonization, common data model