MITER: Medical Image-TExt joint adaptive pretRaining with multi-level contrastive learning

Expert Systems with Applications (2024)

Abstract
Recently, multimodal medical pretraining models have come to play a significant role in automatic medical image and text analysis, which has wide social and economic impact in healthcare. Despite being quickly transferable to downstream tasks, these models are greatly limited by the fact that they can only be pretrained on professional medical image-text datasets, which usually contain very few samples. In this work we propose MITER (Medical Image-TExt joint adaptive pretRaining), a joint adaptive pretraining framework based on multi-level contrastive learning that overcomes this limitation by pretraining image and text models for the medical domain while leveraging existing models pretrained on generic data, which contain an enormous number of samples. MITER features two types of objectives. The first type consists of uni-modal objectives that pretrain the models on medical images and text separately. The second type is a cross-modal objective that pretrains the models jointly, allowing them to influence each other on cross-modal tasks. We also introduce a strategy that dynamically selects hard negative samples during training for better performance. Experimental results on four medical tasks (image-report retrieval, multi-label image classification, visual question answering, and report generation) show that our MITER framework overcomes the limitation, greatly outperforming existing benchmark models on all tasks. The source code of our framework is available online.
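
As an illustration of the cross-modal objective with dynamic hard-negative emphasis that the abstract describes, below is a minimal PyTorch sketch assuming a standard symmetric InfoNCE (CLIP-style) image-text loss. The paper's exact multi-level formulation is not reproduced on this page; the function name and hyperparameters (temperature, hard_weight) are illustrative assumptions, not the authors' implementation.

    # A minimal sketch, assuming a symmetric InfoNCE loss; not the paper's
    # exact formulation. temperature and hard_weight are illustrative.
    import math

    import torch
    import torch.nn.functional as F


    def cross_modal_contrastive_loss(img_emb: torch.Tensor,
                                     txt_emb: torch.Tensor,
                                     temperature: float = 0.07,
                                     hard_weight: float = 2.0) -> torch.Tensor:
        """Symmetric image-text InfoNCE with a simple hard-negative boost.

        img_emb, txt_emb: (batch, dim) embeddings; row i of each is a matched
            image-report pair.
        hard_weight: multiplicative weight on each anchor's hardest in-batch
            negative, a simple stand-in for the dynamic hard-negative
            selection the abstract describes.
        """
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.t() / temperature          # (batch, batch) similarities
        targets = torch.arange(logits.size(0), device=logits.device)

        # Find the hardest negative per image anchor (largest off-diagonal
        # similarity) and boost it: adding log(w) to a logit multiplies that
        # negative's contribution to the softmax denominator by w.
        diag = torch.eye(logits.size(0), dtype=torch.bool, device=logits.device)
        hard_idx = logits.masked_fill(diag, float('-inf')).argmax(dim=1)
        boosted = logits.clone()
        boosted[targets, hard_idx] += math.log(hard_weight)

        loss_i2t = F.cross_entropy(boosted, targets)      # image -> text
        loss_t2i = F.cross_entropy(logits.t(), targets)   # text -> image
        return 0.5 * (loss_i2t + loss_t2i)


    if __name__ == "__main__":
        # In the framework, embeddings would come from the adapted image and
        # text encoders; random tensors stand in for them here.
        img, txt = torch.randn(8, 256), torch.randn(8, 256)
        print(cross_modal_contrastive_loss(img, txt).item())

The boost is applied only in the image-to-text direction here for brevity; a symmetric variant would boost the transposed logits analogously.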
Keywords
Adaptive pretraining,Contrastive learning,Self-supervised learning,Cross-modal pretraining