AIUnet: Asymptotic inference with U2-Net for referring image segmentation

Jiangquan Li, Shimin Shan, Yu Liu, Kaiping Xu, Xiwen Hu, Mingcheng Xue

PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023 (2023)

Abstract
Referring image segmentation aims to segment a target object from an image given a natural language expression. While recent methods have made remarkable advances, few have designed effective deep fusion processes for cross-modal features or focused on fine visual details. In this paper, we propose AIUnet, an asymptotic inference method built on U2-Net. The core of AIUnet is a Cross-modal U2-Net (CMU) module, which integrates a Text-guided vision (TGV) module into U2-Net to achieve efficient interaction of cross-modal information at different scales. CMU attends to location information in high-level features and learns finer detail information in low-level features. Additionally, we propose a Features Enhance Decoder (FED) module, which uses a simple CNN-based approach to enhance multi-modal features and decode them into binary masks, improving the recognition of fine details. Our experiments show that AIUnet achieves competitive results on three standard datasets. Code is available at https://github.com/LJQbiu/AIUnet.
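The abstract names the TGV module but does not detail it, so the following is a minimal, hypothetical PyTorch sketch of one plausible form of text-guided fusion that a CMU-style block could apply at each U2-Net scale: pixels of a visual feature map attend to word-level language features so that the expression guides the visual representation. The class name, dimensions, and design choices below are illustrative assumptions, not the authors' implementation (see the linked repository for that).

import torch
import torch.nn as nn

class TextGuidedFusion(nn.Module):
    """Hypothetical sketch of a text-guided fusion block: visual features
    attend to word features via cross-attention. Names and dimensions are
    assumptions for illustration, not the paper's actual TGV module."""

    def __init__(self, vis_dim: int, txt_dim: int, num_heads: int = 8):
        super().__init__()
        # Project word features into the visual channel dimension.
        self.proj_txt = nn.Linear(txt_dim, vis_dim)
        self.attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, C, H, W) feature map from one encoder scale
        # txt: (B, L, txt_dim) word-level language features
        b, c, h, w = vis.shape
        q = vis.flatten(2).transpose(1, 2)      # (B, H*W, C): one query per pixel
        kv = self.proj_txt(txt)                 # (B, L, C): keys/values from words
        fused, _ = self.attn(q, kv, kv)         # pixels attend to words
        out = self.norm(q + fused)              # residual connection + layer norm
        return out.transpose(1, 2).view(b, c, h, w)

if __name__ == "__main__":
    block = TextGuidedFusion(vis_dim=256, txt_dim=768)
    vis = torch.randn(2, 256, 24, 24)           # visual features at one scale
    txt = torch.randn(2, 20, 768)               # e.g. word embeddings of the expression
    print(block(vis, txt).shape)                # torch.Size([2, 256, 24, 24])

Applying such a block independently at several encoder scales would match the multi-scale cross-modal interaction the abstract describes: coarse, high-level maps would pick up location cues from the expression, while fine, low-level maps would refine object detail.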
Keywords
Multimodal and crossmodal learning, Multimodal fusion, Human-robot/agent interaction