Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering.

Zhou Yu, Jun Yu, Chenchao Xiang, Jianping Fan, Dacheng Tao

IEEE Transactions on Neural Networks and Learning Systems (2018)

Abstract

Visual question answering (VQA) is challenging, because it requires a simultaneous understanding of both visual content of images and textual content of questions. To support the VQA task, we need to find good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multimodal feature fusion that is able to capture the complex intera...

Keywords

Visualization, Task analysis, Feature extraction, Correlation, Natural languages, Computational modeling, Knowledge discovery
