Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering.

IEEE Transactions on Neural Networks and Learning Systems (2018)

Citations: 493 | Views: 209
Abstract
Visual question answering (VQA) is challenging, because it requires a simultaneous understanding of both visual content of images and textual content of questions. To support the VQA task, we need to find good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multimodal feature fusion that is able to capture the complex intera...
Keywords
Visualization, Task analysis, Feature extraction, Correlation, Natural languages, Computational modeling, Knowledge discovery