Image Captioning with Pretrained Language Generators

Saketh Vishnubhatla, Nishant Sinha

CODS-COMAD 2021: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD) (2021)

Abstract

We present a novel framework for image captioning that combines scene graph models with pretrained language models. This contrasts with previous work, which largely relies on an encoder-decoder-like architecture. Our experiments use a two-stage pipeline: (a) generating scene graphs from an image and (b) using pretrained language generators to obtain captions. Using scene graphs leads to more grounded captions and helps exploit the visual context between different objects in an image. By fine-tuning pretrained language models, we can reuse the vast, compressed knowledge in these models for image captioning.
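The abstract does not say how a scene graph is handed to the language generator. As a minimal sketch, assuming the graph is a list of (subject, predicate, object) triples that gets flattened into a text prompt, the interface between the two stages might look like the following. The names `Relation`, `linearize_scene_graph`, and `build_prompt`, the `" ; "` separator, and the prompt template are all hypothetical choices for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Relation:
    """One scene graph edge: subject --predicate--> object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def linearize_scene_graph(relations: Iterable[Relation], sep: str = " ; ") -> str:
    """Flatten triples into a single string a text-only model can consume."""
    return sep.join(f"{r.subject} {r.predicate} {r.obj}" for r in relations)


def build_prompt(relations: Iterable[Relation]) -> str:
    """Wrap the linearized graph in an assumed prompt template for stage (b)."""
    return f"scene graph: {linearize_scene_graph(relations)} caption:"


# Stage (a) output is assumed here; a real scene graph model would produce it.
graph = [Relation("man", "riding", "horse"), Relation("horse", "on", "beach")]
prompt = build_prompt(graph)
```

In a pipeline of this shape, `prompt` would be paired with a reference caption to form a fine-tuning example for the pretrained language generator, and used alone at inference time.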
