Image Captioning with Pretrained Language Generators

Saketh Vishnubhatla, Nishant Sinha

CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD) (2021)

Abstract
We present a novel framework for image captioning that combines scene graph models with pretrained language models. This contrasts with previous work, which largely relies on encoder-decoder-style architectures. Our experiments use a two-stage pipeline: (a) generating a scene graph from an image and (b) using pretrained language generators to obtain captions. Using scene graphs leads to more grounded captions and helps exploit the visual context between different objects in an image. By fine-tuning pretrained language models, we are able to reuse the vast, compressed knowledge in these models for image captioning.
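
The two-stage pipeline described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `Triple` type, the `linearize` text format, and the two toy stand-in models are all assumptions introduced here, and the abstract does not specify how the scene graph is actually passed to the language generator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Triple:
    """One scene-graph edge: subject --predicate--> object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def linearize(triples: List[Triple]) -> str:
    """Flatten scene-graph triples into plain text for a language model.

    The ' ; ' separated format is an assumption made for this sketch.
    """
    return " ; ".join(f"{t.subject} {t.predicate} {t.obj}" for t in triples)


def caption_image(
    image: object,
    scene_graph_model: Callable[[object], List[Triple]],
    language_model: Callable[[str], str],
) -> str:
    """Stage (a): image -> scene graph. Stage (b): scene-graph text -> caption."""
    triples = scene_graph_model(image)
    prompt = linearize(triples)
    return language_model(prompt)


# Toy stand-ins so the sketch runs without any model weights.
def toy_scene_graph_model(image: object) -> List[Triple]:
    return [Triple("man", "riding", "horse"), Triple("horse", "on", "beach")]


def toy_language_model(prompt: str) -> str:
    return "Caption for: " + prompt
```

In a real system, the toy callables would be replaced by a trained scene graph generator and a fine-tuned pretrained language generator. Keeping the two stages behind plain callables lets either one be swapped without touching the other.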