Collage Prompting: Budget-Friendly Visual Recognition with GPT-4V
CoRR (2024)
Abstract
Recent advances in generative AI suggest that, given a visual
prompt, GPT-4V can demonstrate significant proficiency in image recognition
tasks. Despite its impressive capabilities, the financial cost associated with
GPT-4V's inference presents a substantial barrier to its wide use. To address
this challenge, our work introduces Collage Prompting, a budget-friendly
prompting approach that concatenates multiple images into a single visual
input. With a collage prompt, GPT-4V can perform image recognition on
several images simultaneously. Based on the observation that the accuracy of
GPT-4V's image recognition varies significantly with the order of images within
the collage prompt, our method further learns to optimize the arrangement of
images for maximum recognition accuracy. A graph predictor is trained to
predict the accuracy of each collage prompt, and we propose an optimization
method to navigate the search space of possible image arrangements.
Experimental results across various datasets demonstrate that the
cost-efficiency score of the collage prompt is much higher than that of the
standard prompt. Additionally, a collage prompt with a learned arrangement
achieves clearly better accuracy than a collage prompt with a random
arrangement in GPT-4V's visual recognition.
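
The two ideas in the abstract, tiling several images into one visual input and searching over the order of the tiles, can be illustrated with a minimal sketch. It is not the authors' implementation: images are stood in for by small 2D lists of pixel values, `toy_score` is a hypothetical placeholder for the learned graph predictor, and simple hill climbing over pairwise swaps is swapped in for the paper's optimization method, which the abstract does not specify. All function and variable names here are assumptions made for illustration.

```python
import random


def make_collage(images, order, cols):
    """Tile equally sized images into a grid, row-major, following `order`.

    Each image is a 2D list of pixel values. Assumes len(order) is a
    multiple of `cols` so that every grid row is full.
    """
    height = len(images[0])
    collage = []
    for start in range(0, len(order), cols):
        row_ids = order[start:start + cols]
        for y in range(height):
            line = []
            for idx in row_ids:
                line.extend(images[idx][y])  # append this image's pixel row
            collage.append(line)
    return collage


def search_arrangement(n, score, steps=200, seed=0):
    """Hill climbing over arrangements using random pairwise swaps.

    `score` maps an arrangement (list of image indices) to a predicted
    accuracy; here it stands in for a learned predictor.
    """
    rng = random.Random(seed)
    best = list(range(n))
    best_score = score(best)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        candidate = best[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_score = score(candidate)
        if candidate_score > best_score:  # keep only improving swaps
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical stand-in for the accuracy predictor: arbitrary per-image
# and per-position numbers, chosen only to give the search something to do.
IMAGE_VALUE = [0.1, 0.9, 0.4, 0.6]
POSITION_WEIGHT = [4, 3, 2, 1]


def toy_score(order):
    return sum(w * IMAGE_VALUE[i] for w, i in zip(POSITION_WEIGHT, order))


# Four 2x2 "images", each filled with its own index.
images = [[[v, v], [v, v]] for v in range(4)]

best_order, best_value = search_arrangement(len(images), toy_score)
collage = make_collage(images, best_order, cols=2)
```

The resulting `collage` would be sent as a single visual input in place of four separate requests, which is the source of the cost saving the abstract describes. In practice the predictor would be trained on observed recognition accuracy rather than defined by hand.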