MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
CoRR (2024)
Abstract
Social media has become a ubiquitous tool for connecting with others, staying
updated with news, expressing opinions, and finding entertainment. However,
understanding the intention behind social media posts remains challenging due
to the implicitness of those intentions, the need for
cross-modality understanding of both text and images, and the presence of noisy
information such as hashtags, misspelled words, and complicated abbreviations.
To address these challenges, we present MIKO, a Multimodal Intention Knowledge
DistillatiOn framework that collaboratively leverages a Large Language Model
(LLM) and a Multimodal Large Language Model (MLLM) to uncover users'
intentions. Specifically, we use an MLLM to interpret the image and an LLM to
extract key information from the text, and finally instruct the LLM again to
generate intentions. By applying MIKO to publicly available social media
datasets, we construct an intention knowledge base featuring 1,372K intentions
rooted in 137,287 posts. We conduct a two-stage annotation to verify the
quality of the generated knowledge and benchmark the performance of widely used
LLMs for intention generation. We further apply MIKO to a sarcasm detection
dataset and distill a student model to demonstrate the downstream benefits of
applying intention knowledge.
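
The three-step flow described in the abstract (an MLLM interprets the image, an LLM extracts key information from the text, and the LLM is prompted again to generate intentions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Post` type, the `distill_intentions` function, the prompt wording, and the two stub models are all hypothetical, and the real models are passed in as plain callables so that no particular API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Post:
    """Hypothetical container for one social media post."""
    text: str
    image_path: str


def distill_intentions(
    post: Post,
    describe_image: Callable[[str], str],  # stand-in for an MLLM call (assumption)
    ask_llm: Callable[[str], str],         # stand-in for an LLM call (assumption)
) -> List[str]:
    # Step 1: the MLLM turns the image into a textual description.
    image_description = describe_image(post.image_path)

    # Step 2: the LLM extracts key information from the noisy post text.
    key_info = ask_llm(
        "Extract the key information from this social media post, "
        "ignoring hashtags and misspellings:\n" + post.text
    )

    # Step 3: the LLM is instructed again, now with both signals, to list intentions.
    raw = ask_llm(
        "Image description: " + image_description + "\n"
        "Key information: " + key_info + "\n"
        "List the likely intentions of the user, one per line."
    )

    # Parse one intention per non-empty line, dropping leading bullet dashes.
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]


# Illustrative stubs so the sketch runs without any model access.
def fake_mllm(image_path: str) -> str:
    return "A group of friends laughing on a beach."


def fake_llm(prompt: str) -> str:
    if prompt.startswith("Extract"):
        return "beach trip with friends"
    return "- share a holiday memory\n- stay connected with friends"


if __name__ == "__main__":
    post = Post(text="Best day ever!!! #beach #squad", image_path="beach.jpg")
    for intention in distill_intentions(post, fake_mllm, fake_llm):
        print(intention)
```

Passing the models in as callables keeps the sketch independent of any specific provider; in practice each stub would be replaced by a call to whichever MLLM and LLM are being used.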