Learning Cross-Modal Embeddings For Cooking Recipes And Food Images

30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)

Abstract
In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1M cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M makes it possible to train high-capacity models on aligned, multimodal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. We further show that regularizing the model with an additional high-level classification objective both improves retrieval performance to a level that rivals humans and enables semantic vector arithmetic. We expect these embeddings to provide a basis for further exploration of the Recipe1M dataset, and of food and cooking in general. Code, data, and models are publicly available.
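The abstract describes image-recipe retrieval in a joint embedding space, trained with a pairwise objective and regularized by a classification objective. The plain-Python sketch below illustrates that idea under loudly stated assumptions: embeddings are taken as already-computed vectors, pairs are scored by cosine similarity, and the margin-based pair loss, the margin of 0.1, the weight `lam`, the shared bias-free linear classifier, and every function name are hypothetical choices for illustration. It is not the authors' released implementation.

```python
# Hypothetical sketch of retrieval and training loss in a joint image/recipe
# embedding space. Names, margin, and weights are illustrative assumptions.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, candidates: List[Vector], k: int = 1) -> List[int]:
    """Indices of the k candidates most similar to the query (e.g. image -> recipes)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine(query, candidates[i]), reverse=True)
    return order[:k]


def cosine_pair_loss(img: Vector, rec: Vector, matching: bool,
                     margin: float = 0.1) -> float:
    """Pull matching pairs toward similarity 1; push non-matching pairs below the margin."""
    sim = cosine(img, rec)
    return 1.0 - sim if matching else max(0.0, sim - margin)


def cross_entropy(logits: Vector, label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def linear(weights: List[Vector], x: Vector) -> List[float]:
    """Shared classifier applied to both modalities: one weight row per class, no bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def total_loss(img: Vector, rec: Vector, matching: bool, label: int,
               classifier: List[Vector], lam: float = 0.02) -> float:
    """Pairwise embedding loss plus a weighted classification regularizer on both embeddings."""
    base = cosine_pair_loss(img, rec, matching)
    reg = (cross_entropy(linear(classifier, img), label)
           + cross_entropy(linear(classifier, rec), label))
    return base + lam * reg


def arithmetic(a: Vector, b: Vector, c: Vector) -> List[float]:
    """Embedding arithmetic a - b + c, e.g. to query for a modified dish."""
    return [x - y + z for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    recipes = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    image = [0.9, 0.1]
    print(retrieve(image, recipes, k=2))  # -> [0, 2]
```

Sharing one classifier across both modalities is what makes the regularizer useful in this sketch: it pushes image and recipe embeddings of the same class toward the same region of the space, which the pairwise loss alone does not enforce.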
Keywords
learning cross-modal embeddings, structured corpus, recipe data, high-capacity models, multimodal data, neural network, joint embedding, high-level classification objective, cooking recipes, food images, Recipe1M, image-recipe retrieval task