Prediction of actions and places by the time series recognition from images with Multimodal LLM.

Tomohiro Ogawa, Kango Yoshioka, Ken Fukuda, Takeshi Morita

IEEE International Conference on Semantic Computing (2024)

Abstract

In recent years, the risk of accidents in the homes of older adults has increased as society ages, and this problem needs to be addressed. We took up the challenge of utilising explainable AI techniques to identify accident risks at home and suggest safer alternatives. This study combined knowledge graphs and large language models to solve real-world problems. Specifically, we addressed question answering using a knowledge graph and a multimodal dataset of videos recording daily activities. The dataset represents daily living activities in a virtual space and provides environmental information. The work is divided into two main tasks. Task 1 utilises the knowledge graph to answer direct questions, retrieving the data with SPARQL queries. Task 2 addresses more complex questions that cannot be answered by retrieval alone. In Task 1, the system answered all questions using information retrieved from the knowledge graph with SPARQL. In Task 2, a certain degree of success was achieved on complex questions by having a multimodal LLM reason over images created by concatenating time-series frames. The source code used in the experiment is available at https://github.com/tomo1115tomo/kg_reasoning_challenge.
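
For Task 2, the abstract describes giving a multimodal LLM a single image built by concatenating frames from a time series. The sketch below shows one way such a composite could be assembled: frames are sampled at evenly spaced points in time and tiled into a grid in temporal order, left to right and top to bottom. It is not taken from the authors' repository. The even sampling, the grid layout, the nested-list frame representation and the helper names (`sample_frames`, `tile_frames`, `solid`) are all hypothetical. A real pipeline would more likely use an image library and then pass the composite to the LLM together with the question.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a pixel is an (R, G, B) triple and a frame
# is a list of rows of pixels. A real pipeline would use an image library.
Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def sample_frames(frames: Sequence[Frame], k: int) -> List[Frame]:
    """Pick k frames at evenly spaced indices, keeping temporal order (assumed strategy)."""
    if k <= 0 or not frames:
        return []
    if k >= len(frames):
        return list(frames)
    step = (len(frames) - 1) / (k - 1) if k > 1 else 0
    return [frames[round(i * step)] for i in range(k)]


def tile_frames(frames: Sequence[Frame], cols: int, fill: Pixel = (0, 0, 0)) -> Frame:
    """Concatenate equally sized frames into one grid image, row-major in time order."""
    if not frames:
        raise ValueError("need at least one frame")
    h, w = len(frames[0]), len(frames[0][0])
    if any(len(f) != h or any(len(row) != w for row in f) for f in frames):
        raise ValueError("all frames must have the same size")
    rows = -(-len(frames) // cols)  # ceiling division
    # Pad the last grid row with blank tiles so every grid row has `cols` tiles.
    blank = [[fill] * w for _ in range(h)]
    padded = list(frames) + [blank] * (rows * cols - len(frames))
    grid: Frame = []
    for r in range(rows):
        tiles = padded[r * cols:(r + 1) * cols]
        for y in range(h):
            # Join the y-th pixel row of every tile in this grid row.
            line: List[Pixel] = []
            for tile in tiles:
                line.extend(tile[y])
            grid.append(line)
    return grid


def solid(color: Pixel, h: int = 2, w: int = 3) -> Frame:
    """Hypothetical helper: a tiny single-colour frame for demonstration."""
    return [[color] * w for _ in range(h)]


video = [solid((i, i, i)) for i in range(10)]  # 10 toy "video" frames
picked = sample_frames(video, 4)               # indices 0, 3, 6, 9
image = tile_frames(picked, cols=2)            # 2 x 2 grid of 2 x 3 frames
print(len(image), len(image[0]))               # 4 6
```

Tiling keeps the frames in a fixed spatial order, so a prompt can refer to them as "the first frame", "the last frame" and so on. How many frames to sample and how to arrange them are design choices this sketch does not settle.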

Keywords

Knowledge Graph Reasoning Challenge
