Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition

arXiv (2023)

Abstract
Purpose: One of the recent advances in surgical AI is the recognition of surgical activities as triplets of ⟨instrument, verb, target⟩. Although they provide detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting temporal cues from earlier frames could improve the recognition of surgical action triplets from videos.

Methods: In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing primarily on the verbs, RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition.

Results: We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and the triplet, along with other interactions involving the verb, such as ⟨instrument, verb⟩. Qualitative results show that RiT produces smoother predictions for most triplet instances than state-of-the-art methods.

Conclusion: We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploits it for surgical triplet recognition.
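The abstract does not give architectural details, so the following is only a minimal sketch of the general idea of attention-based temporal fusion, not the RiT model itself. It assumes each frame has already been reduced to a fixed-length feature vector and uses plain scaled dot-product attention with the current frame as the query. The function name `temporal_attention_fusion`, the feature dimension, and the toy values are all hypothetical.

```python
import math

def temporal_attention_fusion(current, past):
    """Fuse the current-frame feature with past-frame features.

    Assumption: generic scaled dot-product attention, with the current
    frame as the query and all frames (past + current) as keys and values.
    This illustrates the concept only; it is not the authors' architecture.
    """
    frames = past + [current]              # attend over past frames and the current one
    d = len(current)                       # feature dimension
    # Similarity of the current frame to every frame, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(current, f)) / math.sqrt(d) for f in frames]
    # Numerically stable softmax over the temporal axis.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frame features gives the temporally fused feature.
    fused = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)]
    return fused, weights

# Toy example: two past frames and one current frame, 2-D features.
past = [[1.0, 0.0], [0.0, 1.0]]
current = [1.0, 0.0]
fused, weights = temporal_attention_fusion(current, past)
```

In this toy case the first past frame matches the current frame, so it receives the same attention weight as the current frame itself, and both outweigh the dissimilar second past frame. A learned model would additionally apply trainable projections to the queries, keys, and values.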
Keywords
Action triplet, Attention model, Laparoscopic surgery, Surgical triplet recognition, Temporal modeling