Alignment of Image-Text and Video-Text Datasets.

Yunus Emre Özköse, Zeynep Gökce, Pinar Duygulu

SIU (2023)

Abstract

This study addresses the alignment of video-text and image-text datasets. First, similarities are computed between the texts of the two datasets, and a candidate subset is built from these text similarities. A retrieval setup based on visual similarity is then applied to that subset. Texts are embedded with a BERT-based method, applied to both the raw and the pure texts. Video frames are represented with object-based and CLIP-based visual features. Alignment with CLIP features achieves the best results on the subset created by filtering with raw text.
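
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the similarity measure, the filtering rule, or any threshold, so cosine similarity, the `threshold` and `top_k` parameters, the function names, and the toy vectors (standing in for BERT text embeddings and CLIP frame or image embeddings) are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def text_filter(video_text_emb, image_text_emb, threshold=0.8):
    """Stage 1: keep, per video, the images whose caption embedding is similar enough.

    The threshold is a hypothetical choice; the abstract does not give one.
    """
    subset = {}
    for vid, v_emb in video_text_emb.items():
        keep = [img for img, i_emb in image_text_emb.items()
                if cosine(v_emb, i_emb) >= threshold]
        if keep:
            subset[vid] = keep
    return subset

def visual_retrieval(subset, video_vis_emb, image_vis_emb, top_k=1):
    """Stage 2: rank the text-filtered candidates by visual similarity."""
    ranked = {}
    for vid, candidates in subset.items():
        scored = sorted(
            candidates,
            key=lambda img: cosine(video_vis_emb[vid], image_vis_emb[img]),
            reverse=True,
        )
        ranked[vid] = scored[:top_k]
    return ranked

# Toy vectors standing in for text embeddings (e.g. BERT-based).
video_text = {"v1": [1.0, 0.0], "v2": [0.0, 1.0]}
image_text = {"i1": [0.9, 0.1], "i2": [0.8, 0.2], "i3": [0.1, 0.9]}

# Toy vectors standing in for visual embeddings (e.g. CLIP-based).
video_vis = {"v1": [0.0, 1.0, 0.0], "v2": [1.0, 0.0, 0.0]}
image_vis = {"i1": [1.0, 0.0, 0.0], "i2": [0.1, 0.9, 0.0], "i3": [0.9, 0.1, 0.0]}

subset = text_filter(video_text, image_text)
aligned = visual_retrieval(subset, video_vis, image_vis)
print(subset)
print(aligned)
```

In this toy setup the text stage narrows each video to a handful of candidate images, and the visual stage then picks among them, which mirrors the order of operations the abstract describes: text similarity defines the subset, visual similarity performs the retrieval.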

Keywords

dataset alignment, deep learning, machine learning
