Violent Video Recognition by Using Sequential Image Collage

Yueh-Shen Tu, Yu-Shian Shen, Yuk Yii Chan,Lei Wang, Jenhui Chen

SENSORS(2024)

引用 0|浏览0
暂无评分
摘要
Identifying violent activities is important for ensuring the safety of society. Although the Transformer model contributes significantly to the field of behavior recognition, it often requires a substantial volume of data to perform well. Since existing datasets on violent behavior are currently lacking, it will be a challenge for Transformers to identify violent behavior with insufficient datasets. Additionally, Transformers are known to be computationally heavy and can sometimes overlook temporal features. To overcome these issues, an architecture named MLP-Mixer can be used to achieve comparable results with a smaller dataset. In this research, a special type of dataset to be fed into the MLP-Mixer called a sequential image collage (SIC) is proposed. This dataset is created by aggregating frames of video clips into image collages sequentially for the model to better understand the temporal features of violent behavior in videos. Three different public datasets, namely, the dataset of National Hockey League hockey fights, the dataset of smart-city CCTV violence detection, and the dataset of real-life violence situations were used to train the model. The results of the experiments proved that the model trained using the proposed SIC is capable of achieving high performance in violent behavior recognition with fewer parameters and FLOPs needed compared to other state-of-the-art models.
更多
查看译文
关键词
training,image recognition,neurons,computer architecture,multilayer perceptrons,Transformers,behavioral sciences
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要