Method of constructing and identifying predictive models of human behavior based on information models of non-verbal signals

Mikhail Sinko, Anatoly Medvedev, Ivan Smirnov, Anastasia Laushkina, Aizhana Kadnova, Oleg Basov

Procedia Computer Science (2022)

Abstract

On the one hand, facial expression recognition (FER) for determining a person's emotional state is a well-studied task. On the other hand, most existing approaches focus on recognizing basic emotions or binary labels of emotional tone (positive or negative). Facial micro-expression (FMiE) recognition (FMiER), by contrast, is an understudied but growing area. Researchers' interest in it stems from the unsolved problems of determining subtle human intentions in high-stakes situations, such as lie detection, state-of-mind recognition, recognition of negotiation techniques, and prediction of human behavior patterns. In contrast to facial macro-expressions, FMiEs are involuntary, transient facial expressions capable of revealing the genuine feelings that people attempt to hide. These differences are why FMiE detection relies heavily on expert experience and why FMiER remains a difficult task that requires further research. Beyond facial expressions, the human voice also carries a great deal of information about a person's internal state, and speech emotion recognition is an actively studied task as well. In this paper we consider the problem of lie detection, but soften it to the problem of assessing the confidence of a person in a video. We hypothesized that using information from two modalities, and jointly exploring that information across modalities, makes it possible to predict a person's subtle feelings with high accuracy. We introduce a solution that extracts and analyzes a person's facial and speech features from a video sequence using attention mechanisms for the task of truth determination. Our approach uses state-of-the-art methods and achieves 84.91% accuracy. The paper also compares various combinations of audio and visual features and shows that our approach offers a comparable accuracy-performance ratio.
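
The abstract does not spell out how the attention mechanism joins the two modalities. As a rough illustration only, one common way to let one modality attend to another is scaled dot-product cross-attention, sketched below in plain Python. This is an assumption made for illustration and not the authors' architecture; the function names, the toy feature vectors, and the choice of facial features as queries with audio features as keys and values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns one fused vector per query, formed as the attention-weighted
    sum of the value vectors.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, computed per output dimension.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy inputs: 3 video frames and 2 audio frames, 2-dim features each.
face_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_feats = [[1.0, 0.0], [0.0, 1.0]]

# Facial features query the audio stream: one fused vector per video frame.
fused = cross_attention(face_feats, audio_feats, audio_feats)
```

In this toy setup each video frame receives a mixture of the audio frames, weighted toward the audio frame it most resembles. A real system would learn projection matrices for the queries, keys, and values and would operate on extracted landmark and speech features rather than hand-written vectors.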

Keywords

Video classification, truthfulness, transformers, facial landmarks, speech signal, audio analysis, video analytics, deep learning
