Unifying frame rate and temporal dilations for improved remote pulse detection

Computer Vision and Image Understanding (2021)

Abstract
Remote photoplethysmography (rPPG) is the monitoring of blood volume pulse from a camera at a distance. 3-Dimensional Convolutional Neural Networks (3DCNNs) have shown promising performance on the rPPG task, although it is critical that we understand the impact of both video and model parameters. In this paper, we explore the effect of frame rate, temporal kernel width, and, more generally, temporal receptive field on the reliability of heart rate and waveform estimation carried out by 3DCNNs. We train and evaluate 32 3DCNNs with different temporal parameters on a new large-scale database for physiological monitoring in an interview scenario. We show that previous studies reporting null effects of frame rate changes on pulse estimators may no longer be valid when using CNNs, and that decreasing the frame rate may actually improve performance. In particular, we found that models trained on videos with frame rates as low as 12.9 frames per second (fps) perform better than those trained on videos recorded at a full 90 fps, perhaps because a fixed-width temporal receptive field spans a longer time interval as the frame rate decreases. Using this insight, we propose RemotePulseNet, a novel 3DCNN architecture that exploits temporally dilated convolutions with increasing dilation rate to drastically increase the receptive field. We compare its performance with that of recent state-of-the-art pulse estimation methods, and show that both RemotePulseNet and the low frame rate 3DCNNs produce high-quality pulse signals from faces captured under a challenging interview scenario. The source code and instructions for obtaining a copy of the test data are made available with this paper.
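The link the abstract draws between frame rate, dilation, and temporal receptive field can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes a stack of stride-1 temporal convolutions, and the kernel width (5), layer count (4), and dilation schedule (1, 2, 4, 8) are hypothetical values chosen only for illustration. Under that assumption, the receptive field in frames is 1 plus the sum of (kernel width - 1) × dilation over the layers, and dividing by the frame rate gives its duration in seconds.

```python
def temporal_receptive_field(kernel_widths, dilations):
    """Frames covered by one output sample of stacked stride-1 temporal convolutions.

    Each layer widens the receptive field by (kernel_width - 1) * dilation frames.
    """
    if len(kernel_widths) != len(dilations):
        raise ValueError("kernel_widths and dilations must have the same length")
    return 1 + sum((k - 1) * d for k, d in zip(kernel_widths, dilations))


def receptive_field_seconds(frames, fps):
    """Duration in seconds spanned by a receptive field of `frames` frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


# Hypothetical configuration (not the paper's): four layers, kernel width 5.
kernels = [5, 5, 5, 5]
plain = temporal_receptive_field(kernels, [1, 1, 1, 1])    # no dilation
dilated = temporal_receptive_field(kernels, [1, 2, 4, 8])  # increasing dilation

# The two frame rates mentioned in the abstract.
for fps in (90.0, 12.9):
    print(f"{fps:5.1f} fps: plain {receptive_field_seconds(plain, fps):.3f} s, "
          f"dilated {receptive_field_seconds(dilated, fps):.3f} s")
```

With these assumed values, the same undilated stack spans roughly seven times more wall-clock time at 12.9 fps than at 90 fps, and increasing the dilation rate widens the span further without lowering the frame rate. Whether a longer span actually helps pulse estimation is the empirical question the paper addresses.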
Keywords
41A05,41A10,65D05,65D17