Audio-based Eating Analysis and Tracking Utilising Deep Spectrum Features

2019 E-Health and Bioengineering Conference (EHB)(2019)

Abstract
This paper proposes a deep learning system for audio-based eating analysis on the ICMI 2018 Eating Analysis and Tracking (EAT) challenge corpus. We utilise Deep Spectrum features, which are descriptors derived from image-classification convolutional neural networks (CNNs). We extract the Deep Spectrum features by forwarding Mel-spectrograms of the input audio through deep, task-independent, pre-trained CNNs, namely AlexNet and VGG16, and use the activations of the first (fc6), second (fc7), and third (fc8) fully connected layers of these networks as feature vectors. We obtain the best classification result using the first fully connected layer (fc6) of AlexNet, extracting features from Mel-spectrograms rendered with the viridis colour map using a window size of 160 ms and a hop size of 80 ms. Finally, we build Bag-of-Deep-Features (BoDF) representations by quantising the Deep Spectrum features. Compared to the best baseline results on the test partitions of the Food Type and Likability sub-challenges, unweighted average recall is increased from 67.2 percent to 79.9 percent and from 54.2 percent to 56.1 percent, respectively. For the test partition of the Difficulty sub-challenge, the concordance correlation coefficient is increased from .506 to .509.
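The abstract does not include code, but the final BoDF step can be sketched: frame-level Deep Spectrum vectors (fc6 activations in the paper) are quantised against a learned codebook and a recording is summarised as a normalised histogram of codeword assignments. The sketch below assumes a plain k-means codebook, which is the usual choice for bag-of-features pipelines; the codebook size, function names, and random features are illustrative, not taken from the paper.

```python
import numpy as np

def build_codebook(features, n_words=64, n_iter=20, seed=0):
    """Learn a codebook over frame-level feature vectors with plain k-means.
    `features` has shape (n_frames, dim); in the paper these would be
    fc6 activations of AlexNet for each Mel-spectrogram window."""
    rng = np.random.default_rng(seed)
    # initialise codewords from randomly chosen feature vectors
    codebook = features[rng.choice(len(features), n_words, replace=False)]
    for _ in range(n_iter):
        # assign each feature vector to its nearest codeword
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # move each codeword to the mean of its assigned vectors
        for k in range(n_words):
            members = features[assign == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def bag_of_deep_features(features, codebook):
    """Quantise one recording's frame-level features into a normalised
    histogram over the codebook (the BoDF representation)."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram can then be fed to any standard classifier or regressor for the three sub-challenge targets.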
Keywords
Deep Spectrum features, pre-trained convolutional neural networks, audio processing, eating analysis