Non-Verbal Vocalisation and Laughter Detection Using Sequence-to-Sequence Models and Multi-Label Training.

Interspeech(2021)

引用 1|浏览3
暂无评分
摘要
Non-verbal vocalisations (NVVs) such as laughter are an important part of communication in social interactions and carry important information about a speaker's state or intention. There remains no clear definition of NVVs and there is no clearly defined protocol for transcribing or detecting NVVs. As such, the standard approach has been to focus on detecting a single NVV such as laughter and map all other NVVs to an "other" class. In this paper we hypothesise that for this task such an approach hurts performance, and that giving more information by using more classes is beneficial. To address this, we present studies using sequence-to-sequence deep neural networks where we include multiple NVV classes rather than mapping them to "other" and allow more than one label per sample. We show that this approach yields better performance than the standard approach on NVV detection. We also evaluate the same model on laughter detection using frame-based and utterance-based metrics and show that the proposed approach yields state-of-the-art performance on the ICSI corpus.
更多
查看译文
关键词
computational paralinguistics,non-verbal vocalisation detection,laughter detection,multi-label training
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要