Emulating Perceptual Evaluation of Voice Using Scattering Transform Based Features

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2022)

Abstract
Voice health is traditionally assessed by methods that rely on the perception of a clinician, who integrates auditory and visual cues to reach a conclusion about the voice under evaluation. However, these tasks suffer from inter-professional variability due to their subjective nature, which is why more objective, computation-based methods are of interest. Two examples of such subjective tasks are the classification of voices into three types according to their periodicity, also termed voice typing, and the evaluation of six aspects of voice quality by means of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol. In this paper, two approaches to emulate those tasks are introduced, both based on simple features extracted from scattering transform coefficients and on support vector machines. First, a system for automatic voice typing was trained, and its classification performance was evaluated in intra- and inter-dataset trials using two widely known corpora. Accuracies above 80%, comparable to the state of the art, were found for all the experiments conducted. Second, a multidimensional, multioutput regression chain model was used to automatically grade the voice quality features of the CAPE-V protocol, obtaining errors and correlation coefficients comparable to those found for three human raters.
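To illustrate what a regression chain does, the following is a minimal, dependency-free sketch and not the authors' implementation. Each link in the chain predicts one output from the input features plus the outputs of the links before it. The abstract states that the paper's models use support vector machines; here a tiny nearest-neighbour averager stands in as the base learner purely to keep the example self-contained. The class name `RegressionChain`, the helper `knn_predict`, and the toy data are all hypothetical.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training rows closest to x (Euclidean).

    Assumption: this stand-in base learner replaces the support vector
    machines mentioned in the abstract, only to avoid external libraries.
    """
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


class RegressionChain:
    """Chain of single-output regressors; link j sees features + outputs 0..j-1."""

    def __init__(self, k=3):
        self.k = k
        self.links = []

    def fit(self, X, Y):
        n_outputs = len(Y[0])
        self.links = []
        for j in range(n_outputs):
            # Link j is trained on the features augmented with the TRUE
            # values of the earlier outputs.
            augmented = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            targets = [y[j] for y in Y]
            self.links.append((augmented, targets))
        return self

    def predict_one(self, x):
        preds = []
        for augmented, targets in self.links:
            # At prediction time the earlier outputs are unknown, so the
            # chain feeds its own earlier predictions forward instead.
            query = list(x) + preds
            preds.append(knn_predict(augmented, targets, query, self.k))
        return preds


# Toy data (made up): two features per voice sample, two ratings per sample.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.0, 0.0], [10.0, 20.0], [20.0, 40.0], [30.0, 60.0]]

chain = RegressionChain(k=1).fit(X, Y)
print(chain.predict_one([1.0, 0.0]))  # -> [10.0, 20.0]
```

With `k=1` the chain reproduces the ratings of a training sample exactly, which makes the forward-feeding of predictions easy to check by hand. In a realistic setting the rows of `X` would be features derived from scattering transform coefficients and each row of `Y` would hold the six CAPE-V ratings.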
Keywords
Task analysis, Protocols, Scattering, Perturbation methods, Transforms, Speech processing, Feature extraction, Total variation, scattering transform, support vector machines, voice quality, voice typing