Combining visual and acoustic features for audio classification tasks.

Pattern Recognition Letters(2017)

引用 77|浏览58
暂无评分
摘要
Coupling texture descriptors and acoustic features.Different methods for representing an audio as an image are compared.Heterogeneous ensemble of different classifiers improves performance. Display Omitted In this paper a novel and effective approach for automated audio classification is presented that is based on the fusion of different sets of features, both visual and acoustic. A number of different acoustic and visual features of sounds are evaluated and compared. These features are then fused in an ensemble that produces better classification accuracy than other state-of-the-art approaches. The visual features of sounds are built starting from the audio file and are taken from images constructed from different spectrograms, a gammatonegram, and a rhythm image. These images are divided into subwindows from which a set of texture descriptors are extracted. For each feature descriptor a different Support Vector Machine (SVM) is trained. The SVMs outputs are summed for a final decision. The proposed ensemble is evaluated on three well-known databases of music genre classification (the Latin Music Database, the ISMIR 2004 database, and the GTZAN genre collection), a dataset of Bird vocalization aiming specie recognition, and a dataset of right whale calls aiming whale detection. The MATLAB code for the ensemble of classifiers and for the extraction of the features will be publicly available (https://www.dei.unipd.it/node/2357 +Pattern Recognition and Ensemble Classifiers).
更多
查看译文
关键词
Audio classification,Texture,Image processing,Acoustic features,Ensemble of classifiers,Pattern recognition
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要