Combining visual and acoustic features for audio classification tasks.

Loris Nanni, Yandre M. G. Costa, Diego Rafael Lucio, Carlos Nascimento Silla Jr., Sheryl Brahnam

Pattern Recognition Letters (2017)

Abstract

Highlights: Coupling texture descriptors and acoustic features. Different methods for representing an audio signal as an image are compared. A heterogeneous ensemble of different classifiers improves performance.

This paper presents a novel and effective approach to automated audio classification based on the fusion of different sets of features, both visual and acoustic. A number of different acoustic and visual features of sounds are evaluated and compared. These features are then fused in an ensemble that produces better classification accuracy than other state-of-the-art approaches. The visual features are extracted from images built from the audio file: different spectrograms, a gammatonegram, and a rhythm image. These images are divided into subwindows, from which a set of texture descriptors is extracted. A separate Support Vector Machine (SVM) is trained for each feature descriptor, and the SVM outputs are summed for the final decision. The proposed ensemble is evaluated on three well-known music genre classification databases (the Latin Music Database, the ISMIR 2004 database, and the GTZAN genre collection), a dataset of bird vocalizations for species recognition, and a dataset of right whale calls for whale detection. The MATLAB code for the ensemble of classifiers and for the extraction of the features will be publicly available (https://www.dei.unipd.it/node/2357 +Pattern Recognition and Ensemble Classifiers).
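
The two structural steps the abstract describes, dividing an image into subwindows and summing the per-descriptor classifier outputs, can be sketched as below. This is a minimal illustration in plain Python, not the authors' MATLAB code: the function names `split_into_subwindows` and `sum_rule` are hypothetical, the grid size is arbitrary, and the score lists stand in for SVM outputs with made-up numbers. The abstract does not say whether scores are normalized before summing, so none is applied here.

```python
from typing import List, Sequence, Tuple


def split_into_subwindows(image: Sequence[Sequence[float]],
                          n_rows: int, n_cols: int) -> List[List[List[float]]]:
    """Split a 2-D image (a list of pixel rows) into n_rows x n_cols zones.

    Zones are returned in row-major order. A texture descriptor would then
    be computed on each zone separately (descriptor extraction not shown).
    """
    height, width = len(image), len(image[0])
    zones = []
    for r in range(n_rows):
        top, bottom = r * height // n_rows, (r + 1) * height // n_rows
        for c in range(n_cols):
            left, right = c * width // n_cols, (c + 1) * width // n_cols
            zones.append([list(row[left:right]) for row in image[top:bottom]])
    return zones


def sum_rule(score_sets: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Fuse classifiers by summing their per-class scores.

    Each element of score_sets holds one classifier's scores, one per class.
    Returns the index of the class with the largest summed score and the sums.
    """
    n_classes = len(score_sets[0])
    totals = [0.0] * n_classes
    for scores in score_sets:
        if len(scores) != n_classes:
            raise ValueError("every classifier must score the same classes")
        for k, s in enumerate(scores):
            totals[k] += s
    winner = max(range(n_classes), key=totals.__getitem__)
    return winner, totals


# Toy 4x6 "image" split into a 2x3 grid of 2x2 zones.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
zones = split_into_subwindows(image, n_rows=2, n_cols=3)

# Made-up scores from three classifiers over three classes.
scores = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
winner, totals = sum_rule(scores)
```

In this toy run the second class (index 1) receives the largest summed score even though the second classifier alone would have picked the first class, which is the kind of disagreement a sum-rule ensemble is meant to smooth over.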

Keywords

Audio classification, Texture, Image processing, Acoustic features, Ensemble of classifiers, Pattern recognition
