Staged Training Strategy And Multi-Activation For Audio Tagging With Noisy And Sparse Multi-Label Data

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Abstract
Audio tagging aims to predict whether certain acoustic events occur in an audio clip. Because manually labeled data with high confidence is difficult and costly to obtain, researchers have begun to focus on audio tagging with a small set of manually labeled data and a larger set of noisily labeled data. In addition, audio tagging is a sparse multi-label classification task: only a small number of acoustic events may occur in any one audio clip. In this paper, we propose a staged training strategy to deal with noisy labels, and adopt a sigmoid-sparsemax multi-activation structure to deal with sparse multi-label classification. This paper improves and extends our previous work for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge. We evaluate our methods on the same task and achieve state-of-the-art performance, with an lwlrap (label-weighted label-ranking average precision) score of 0.7591 on the official evaluation dataset.
Keywords
Audio tagging, noisy label, staged training strategy, multi-activation structure, DCASE2019 Challenge
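
To make the two activations named in the abstract concrete, the following is a minimal NumPy sketch of the standard sparsemax projection alongside an element-wise sigmoid. It is not the authors' implementation: the abstract does not say how the two activations are combined in the multi-activation structure, so each is shown on its own, and the function names `sparsemax` and `sigmoid` and the example scores are illustrative assumptions.

```python
import numpy as np


def sparsemax(z):
    """Project a 1-D score vector onto the probability simplex (sparsemax).

    Unlike softmax, the result can contain exact zeros, which suits
    tasks where only a few classes are active per example.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)           # running sum of sorted scores
    k = np.arange(1, z.size + 1)
    # The support is the largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z      # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


def sigmoid(z):
    """Element-wise sigmoid: an independent probability per class."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


# Hypothetical class scores for one clip (illustrative values only).
scores = [0.3, 0.1, 0.9, -2.0]
print(sparsemax(scores))   # sparse output that sums to 1
print(sigmoid(scores))     # dense output, each entry strictly in (0, 1)
```

With these example scores, sparsemax assigns zero to the two lowest-scoring classes, while sigmoid gives every class a nonzero probability. That contrast is the reason a sparse activation is attractive when few events occur per clip.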