End-to-End Lyrics Recognition with Self-supervised Learning

arXiv (2022)

Abstract
Lyrics recognition is an important task in music processing. Although traditional approaches such as the hybrid HMM-TDNN model achieve good performance, studies on applying end-to-end models and self-supervised learning (SSL) are limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on the lyrics recognition task. We evaluate a variety of upstream SSL models with different training methods (masked reconstruction, masked prediction, autoregressive reconstruction, and contrastive learning). Our end-to-end self-supervised models, evaluated on the DAMP music dataset, outperform the previous state-of-the-art (SOTA) system by 5.23% on the dev set and 2.4% on the test set, even without a language model trained on a large corpus. Moreover, we investigate the effect of background music on the performance of self-supervised learning models and conclude that the SSL models cannot extract features efficiently in the presence of background music. Finally, we study the out-of-domain generalization ability of the SSL features, given that those models were not trained on music datasets.
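The abstract describes using pretrained SSL upstream models as feature extractors for a downstream lyrics recognizer. The following is a minimal sketch of that general setup, not the paper's actual pipeline: it assumes a frozen wav2vec 2.0 upstream (a contrastive SSL model, loaded via torchaudio) and a placeholder linear CTC head; the audio file name and vocabulary size are hypothetical.

```python
# Sketch: frozen SSL upstream as a feature extractor for a CTC-style
# lyrics recognizer. All names below are illustrative placeholders.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE        # pretrained contrastive SSL upstream
upstream = bundle.get_model().eval()               # frozen: no fine-tuning in this sketch

waveform, sr = torchaudio.load("vocals.wav")       # hypothetical solo-singing clip
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    layer_outputs, _ = upstream.extract_features(waveform)
ssl_feats = layer_outputs[-1]                      # (batch, frames, feature_dim)

vocab_size = 32                                    # placeholder character set + CTC blank
ctc_head = torch.nn.Linear(ssl_feats.size(-1), vocab_size)
log_probs = torch.nn.functional.log_softmax(ctc_head(ssl_feats), dim=-1)
# log_probs would then be trained with torch.nn.CTCLoss against lyric transcripts.
```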
Keywords
learning, end-to-end, self-supervised