AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2023)
Keywords
Audio-Visual Speech Recognition, Automatic Speech Recognition, End-to-End Speech Recognition, Acoustic Modeling, Environmental Sound Recognition