Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
CoRR (2024)
Abstract
In this paper, we apply the variational information bottleneck approach to
end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This
allows us to investigate what information is essential for the model. EEND-EDA
utilizes vector representations of the speakers in a conversation, called attractors.
Our analysis shows that attractors do not necessarily have to contain speaker
characteristic information. On the other hand, giving the attractors more
freedom to encode some extra (possibly speaker-specific) information
leads to small but consistent diarization performance improvements.
Despite architectural differences in EEND systems, the notion of attractors and
frame embeddings is common to most of them and not specific to EEND-EDA. We
believe that the main conclusions of this work can apply to other variants of
EEND. Thus, we hope this paper will be a valuable contribution that helps the
community make more informed decisions when designing new systems.