
Generative De-Quantization for Neural Speech Codec via Latent Diffusion

arXiv (Cornell University), 2023

Abstract
In low-bitrate speech coding, end-to-end speech coding networks aim to learn compact yet expressive features and a powerful decoder within a single network. Such a demanding design leads to an unwelcome increase in complexity and to inferior speech quality. In this paper, we propose to separate the representation-learning and information-reconstruction tasks. We leverage an end-to-end codec to learn low-dimensional discrete tokens and employ a latent diffusion model to de-quantize the coded features into a high-dimensional continuous space, relieving the decoder of the burden of de-quantizing and upsampling. To mitigate over-smooth generation, we introduce midway-infilling, which applies less noise reduction and stronger conditioning. In ablation studies, we investigate the hyperparameters of midway-infilling and latent diffusion spaces of different dimensions. Subjective listening tests show that our model outperforms the state of the art at two low bitrates, 1.5 and 3 kbps. Code and samples for this work are available on our webpage.
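To make the midway-infilling idea concrete, below is a minimal, illustrative sketch of a DDPM-style reverse loop that starts denoising from an intermediate timestep rather than from pure noise, conditioning every step on the quantized codec latents. This is not the authors' implementation: all names (denoise_fn, z_q, T_mid) and the linear noise schedule are assumptions chosen for illustration.

import torch

def midway_infilling_sample(denoise_fn, z_q, T_mid=500, T=1000):
    """Toy sampler: recover a continuous latent from quantized latents z_q.

    Instead of starting from pure noise at t = T, we diffuse z_q forward only
    to t = T_mid and denoise from there, so the sampler performs less noise
    reduction and leans more heavily on the conditioning signal.
    """
    betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative alpha_bar_t

    # Forward-diffuse the quantized latent to the midway timestep.
    a_mid = alphas_bar[T_mid - 1]
    z = a_mid.sqrt() * z_q + (1 - a_mid).sqrt() * torch.randn_like(z_q)

    # Ancestral reverse steps from T_mid down to 1, always conditioning the
    # noise predictor on z_q ("stronger conditioning").
    for t in range(T_mid, 0, -1):
        a_t, a_bar = 1.0 - betas[t - 1], alphas_bar[t - 1]
        eps = denoise_fn(z, t, cond=z_q)             # predicted noise
        mean = (z - betas[t - 1] / (1 - a_bar).sqrt() * eps) / a_t.sqrt()
        noise = torch.randn_like(z) if t > 1 else torch.zeros_like(z)
        z = mean + betas[t - 1].sqrt() * noise       # sigma_t^2 = beta_t choice
    return z

# Usage with a dummy noise predictor, just to show the call shape:
# z_q = torch.randn(1, 64, 100)
# z_0 = midway_infilling_sample(lambda z, t, cond: torch.zeros_like(z), z_q)

Lowering T_mid trades generative freedom for fidelity to the conditioning latent, which is one plausible reading of how midway-infilling counteracts over-smooth generation.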
Key words
Speech Codec, Latent Diffusion Model, Speech Synthesis