Speech Enhancement for Low Bit Rate Speech Codec

Ju Lin, Kaustubh Kalgaonkar, Qing He, Xin Lei

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2022)


Abstract

A speech codec compresses the input signal into a compact bit stream, which is then decoded at the receiver to reconstruct speech with the best possible perceptual quality. This compression makes storing and transmitting speech efficient. In this work, we propose a neural extension to low bit rate speech codecs (e.g., Codec2) that aims to improve the perceptual quality of synthesized speech. Our proposed framework combines decoded audio with neural embeddings without breaking existing speech coders. In addition to embeddings, we also use a least-squares generative adversarial network (LSGAN) to reduce artifacts and prevent over-smoothing in the reconstructed audio. Mean Opinion Scores (MOS) from listening tests show that our framework can boost the quality of speech encoded at 3.6 kbps so that it outperforms speech encoded at 6 kbps using Opus.
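
For readers unfamiliar with the LSGAN objective mentioned above, the following is a minimal sketch of the standard least-squares GAN losses with target labels 0 (fake) and 1 (real). The function names and the plain-list inputs are hypothetical and chosen for illustration; the abstract does not specify the discriminator architecture, loss weights, or any additional reconstruction terms, so this should not be read as the authors' exact training objective.

```python
def lsgan_discriminator_loss(real_scores, fake_scores):
    """Least-squares discriminator loss.

    Pushes discriminator scores on real audio toward 1 and
    scores on generated (enhanced) audio toward 0.
    """
    real_term = sum((s - 1.0) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(fake_scores):
    """Least-squares generator loss.

    Pushes discriminator scores on generated audio toward 1,
    i.e. the generator is rewarded for fooling the discriminator.
    """
    return 0.5 * sum((s - 1.0) ** 2 for s in fake_scores) / len(fake_scores)


if __name__ == "__main__":
    # Hypothetical discriminator outputs for a small batch.
    real = [0.9, 1.1, 0.8]
    fake = [0.2, 0.1, 0.4]
    print("discriminator loss:", lsgan_discriminator_loss(real, fake))
    print("generator loss:", lsgan_generator_loss(fake))
```

Compared with the sigmoid cross-entropy loss of the original GAN, the squared-error form penalizes samples in proportion to their distance from the target label, which is the usual motivation given for LSGAN's more stable training.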


Keywords

Speech Codec, VQ-VAE, generative adversarial network
