Speech Enhancement for Low Bit Rate Speech Codec

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
A speech codec compresses the input signal into a compact bit stream, which is then decoded at the receiver to reconstruct speech with the best possible perceptual quality. This compression makes storing and transmitting speech efficient. In this work, we propose a neural extension to a low bit rate speech codec (e.g., Codec2) that aims to improve the perceptual quality of the synthesized speech. Our proposed framework combines decoded audio with neural embeddings without breaking compatibility with existing speech coders. In addition to the embeddings, we use a least-squares generative adversarial network (LSGAN) to reduce artifacts and prevent over-smoothing in the reconstructed audio. Mean Opinion Scores (MOS) from listening tests show that our framework can boost the quality of speech encoded at 3.6 kbps so that it outperforms speech encoded at 6 kbps with Opus.
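The abstract names LSGAN but does not state the loss used. As a minimal sketch, assuming the standard least-squares GAN objective with target labels 1 for real audio and 0 for generated audio (an assumption, not confirmed by the abstract), the two losses could be written as follows. The function names and the plain-list inputs are hypothetical and chosen only for illustration; a real system would compute these on batches of discriminator outputs inside a deep learning framework.

```python
# Sketch of the standard LSGAN losses (assumed labels: real = 1, fake = 0).
# Inputs are lists of raw discriminator scores; names are illustrative only.

def lsgan_discriminator_loss(real_scores, fake_scores):
    """Push scores for real audio toward 1 and scores for generated audio toward 0."""
    real_term = sum((s - 1.0) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(fake_scores):
    """Push discriminator scores for generated audio toward 1."""
    return 0.5 * sum((s - 1.0) ** 2 for s in fake_scores) / len(fake_scores)


if __name__ == "__main__":
    # A discriminator that separates real from fake perfectly has zero loss,
    # while the generator is penalized for being detected.
    print(lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0]))
    print(lsgan_generator_loss([0.0, 0.0]))
```

Compared with the cross-entropy GAN loss, the squared penalty keeps growing with the distance between a score and its target, so samples that are classified correctly but sit far from the target still contribute gradient. This is the usual motivation given for choosing a least-squares objective.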
Keywords
Speech codec, VQ-VAE, generative adversarial network