A Perceptual Neural Audio Coder with a Mean-Scale Hyperprior

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Abstract
This paper proposes an end-to-end neural audio coder based on a mean-scale hyperprior model, combined with perceptual optimization using a loss function based on a psychoacoustic model (PAM). Assuming that the latent samples follow a Gaussian distribution, the proposed coder uses a sub-network to estimate the mean and scale hyperpriors. The main network is an autoencoder built from ResNet-type gated linear units (ResGLUs), each comprising a generalized divisive normalization (GDN) layer. Both networks are trained to optimize perceptual attributes estimated with a multi-timescale scheme in order to obtain high perceptual quality. Experimental results show that the proposed model accurately predicts the mean and scale hyperpriors. It also achieves consistently higher audio quality than the commercial MP3 audio coder at all bitrates.
Keywords
Neural Audio Coder, Hyperprior, PAM, Perceptual Loss Function
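
The abstract states that latent samples are assumed Gaussian, with mean and scale predicted by a sub-network. As a rough illustration of why accurate mean and scale predictions matter for bitrate, the sketch below estimates the number of bits needed to code integer-quantized latents under such a Gaussian model. It is not the paper's implementation: the function names, the unit-width quantization bins, and the probability floor are all assumptions made for this example.

```python
import math

def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation (scale)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def bits_for_symbol(y_hat, mean, scale, floor=1e-9):
    """Bits to code one integer-quantized latent y_hat.

    Assumption: the probability of y_hat is the Gaussian mass on the
    unit-width bin [y_hat - 0.5, y_hat + 0.5]. The floor (hypothetical)
    keeps the logarithm finite for very unlikely symbols.
    """
    p = gaussian_cdf(y_hat + 0.5, mean, scale) - gaussian_cdf(y_hat - 0.5, mean, scale)
    return -math.log2(max(p, floor))

def estimate_bits(latents, means, scales):
    """Total estimated bits for a sequence of quantized latents."""
    return sum(bits_for_symbol(y, m, s) for y, m, s in zip(latents, means, scales))

# Hypothetical quantized latents with a well-matched and a poorly matched prior.
latents = [3, -1, 0, 2]
good = estimate_bits(latents, means=[3.0, -1.0, 0.0, 2.0], scales=[0.5] * 4)
poor = estimate_bits(latents, means=[0.0] * 4, scales=[0.5] * 4)
print(f"well-matched prior: {good:.2f} bits, poorly matched prior: {poor:.2f} bits")
```

Under these assumptions, a predicted mean close to the actual latent and a tight scale concentrate probability mass on the coded symbol, which lowers the estimated rate. In a trained coder this rate term would typically be balanced against a distortion or perceptual loss.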