PAM: Prompting Audio-Language Models for Audio Quality Assessment
CoRR (2024)
Abstract
While audio quality is a key performance metric for various audio processing
tasks, including generative modeling, its objective measurement remains a
challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs
that may contain information about audio quality, the presence of artifacts, or
noise. Given an audio input and a text prompt related to quality, an ALM can be
used to calculate a similarity score between the two. Here, we exploit this
capability and introduce PAM, a no-reference metric for assessing audio quality
for different audio processing tasks. Unlike other "reference-free"
metrics, PAM requires neither computing embeddings on a reference dataset nor
training a task-specific model on a costly set of human listening scores. We
extensively evaluate the reliability of PAM against established metrics and
human listening scores on four tasks: text-to-audio (TTA), text-to-music
generation (TTM), text-to-speech (TTS), and deep noise suppression (DNS). We
perform multiple ablation studies with controlled distortions, in-the-wild
setups, and prompt choices. Our evaluation shows that PAM correlates well with
existing metrics and human listening scores. These results demonstrate the
potential of ALMs for computing a general-purpose audio quality metric.
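
The similarity-based scoring described above can be illustrated with a small sketch. It assumes the ALM has already produced one embedding for the audio clip and one for each of two contrasting quality prompts (a "good quality" prompt and a "poor quality" prompt), and it turns the two cosine similarities into a score in [0, 1] with a two-way softmax. The two-prompt contrast, the function names `cosine` and `quality_score`, and the `scale` parameter are illustrative assumptions; the abstract does not specify the prompts or how similarities are converted into the final metric.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def quality_score(audio_emb, good_prompt_emb, bad_prompt_emb, scale=1.0):
    """Hypothetical score: softmax probability assigned to the 'good' prompt.

    audio_emb, good_prompt_emb, bad_prompt_emb stand in for embeddings that
    an audio-language model would produce; here they are plain lists.
    scale is an assumed temperature-like factor applied to the similarities.
    """
    sim_good = scale * cosine(audio_emb, good_prompt_emb)
    sim_bad = scale * cosine(audio_emb, bad_prompt_emb)
    # Subtract the max before exponentiating for numerical stability.
    m = max(sim_good, sim_bad)
    e_good = math.exp(sim_good - m)
    e_bad = math.exp(sim_bad - m)
    return e_good / (e_good + e_bad)

# Toy embeddings (made up for illustration, not model outputs).
clean_like = [0.9, 0.1]
good_prompt = [1.0, 0.0]
bad_prompt = [0.0, 1.0]
score = quality_score(clean_like, good_prompt, bad_prompt)
```

In this toy setup the audio embedding lies closer to the "good" prompt, so the score lands above 0.5; an embedding equidistant from both prompts would score exactly 0.5. No reference audio or human listening scores enter the computation, which matches the no-reference property the abstract claims.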