Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
Abstract
Multi-Source Diffusion Models (MSDM) allow for compositional musical
generation tasks: generating a set of coherent sources, creating
accompaniments, and performing source separation. Despite their versatility,
they require estimating the joint distribution over the sources, necessitating
pre-separated musical data, which is rarely available, and fixing the number
and type of sources at training time. This paper generalizes MSDM to arbitrary
time-domain diffusion models conditioned on text embeddings. These models do
not require separated data as they are trained on mixtures, can parameterize an
arbitrary number of sources, and allow for rich semantic control. We propose an
inference procedure enabling the coherent generation of sources and
accompaniments. Additionally, we adapt the Dirac separator of MSDM to perform
source separation. We experiment with diffusion models trained on Slakh2100 and
MTG-Jamendo, showcasing competitive generation and separation results in a
relaxed data setting.
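The Dirac separator mentioned above constrains the generated sources to sum exactly to the observed mixture during reverse diffusion: one source is not sampled freely but is defined as the mixture minus the others, and the remaining sources are updated with the model's score. A minimal sketch of one such constrained update step, assuming a generic `score_fn` that stands in for the trained (text-conditioned) diffusion model's score estimate, with hypothetical `step_size` and `noise_scale` parameters:

```python
import numpy as np

def dirac_separation_step(sources, mixture, score_fn, step_size, noise_scale):
    """One reverse-diffusion step of a Dirac-style separator (sketch).

    The last source is not a free variable: it is constrained to
    mixture - sum(other sources), so the mixture is reproduced exactly.
    Only the first N-1 sources are updated; `score_fn` is a stand-in
    for the diffusion model's score network.
    """
    free = sources[:-1]                                  # (N-1, T) free sources
    constrained = mixture - free.sum(axis=0)             # (T,) constrained source
    full = np.concatenate([free, constrained[None]], axis=0)
    scores = score_fn(full)                              # (N, T) score estimates
    # Chain rule: d(constrained)/d(free_i) = -1, so the constrained
    # source's score enters each free source's gradient with a minus sign.
    grad = scores[:-1] - scores[-1][None]
    free = free + step_size * grad + noise_scale * np.random.randn(*free.shape)
    return np.concatenate([free, (mixture - free.sum(axis=0))[None]], axis=0)
```

By construction, the returned sources sum to the mixture at every step, which is the key property the paper carries over from MSDM's Dirac likelihood.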
Keywords
Music Generation, Diffusion Models, Source Separation