Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models

María José Apolo, Marcelo Mendoza

HCI (25) (2023)

Abstract
Deep generative models have attracted considerable attention for their strong performance in generating original images across many real-world domains. One application of these models is style transfer, in which the style of one object is transferred to the content of another. This study proposes a novel method for transferring the multimodal style of songs to album covers, structured as a three-stage pipeline. First, a multimodal latent space is trained with a triplet network that receives a dataset of cover images and songs represented as spectrograms, spanning around 18 genres. Next, a k-nearest-neighbors (kNN) search in this latent space retrieves the cover art closest to a query song. Finally, a Spectral Normalized GAN pretrained on ImageNet is fine-tuned with only its batch-normalization parameters trainable, to avoid overfitting, and original cover art is then sampled from it. The pipeline is evaluated on songs from 10 different genres, retrieving covers of similar genres among the 100 nearest neighbors and producing images with an average Fréchet Inception Distance of 20.89.
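For a concrete picture of the pipeline, the sketch below shows in PyTorch roughly how the triplet-network embedding, the kNN retrieval, and the batch-norm-only fine-tuning could be wired together. The encoder architecture, embedding dimension (EMBED_DIM), triplet margin, and all function names are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128  # assumed latent dimensionality, not from the paper

class Encoder(nn.Module):
    """Toy CNN encoder; one instance per modality (spectrogram / cover art)."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, EMBED_DIM),
        )

    def forward(self, x):
        # Unit-norm embeddings so both modalities share a comparable scale.
        return F.normalize(self.net(x), dim=-1)

song_enc = Encoder(in_channels=1)    # spectrograms: single channel
cover_enc = Encoder(in_channels=3)   # cover art: RGB

triplet_loss = nn.TripletMarginLoss(margin=0.2)  # margin value is an assumption

def triplet_step(spec, matching_cover, other_cover):
    """Anchor = song, positive = its own cover, negative = another song's cover."""
    return triplet_loss(song_enc(spec),
                        cover_enc(matching_cover),
                        cover_enc(other_cover))

@torch.no_grad()
def nearest_covers(query_spec, cover_embeddings, k=100):
    """kNN in the shared latent space: indices of the k covers closest to a song."""
    q = song_enc(query_spec)              # (1, EMBED_DIM)
    d = torch.cdist(q, cover_embeddings)  # (1, N) pairwise distances
    return d.topk(k, largest=False).indices.squeeze(0)

def freeze_all_but_batchnorm(generator: nn.Module):
    """Fine-tuning step from the abstract: train only batch-norm parameters."""
    for m in generator.modules():
        trainable = isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d))
        for p in m.parameters(recurse=False):
            p.requires_grad = trainable
```

In this reading, `freeze_all_but_batchnorm` would be applied to a pretrained SN-GAN generator before fine-tuning on the retrieved covers; loading that generator is outside the scope of this sketch.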
Keywords
bimodal style transference, musical composition, deep generative models