Soft Dynamic Time Warping with Variable Step Weights

ICASSP 2024: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
In computer vision and audio processing, soft dynamic time warping (SDTW) has been used as a differentiable loss function to train deep neural networks (DNNs) on weakly aligned data. In existing SDTW algorithms, the horizontal, vertical, and diagonal alignment steps all have the same weight, i.e., they contribute equally to the alignment cost. This equal weighting of all step sizes can lead to degenerate alignments, for example, by aligning most predictions to a single target frame in the early stages of training. Problems with equal step weights are known from classical DTW, where they have been addressed by assigning different weights to different step sizes. In this paper, we extend SDTW to allow for variable step weights and provide efficient dynamic programming algorithms for the forward and backward passes. As an example, we demonstrate the potential of the method on the task of training a DNN for pitch class estimation from music recordings, using step weight parameters that reduce the influence of outliers in repetitions of the same target frame.
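
To make the idea of step weights concrete, the following is a minimal NumPy sketch of an SDTW forward pass in which each step type scales the local cost by its own weight. It is an illustration only, not the authors' implementation: the abstract does not state the exact recursion, so the sketch assumes the convention from classical weighted DTW, where the cost of the cell being entered is multiplied by the weight of the step taken. The names `softmin`, `weighted_sdtw`, `w_h`, `w_v`, `w_d`, and `gamma` are hypothetical.

```python
import numpy as np


def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = np.asarray(values, dtype=float)
    m = v.min()
    if not np.isfinite(m):
        # All candidates are unreachable (infinite cost).
        return m
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))


def weighted_sdtw(C, gamma=1.0, w_h=1.0, w_v=1.0, w_d=1.0):
    """Forward pass of soft-DTW with per-step weights (assumed convention).

    C      : (N, M) local cost matrix between predictions and targets.
    gamma  : softmin temperature; small values approach classical DTW.
    w_h, w_v, w_d : weights for horizontal, vertical, and diagonal steps.
    Setting all three weights to 1 gives the equal-weight recursion.
    """
    N, M = C.shape
    # D[i, j] holds the soft alignment cost of the prefixes C[:i, :j].
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            c = C[i - 1, j - 1]
            candidates = [
                D[i, j - 1] + w_h * c,      # horizontal step
                D[i - 1, j] + w_v * c,      # vertical step
                D[i - 1, j - 1] + w_d * c,  # diagonal step
            ]
            D[i, j] = softmin(candidates, gamma)
    return D[N, M]
```

Under this assumed convention, raising `w_h` or `w_v` above `w_d` penalizes paths that repeat a frame, which is one way to discourage the collapse onto a single target frame described above. For gradient-based training, the same recursion could be written in an automatic differentiation framework instead of deriving the backward pass by hand; the paper itself provides a dedicated dynamic programming algorithm for that pass.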
Keywords
soft dynamic time warping,step weights,pitch class estimation,music processing,music information retrieval