Title
Phase recovery with Bregman divergences for audio source separation
Authors
Abstract
Time-frequency audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a phase recovery algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has shown good performance in several recent works. This algorithm minimizes a quadratic reconstruction error between magnitude spectrograms. However, this loss does not properly account for some perceptual properties of audio, and alternative discrepancy measures such as beta-divergences have been preferred in many settings. In this paper, we propose to reformulate phase recovery in audio source separation as a minimization problem involving Bregman divergences. To optimize the resulting objective, we derive a projected gradient descent algorithm. Experiments conducted on a speech enhancement task show that this approach outperforms MISI for several alternative losses, which highlights their relevance for audio source separation applications.
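For readers unfamiliar with the beta-divergences mentioned in the abstract, the following is a minimal sketch of the standard beta-divergence between two nonnegative magnitude arrays. It is not the paper's algorithm: the function name `beta_divergence` and the flat-list input format are assumptions made for illustration. The family includes the Itakura-Saito divergence (beta = 0), the generalized Kullback-Leibler divergence (beta = 1), and half the squared Euclidean distance (beta = 2), the last being the quadratic case.

```python
import math


def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences d_beta(x_i | y_i).

    x, y: equal-length sequences of strictly positive floats
          (e.g. flattened magnitude spectrograms; illustrative format).
    beta: real-valued shape parameter.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        if beta == 0:
            # Itakura-Saito divergence
            ratio = xi / yi
            total += ratio - math.log(ratio) - 1.0
        elif beta == 1:
            # Generalized Kullback-Leibler divergence
            total += xi * math.log(xi / yi) - xi + yi
        else:
            # General case; beta = 2 gives 0.5 * (xi - yi) ** 2
            total += (
                xi ** beta + (beta - 1.0) * yi ** beta - beta * xi * yi ** (beta - 1.0)
            ) / (beta * (beta - 1.0))
    return total


# Hypothetical target and estimated magnitudes
target = [1.0, 2.0, 0.5]
estimate = [1.2, 1.5, 0.5]
for b in (0, 0.5, 1, 2):
    print(b, beta_divergence(target, estimate, b))
```

Each value is zero when the two arrays are equal and positive otherwise, so any member of the family can serve as a discrepancy measure between a target magnitude and the magnitude of an estimate.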
