
SPECTRO-TEMPORAL POST-SMOOTHING IN NMF BASED SINGLE-CHANNEL SOURCE SEPARATION

Emad M. Grais and Hakan Erdogan
Sabancı University, Istanbul, Turkey



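INTRODUCTION
• Single-channel source separation aims to estimate the individual source signals when only a single mixture of them is available.

PROBLEM FORMULATION
• The observed mixed signal $x(t)$ is a mixture of several source signals $s_z(t)$:
$$x(t) = \sum_{z=1}^{Z} s_z(t)$$
• In the short-time Fourier transform (STFT) domain this becomes
$$X(t,f) = \sum_{z=1}^{Z} S_z(t,f)$$
• The mixture magnitude spectrogram can be approximated as a sum of the source magnitude spectrograms:
$$|X(t,f)| \approx \sum_{z=1}^{Z} |S_z(t,f)|$$
• Writing the magnitude spectrograms as nonnegative matrices gives
$$\mathbf{X} \approx \sum_{z=1}^{Z} \mathbf{S}_z$$

NONNEGATIVE MATRIX FACTORIZATION
• NMF decomposes a nonnegative matrix $\mathbf{V}$ into a low-rank nonnegative basis matrix $\mathbf{B}$ and a nonnegative weight matrix $\mathbf{G}$, so that $\mathbf{V} \approx \mathbf{B}\mathbf{G}$.
• $\mathbf{B}$ and $\mathbf{G}$ are found by minimizing the generalized Kullback-Leibler divergence
$$D(\mathbf{V}\,\|\,\mathbf{B}\mathbf{G}) = \sum_{i,j}\left(\mathbf{V}_{i,j}\log\frac{\mathbf{V}_{i,j}}{(\mathbf{B}\mathbf{G})_{i,j}} - \mathbf{V}_{i,j} + (\mathbf{B}\mathbf{G})_{i,j}\right)$$
  subject to all elements of $\mathbf{B}$ and $\mathbf{G}$ being nonnegative.
• The multiplicative update rules for $\mathbf{B}$ and $\mathbf{G}$ are
$$\mathbf{B} \leftarrow \mathbf{B} \otimes \frac{\frac{\mathbf{V}}{\mathbf{B}\mathbf{G}}\,\mathbf{G}^{T}}{\mathbf{1}\,\mathbf{G}^{T}}, \qquad \mathbf{G} \leftarrow \mathbf{G} \otimes \frac{\mathbf{B}^{T}\,\frac{\mathbf{V}}{\mathbf{B}\mathbf{G}}}{\mathbf{B}^{T}\,\mathbf{1}}$$
  where $\otimes$ and the fraction bars denote element-wise multiplication and division, and $\mathbf{1}$ is an all-ones matrix of the same size as $\mathbf{V}$.

NMF FOR SOURCE SEPARATION
• Training stage: NMF is applied to the magnitude spectrogram of each source's training data to build a dictionary $\mathbf{B}_z$ for that source.
• Testing stage: NMF decomposes the magnitude spectrogram of the mixed signal $\mathbf{X}$ into a nonnegative weighted linear combination of the trained dictionaries, which are kept fixed while only the weights are updated:
$$\mathbf{X} \approx [\mathbf{B}_1, \ldots, \mathbf{B}_Z] \begin{bmatrix}\mathbf{G}_1 \\ \vdots \\ \mathbf{G}_Z\end{bmatrix}$$
• The initial estimate for each source is
$$\tilde{\mathbf{S}}_z = \mathbf{B}_z \mathbf{G}_z$$

SIGNALS RECONSTRUCTION AND SMOOTHED MASKS
• The initial estimates are used to build a spectral mask for each source:
$$\mathbf{H}_z = \frac{\tilde{\mathbf{S}}_z^{\,p}}{\sum_{k=1}^{Z} \tilde{\mathbf{S}}_k^{\,p}}$$
  where the power and the division are element-wise.
• Changing $p$ leads to a different type of mask; for example, $p = 2$ gives a Wiener-filter-like mask, and $p \to \infty$ approaches a binary mask.
• The spectral mask yields an estimate of each source by element-wise multiplication with the spectrogram of the mixed signal:
$$\hat{\mathbf{S}}_z = \mathbf{H}_z \otimes \mathbf{X}$$
• To add temporal smoothness to the estimated source spectrograms, each mask is smoothed by a 2-D smoothing filter $f$ of size $(a, b)$:
$$\bar{\mathbf{H}}_z = f_{(a,b)}(\mathbf{H}_z)$$
• The smoothing filter $f$ can be:
   • the median filter,
   • the moving-average low-pass filter, or
   • the Hamming-windowed moving-average filter (Hamming filter).
• The smoothing is applied along the horizontal (time) direction of the spectrograms.
• The final estimate for each source is
$$\bar{\mathbf{S}}_z = \bar{\mathbf{H}}_z \otimes \mathbf{X}$$

EXPERIMENTS AND RESULTS
• The proposed algorithm is used to separate a speech signal from background piano music.
• The sampling rate is 16 kHz. The STFT uses a 512-point FFT, and only the first 257 frequency bins are used (the remaining bins are redundant by conjugate symmetry).
• We train 128 basis vectors for each source dictionary, so each dictionary $\mathbf{B}_z$ is a 257 × 128 matrix.
• We compare enforcing temporal smoothness with post-smoothed spectral masks against enforcing smoothness with regularized NMF. The regularized NMF is defined as
$$\min_{\mathbf{G} \ge 0} \; D(\mathbf{X}\,\|\,\mathbf{B}_d\mathbf{G}) + \alpha\, R(\mathbf{G})$$
  where $\mathbf{B}_d = [\mathbf{B}_{\text{speech}}, \mathbf{B}_{\text{music}}]$, $\alpha$ is the regularization parameter, and $R(\mathbf{G})$ is the continuity prior penalty term, which penalizes large changes between consecutive columns (time frames) of $\mathbf{G}$.
• In this work, we choose different regularization parameters: $\alpha_s$ for speech and $\alpha_m$ for music.
• Table 1 shows the separation results when regularized NMF is used to enforce smoothness on the estimated source signals.
• Tables 2 and 3 show the separation results when smoothness is enforced with smoothed spectral masks.
• The tables show that enforcing smoothness with smoothed masks gives better separation results than enforcing it with regularized NMF.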
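The STFT front end of the experimental setup can be illustrated with a minimal NumPy sketch. It shows why a 512-point FFT leaves 257 useful bins: for a real signal, `np.fft.rfft` returns only the `n_fft // 2 + 1` non-redundant bins, which equal the first 257 bins of the full FFT. The Hamming analysis window and the 256-sample hop are assumptions (the setup does not state them), and `magnitude_spectrogram` is a hypothetical helper name.

```python
import numpy as np


def magnitude_spectrogram(x, n_fft=512, hop=256):
    """Return |STFT| of a real signal as an (n_fft // 2 + 1) x n_frames matrix.

    Assumptions for this sketch: Hamming analysis window, fixed hop, no padding,
    and len(x) >= n_fft.
    """
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # rfft keeps only the non-redundant bins: 257 for a 512-point FFT
    return np.abs(np.fft.rfft(frames, axis=1)).T  # rows = frequency, columns = time
```

With a 16 kHz sampling rate the bin spacing is 16000 / 512 = 31.25 Hz, so a 440 Hz tone peaks in bin 14 (437.5 Hz).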
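The NMF steps (generalized KL cost, multiplicative updates, one dictionary trained per source, then only the weights fitted on the mixture) can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' code: random initialization, a fixed iteration count and a small `EPS` guard are choices made for this sketch, and `nmf_kl` and `initial_estimates` are hypothetical names.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_divergence(V, W):
    """Generalized Kullback-Leibler divergence D(V || W), summed over all entries."""
    return float(np.sum(V * np.log((V + EPS) / (W + EPS)) - V + W))


def nmf_kl(V, rank=None, B=None, n_iter=200, seed=0):
    """Approximate V ~= B @ G with multiplicative KL updates.

    Training stage: give `rank`; both B and G are learned.
    Testing stage: give a fixed dictionary `B`; only G is updated.
    """
    rng = np.random.default_rng(seed)
    learn_B = B is None
    if learn_B:
        B = rng.random((V.shape[0], rank)) + EPS
    G = rng.random((B.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # G <- G * (B^T (V / BG)) / (B^T 1), all element-wise except the matrix products
        G *= (B.T @ (V / (B @ G + EPS))) / (B.T @ ones + EPS)
        if learn_B:
            # B <- B * ((V / BG) G^T) / (1 G^T)
            B *= ((V / (B @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
    return B, G


def initial_estimates(X, dictionaries, n_iter=200):
    """Fit X ~= [B_1, ..., B_Z] G with fixed dictionaries; return B_z @ G_z per source."""
    B_d = np.hstack(dictionaries)
    _, G = nmf_kl(X, B=B_d, n_iter=n_iter)
    estimates, start = [], 0
    for B_z in dictionaries:
        stop = start + B_z.shape[1]
        estimates.append(B_z @ G[start:stop])  # rows of G that belong to source z
        start = stop
    return estimates
```

For the setup described above, calling `nmf_kl(S_train, rank=128)` on a hypothetical 257-row training spectrogram `S_train` would give a 257 × 128 dictionary.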
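The mask construction and post-smoothing can be sketched with NumPy alone. It builds $\mathbf{H}_z = \tilde{\mathbf{S}}_z^{\,p} / \sum_k \tilde{\mathbf{S}}_k^{\,p}$, smooths each mask with one of the three listed filters, and multiplies by the mixture spectrogram. Assumptions for this sketch: `a` is read as the window length along frequency and `b` along time (so `a=1` smooths only in time), borders use edge padding, and `spectral_masks`, `smooth_mask` and `separate` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spectral_masks(estimates, p=2.0, eps=1e-12):
    """H_z = S_z^p / sum_k S_k^p (element-wise) for nonnegative initial estimates."""
    powered = [np.power(S, p) for S in estimates]
    total = np.sum(powered, axis=0) + eps
    return [P / total for P in powered]


def smooth_mask(H, a=1, b=5, kind="median"):
    """Smooth a (frequency x time) mask with an a x b window (assumed: a = freq, b = time)."""
    pa, pb = (a - 1) // 2, (b - 1) // 2
    padded = np.pad(H, ((pa, a - 1 - pa), (pb, b - 1 - pb)), mode="edge")
    windows = sliding_window_view(padded, (a, b))  # shape (F, T, a, b)
    if kind == "median":
        return np.median(windows, axis=(-2, -1))
    if kind == "average":  # moving-average low-pass filter
        kernel = np.ones((a, b))
    elif kind == "hamming":  # Hamming-windowed moving average
        kernel = np.outer(np.hamming(a), np.hamming(b))
    else:
        raise ValueError(f"unknown filter kind: {kind!r}")
    kernel /= kernel.sum()  # unit-sum kernel keeps mask values in [0, 1]
    return np.einsum("ftij,ij->ft", windows, kernel)


def separate(X, estimates, p=2.0, a=1, b=5, kind="median"):
    """Final estimates: smoothed mask times mixture spectrogram, element-wise."""
    return [smooth_mask(H, a, b, kind) * X for H in spectral_masks(estimates, p)]
```

A design note: the moving-average and Hamming filters are linear with unit-sum kernels, so masks that sum to one across sources still do after smoothing; the median filter is nonlinear and does not guarantee this.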
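The regularized-NMF baseline can be made concrete by evaluating its objective. The exact form of $R(\mathbf{G})$ is an assumption here: this sketch uses Virtanen's temporal-continuity criterion, $R(\mathbf{G}) = \sum_i \frac{1}{\sigma_i^2}\sum_t (\mathbf{G}_{i,t} - \mathbf{G}_{i,t-1})^2$ with $\sigma_i^2 = \frac{1}{T}\sum_t \mathbf{G}_{i,t}^2$, and applies $\alpha_s$ to the speech rows and $\alpha_m$ to the music rows of $\mathbf{G}$. Function names are hypothetical, and minimizing the objective is not shown.

```python
import numpy as np

EPS = 1e-12


def continuity_penalty(G):
    """Assumed Virtanen-style continuity penalty; rows of G = basis vectors, columns = frames.

    Normalizing by sigma_i^2 = mean_t G[i, t]^2 makes the penalty invariant to row scaling.
    """
    sigma2 = np.mean(G ** 2, axis=1) + EPS
    diffs = np.sum(np.diff(G, axis=1) ** 2, axis=1)
    return float(np.sum(diffs / sigma2))


def kl_divergence(V, W):
    """Generalized Kullback-Leibler divergence D(V || W), summed over all entries."""
    return float(np.sum(V * np.log((V + EPS) / (W + EPS)) - V + W))


def regularized_cost(X, B_speech, B_music, G, alpha_s, alpha_m):
    """D(X || B_d G) + alpha_s R(G_speech) + alpha_m R(G_music), B_d = [B_speech, B_music]."""
    B_d = np.hstack([B_speech, B_music])
    k = B_speech.shape[1]  # first k rows of G weight the speech dictionary
    return (kl_divergence(X, B_d @ G)
            + alpha_s * continuity_penalty(G[:k])
            + alpha_m * continuity_penalty(G[k:]))
```

Because of the $\sigma_i^2$ normalization, scaling a row of $\mathbf{G}$ does not change its penalty, so the regularizer favors slowly varying activations rather than simply small ones.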
