
Speech Enhancement Based on Nonparametric Factor Analysis

Speech Enhancement Based on Nonparametric Factor Analysis. Lin Li¹, Jiawen Wu¹, Xinghao Ding¹, Qingyang Hong¹, Delu Zeng². ¹School of Information Science and Technology, Xiamen University, China. ²School of Mathematics, South China University of Technology, China. Reporter: Jiawen Wu.



Presentation Transcript


  1. Speech Enhancement Based on Nonparametric Factor Analysis. Lin Li¹, Jiawen Wu¹, Xinghao Ding¹, Qingyang Hong¹, Delu Zeng². ¹School of Information Science and Technology, Xiamen University, China. ²School of Mathematics, South China University of Technology, China. Reporter: Jiawen Wu. 10/11/2016

  2. Speech Enhancement Based on Nonparametric Factor Analysis. Outline: • Background of the Research • The Proposed Method • Experiment Setup • Experiment Results

  3. Background. Speech enhancement methods: SS (spectral subtraction) [Boll79]; Subspace [Moor93]; Wiener Filtering [Scalart96]; MMSE NPS (minimum mean-square error algorithm using a non-causal a priori SNR) [Cohen03]; MMSE MAP (maximum a posteriori estimator of the magnitude-squared spectrum) [Paliwal12]; Sparse Representation: K-SVD (K-singular value decomposition) [Zhao11] and CLSMD (constrained low-rank and sparse matrix decomposition) [Sun14]. References: [Boll79] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979. [Moor93] B. De Moor, "The singular value decomposition and long and short spaces of noisy matrices," IEEE Transactions on Signal Processing, vol. 41, no. 9, pp. 2826–2838, 1993. [Cohen03] I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, pp. 466–475, 2003. [Paliwal12] K. Paliwal, B. Schwerin et al., "Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator," Speech Communication, vol. 54, no. 2, pp. 282–305, 2012. [Scalart96] P. Scalart et al., "Speech enhancement based on a priori signal to noise estimation," ICASSP 1996, pp. 629–632. [Zhao11] N. Zhao, X. Xu, and Y. Yang, "Sparse representations for speech enhancement," Chinese Journal of Electronics, vol. 19, no. 2, pp. 268–272, 2011. [Sun14] C. Sun, Q. Zhu, and M. Wan, "A novel speech enhancement method based on constrained low-rank and sparse matrix decomposition," Speech Communication, vol. 60, pp. 44–55, 2014.
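As a reference point for the classical methods on this slide, spectral subtraction [Boll79] can be sketched in a few lines: subtract an estimated noise magnitude spectrum from each noisy frame and resynthesize with the noisy phase. This is a minimal illustration, not the paper's implementation; the function name, frame length, and hop size are assumptions.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame_len=256, hop=128):
    """Basic magnitude spectral subtraction (sketch in the spirit of Boll, 1979).

    noisy     : 1-D noisy speech signal
    noise_est : 1-D noise-only segment used to estimate the noise spectrum
    """
    window = np.hanning(frame_len)
    # Average noise magnitude spectrum from the noise-only segment
    noise_frames = [noise_est[i:i + frame_len] * window
                    for i in range(0, len(noise_est) - frame_len + 1, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

    out = np.zeros_like(noisy, dtype=float)
    for i in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[i:i + frame_len] * window
        spec = np.fft.rfft(frame)
        # Subtract the noise magnitude and half-wave rectify
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        # Resynthesize with the noisy phase; overlap-add the frames
        out[i:i + frame_len] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out
```

The half-wave rectification is what produces the well-known "musical noise" artifacts that motivated the later MMSE and sparse-representation approaches.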

  4. The Proposed Method. A sparse representation framework with a nonparametric dictionary-learning model based on beta process factor analysis.

  5. Contributions. 1) Nonparametric: the average sparsity level of the representation and the dictionary size can be learned by using a beta process. 2) The noise variance is not required: it can be inferred automatically through analytical posterior calculation. 3) An in situ training process: speech is processed in situ, so the dictionary does not have to be trained beforehand.

  6. Problem formulation
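The formulation equations on this slide were images in the original deck. As a hedged sketch of the standard setup: each noisy frame y is modeled as y ≈ Dα + n with dictionary D and sparse coefficients α, and the enhanced frame is the reconstruction Dα. One common solver for the sparse coding step is orthogonal matching pursuit (OMP), shown below for illustration only; it is not the paper's inference method, and the names are assumptions.

```python
import numpy as np

def omp(D, y, sparsity):
    """Orthogonal Matching Pursuit: greedily solve y ~ D @ alpha with at most
    `sparsity` nonzero coefficients (illustrative sketch)."""
    residual = y.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Least-squares refit on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha[support] = coef
    return alpha
```

In the nonparametric method proposed here, both the number of atoms and the per-frame sparsity are inferred rather than fixed in advance, which is exactly the `sparsity` argument that OMP-style methods must preset.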

  7. K-SVD [1]. The K-SVD baseline requires a preset noise threshold σ and a preset sparsity level L. [1] N. Zhao, X. Xu, and Y. Yang, "Sparse representations for speech enhancement," Chinese Journal of Electronics, vol. 19, no. 2, pp. 268–272, 2011.

  8. Architecture. Prior: Eqs. (1)–(6) (model equations shown as images on the slide). Posterior: Eqs. (7)–(8). Via variational Bayesian or Gibbs-sampling analysis [Paisley09], a full posterior density function can be inferred for the update of D and α, along with all other model parameters. [Paisley09] J. Paisley and L. Carin, "Nonparametric factor analysis with beta process priors," in Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, pp. 777–784.
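The prior on this slide follows the beta-Bernoulli (finite approximation to the beta process) construction of Paisley & Carin, 2009. A minimal sketch of drawing from that prior is below; the truncation level K, frame count N, and hyperparameter values are illustrative assumptions (b0 = N/9 matches the setup quoted later in the deck), and this only samples the prior, not the full Gibbs posterior updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the beta-Bernoulli prior underlying beta process factor analysis.
K, N = 256, 100          # K candidate dictionary atoms, N speech frames (assumed)
a0, b0 = 1.0, N / 9.0    # hyperparameters; b0 = N/9 as in the experimental setup

# Per-atom usage probabilities: pi_k ~ Beta(a0/K, b0*(K-1)/K)
pi = rng.beta(a0 / K, b0 * (K - 1) / K, size=K)

# Binary usage indicators for each frame i and atom k: z_ik ~ Bernoulli(pi_k)
Z = rng.random((N, K)) < pi

# The number of atoms a frame actually uses is sum_k z_ik, so the average
# sparsity level emerges from the inferred pi rather than being fixed a priori.
mean_atoms_per_frame = Z.sum(axis=1).mean()
print("mean atoms per frame:", mean_atoms_per_frame)
```

In the full model, Gibbs sampling alternates these indicator draws with conjugate updates of the atoms D, the weights, and the noise precision, which is how the noise variance gets inferred "for free."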

  9. Setup (1): Parameters. References: P. C. Loizou, Speech Enhancement: Theory and Practice. CRC Press, 2013. Y. Hu and P. C. Loizou, "Subjective comparison and evaluation of speech enhancement algorithms," Speech Communication, vol. 49, no. 7, pp. 588–601, 2007. ITU-T Recommendation P.862, "Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs," Feb. 2001.

  10. Setup (2). Iterations = 100. In the posterior, P is the frame number of the input speech. Figure: output SNR vs. iteration for different values of b0. Input speech: the text "the birch canoe slid on the smooth planks", corrupted with street noise at 0 dB.
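The quantity tracked per iteration on this slide is the output SNR against the clean reference. For concreteness, a minimal sketch of that metric (the function name is an assumption):

```python
import numpy as np

def output_snr_db(clean, enhanced):
    """Output SNR in dB: ratio of clean-signal energy to residual-error
    energy, used here to study convergence as a function of b0."""
    residual = clean - enhanced
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))
```

Note that this metric needs the clean signal, which is why the next slide's heuristic (triggered by declining output SNR) is flagged as impractical to apply directly in deployment.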

  11. Setup (2): an extra handling. Initialize b0 = N/9. At each iteration, check whether the output SNR has declined ten times consecutively: if yes, change b0 to a larger number, e.g., 1000×P; if no, leave b0 unchanged. This handling remains a great challenge to be further investigated, since the output SNR is unavailable in practical applications.
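The decision rule on this slide can be sketched as follows. The function name and the `patience`/`large` parameters are illustrative, not from the paper; `P` is the frame count of the input speech as defined in the setup.

```python
def adjust_b0(snr_history, b0, P, patience=10, large=1000):
    """Heuristic sketch: if the output SNR has declined for `patience`
    consecutive iterations, switch b0 to a larger value (e.g. 1000*P);
    otherwise leave b0 unchanged."""
    declines = 0
    for prev, curr in zip(snr_history, snr_history[1:]):
        # Count the current run of consecutive declines
        declines = declines + 1 if curr < prev else 0
    if declines >= patience:
        return large * P
    return b0
```

Because the rule consumes the output SNR, it only works in controlled experiments where the clean signal is known, which is exactly the limitation the slide acknowledges.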

  12. Results: comparison with K-SVD. (a) PESQ; (b) SegSNR. Noise type: Gaussian white noise. SegSNR / PESQ: mean values calculated over the 30 utterances at each input SNR. Match: the noise variance estimate for K-SVD matches the ground truth. Mismatch: the noise variance estimate for K-SVD does not match the ground truth.
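Of the two metrics on this slide, SegSNR is straightforward to reproduce: the mean of per-frame SNRs in dB, conventionally clipped to a fixed range. A hedged sketch follows; the frame length and clipping bounds are common defaults, not values confirmed by the paper.

```python
import numpy as np

def seg_snr_db(clean, enhanced, frame_len=160, floor=-10.0, ceil=35.0):
    """Segmental SNR (sketch): average of per-frame SNRs in dB,
    clipped to [floor, ceil] as is conventional."""
    snrs = []
    for i in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[i:i + frame_len]
        e = enhanced[i:i + frame_len]
        err = np.sum((c - e) ** 2) + 1e-12   # guard against log of zero
        snr = 10.0 * np.log10(np.sum(c ** 2) / err + 1e-12)
        snrs.append(np.clip(snr, floor, ceil))
    return float(np.mean(snrs))
```

PESQ, by contrast, is a standardized perceptual model (ITU-T P.862) and is normally computed with a reference implementation rather than re-derived.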

  13. Results: statistics of nonparametric dictionary learning. (a) Sorted final probabilities of dictionary elements (the πk's); (b) distribution of the number of elements used per frame. Input utterance: "we talked of the sideshow in the circus" ("sp19.wav"), with input SNR at 0 dB.

  14. Results

  15. Thanks!
