
Prof. Hervé Bourlard


Presentation Transcript


  1. Idiap Research Institute
     Centre du Parc, P.O. Box 592, CH-1920 Martigny
     +41 27 721 77 11, http://www.idiap.ch
     Prof. Hervé Bourlard, Idiap Research Institute / EPFL

  2. Idiap Profile
     Independent, not-for-profit research institute
     • Founded in 1991
     • Around 100 collaborators (> 25 countries represented)
     • Budget: around 10 MCHF
     • Centre du Parc in Martigny (2,300 m²)
     • 37 research programs (> 130 publications/year)
     • Affiliated with EPFL (joint development plan) and the University of Geneva
     • Accredited (and co-funded) by the Federal Government, State and City, as part of the « ETH Strategic Domain »
     • Host institution of the Swiss National Centre of Competence in Research on « Interactive Multimodal Information Management » (IM2)

  3. HUMAN AND MEDIA COMPUTING
     All details of current activities available at: http://www.idiap.ch/scientific-research/themes
     • Perceptual and cognitive systems
       • Speech processing
       • Document and text processing
       • Natural language understanding and translation
       • Vision and scene analysis
       • Multimodal processing
       • Computational cognitive science
       • Online learning & categorization
     • Information interfaces and presentation
       • Multimedia information systems
       • User interfaces
       • System evaluation
     • Biometric person recognition
       • Speaker identification & verification
       • Face detection, tracking & recognition
       • Multimodal fusion
     • Social/human behavior
       • Web social media
       • Mobile social media
       • Social interaction sensing
       • Social signal processing
       • Verbal and nonverbal communication analysis
     • Machine learning
       • Statistical and neural-network-based ML (strong)
       • Computational efficiency, targeting real-time applications
       • Very large datasets
       • Online learning

  4. Activities in Perceptual and Cognitive Systems
     http://www.idiap.ch/scientific-research/themes/perceptual-and-cognitive-systems
     • Natural language understanding and translation
       • Semantic disambiguation using networks of concepts extracted from Wikipedia [started 2008]
       • Identification of discourse markers in dialogues [finished 2009]
       • Normalizing the evaluation of machine translation
       • Improving statistical machine translation using discourse-level information [Sinergia project just accepted]
     • Multimodal object modeling
     • Semantic robot localization
     • Vision and scene analysis
     • Speech processing (next slide)

  5. Activities in Speech Processing
     • Speech/non-speech detection (including approaches discarding all lexical and speaker-ID information)
     • Speaker turn detection, segregation, and diarization
       • Based on acoustic features (new BIC, information bottleneck; see the sketch after this list)
       • Based on sound source localization (microphone array)
       • Based on both (fusion)
     • Speech localization, beamforming, overlapping and reverberant speech
     • Speaker identification
     • Conversational speech recognition
       • Improvement of the real-time Juicer LVCSR system, released as an open-source public library: http://juicer.amiproject.org/juicer/
       • New acoustic features based on subword (phone) posterior distributions
       • New ways to use those posterior features
       • Extraction of audio metadata, dialog acts, hesitations, etc.
     • HMM-based speech synthesis
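     As a concrete illustration of the acoustic-feature route to speaker turn detection, the following is a minimal sketch of the classical ΔBIC change-point test (not Idiap's modified BIC or the information-bottleneck system; function and parameter names are illustrative), assuming NumPy and a (frames × dimensions) matrix of acoustic features such as MFCCs.

        import numpy as np

        def delta_bic(X, i, lam=1.0):
            """Classical delta-BIC test for a speaker change after frame i.

            X   : (N, d) array of acoustic feature vectors (e.g. MFCCs)
            i   : candidate change point, 1 <= i < N
            lam : penalty weight, tuned on development data

            A positive value favours the two-speaker (change) hypothesis over
            modelling the whole window with one full-covariance Gaussian.
            """
            N, d = X.shape

            def logdet_cov(Y):
                # log-determinant of the sample covariance, with a small floor
                cov = np.cov(Y, rowvar=False) + 1e-6 * np.eye(d)
                return np.linalg.slogdet(cov)[1]

            penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(N)
            return (0.5 * N * logdet_cov(X)
                    - 0.5 * i * logdet_cov(X[:i])
                    - 0.5 * (N - i) * logdet_cov(X[i:])
                    - penalty)

     In a diarization front-end, a test of this kind is slid over the recording and local maxima of ΔBIC above zero are taken as candidate speaker turns.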

  6. Template-based associative memories
     • PhD student: Serena Soldo
     • Perceptual studies on humans suggest:
       • Both verbal and non-verbal information are stored as templates and used during speech recognition
       • Speech perception is usually explained in terms of associations to concepts
     • Project: jointly investigate the use of template-based approaches together with associative memory techniques

  7. Template-based recognition
     • Task
       • Isolated word recognition using the PhoneBook (PB) speech corpus
       • Posterior features estimated by an MLP
         • MLP trained on PB
         • MLP trained on an auxiliary corpus (Conversational Telephone Speech, CTS)
       • New type of template/HMM parametrized by posterior distributions
     • Investigated distance measures (sketched below)
       • Geometric measures (Euclidean distance, cosine angle)
       • Probabilistic measures (Kullback-Leibler divergence, Bhattacharyya distance, Hellinger distance)
       • Linguistic-class-based measures (scalar product, cross entropy)
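     To make the local scores concrete, here is a rough sketch (assuming NumPy, and that each frame is an MLP phone-posterior vector summing to one) of how the listed measures can be computed between a template frame p and a test frame q; the exact formulations used in the project may differ.

        import numpy as np

        EPS = 1e-12  # numerical floor to keep logarithms and divisions finite

        def euclidean(p, q):
            return np.linalg.norm(p - q)

        def cosine_angle(p, q):
            return 1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

        def kl_divergence(p, q):
            # KL(p || q) between two posterior distributions
            return np.sum(p * np.log((p + EPS) / (q + EPS)))

        def bhattacharyya(p, q):
            return -np.log(np.sum(np.sqrt(p * q)) + EPS)

        def hellinger(p, q):
            return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

        def scalar_product(p, q):
            # negative log inner product, so that smaller is better like the others
            return -np.log(np.dot(p, q) + EPS)

        def cross_entropy(p, q):
            return -np.sum(p * np.log(q + EPS))

     Any of these can serve as the frame-level local distance inside a dynamic time warping alignment between a test utterance and each word template.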

  8. Some results
     • Although the scalar product is "theoretically optimal", KL-based measures yield better performance.
     • With a sufficient amount of training data from the auxiliary corpus, performance comparable to the matched condition can be achieved; the amount of data needed also depends on the choice of local score.
     • Future work: continue the work on template-based ASR, extending it towards binary representations and the investigation of associative memory techniques.

  9. Sparse Component Analysis for Robust DSR
     • Distant Speech Recognition (DSR) difficulties
       • Overlapping speech
       • Reverberation
     • Sparse Component Analysis
       • Number of sensors < number of speakers
       • The sparser the representation, the more efficient the separation is expected to be (a simple way to quantify this is sketched after this list)
     • What is the best sparse representation?
       • Time-frequency representations
       • Gabor features
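     One simple, hypothetical way to quantify how well a time-frequency representation supports mask-based separation is to measure how often a single source dominates each bin (approximate W-disjoint orthogonality). The sketch below assumes NumPy/SciPy and two clean source signals at the same sampling rate; the function name and the 10 dB margin are illustrative choices, not values from the project.

        import numpy as np
        from scipy.signal import stft

        def dominance_ratio(s1, s2, fs=8000, nperseg=256, margin_db=10.0):
            """Fraction of the total energy lying in time-frequency bins where
            one source exceeds the other by at least `margin_db` dB: a rough
            proxy for the disjointness (sparsity) that masking-based source
            separation relies on."""
            _, _, S1 = stft(s1, fs=fs, nperseg=nperseg)
            _, _, S2 = stft(s2, fs=fs, nperseg=nperseg)
            p1, p2 = np.abs(S1) ** 2, np.abs(S2) ** 2
            ratio_db = 10.0 * np.log10((p1 + 1e-12) / (p2 + 1e-12))
            dominated = np.abs(ratio_db) > margin_db   # one source clearly dominates
            total = p1 + p2
            return total[dominated].sum() / total.sum()

     The same measurement can be repeated on alternative representations (e.g. different window lengths, or Gabor-filtered spectrograms) to compare how sparse each one makes the sources appear.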

  10. Sparse Component Analysis (SCA)
     [Diagram labels: Speech Recognition, Auditory Sparse Representation, Distant Speech Recognition Front-End, Auditory Sparsity and Sparse Component Analysis]
     • Long-term goal: incorporating auditory sparsity in SCA
       • Gabor filtering of the spectro-temporal representation of speech (a toy example follows this list)
       • Deploying the detected Gabor patterns in blind source separation
     • So far: DUET algorithm
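     As a toy illustration of spectro-temporal Gabor filtering (not the specific filter bank used at Idiap; all parameter values below are arbitrary), a 2-D Gabor patch can be built as a cosine carrier under a Gaussian envelope and convolved with a log-mel spectrogram, assuming NumPy/SciPy.

        import numpy as np
        from scipy.signal import convolve2d

        def gabor_patch(omega_f=0.25, omega_t=0.25, sigma_f=6.0, sigma_t=6.0, size=25):
            """2-D Gabor patch: a cosine carrier with spectral (omega_f) and
            temporal (omega_t) modulation frequencies under a Gaussian envelope."""
            coords = np.arange(size) - size // 2
            T, F = np.meshgrid(coords, coords)   # T: time axis (columns), F: frequency axis (rows)
            envelope = np.exp(-0.5 * (F / sigma_f) ** 2 - 0.5 * (T / sigma_t) ** 2)
            carrier = np.cos(omega_f * F + omega_t * T)
            return envelope * carrier

        def gabor_features(log_mel_spec, filters):
            """Filter a (frequency x time) log-mel spectrogram with a bank of Gabor patches."""
            return np.stack([convolve2d(log_mel_spec, g, mode='same') for g in filters])

     Bins where a patch responds strongly mark spectro-temporal modulation patterns that could then be exploited, as the slide suggests, in the blind source separation stage.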

  11. Gabor-Posteriors
     Aurora2                     Baseline    DUET
     Clean training               14.18      93.38
     Multi-condition training     19.35      91.66

     Degenerate Unmixing Estimation Technique (DUET)
     • Clustering of each source's components based on delay and attenuation, and separation by masking in the spectro-temporal domain (a minimal sketch follows this slide)
     • Synthesized stereo mixtures from Aurora2:
       • M1 = S1 + S2 + S3
       • M2 = a1·S1 + a2·S2 + a3·S3, with a1 = 1/1.3, a2 = 1.3, a3 = 1.08/1.23
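     The following is a minimal, hypothetical sketch of DUET-style masking for an anechoic two-channel mixture, assuming NumPy/SciPy; it replaces DUET's weighted attenuation-delay histogram and peak picking with plain k-means, so it is only an approximation of the published algorithm.

        import numpy as np
        from scipy.signal import stft
        from scipy.cluster.vq import kmeans2

        def duet_masks(x1, x2, n_sources=3, fs=8000, nperseg=1024):
            """Estimate one binary time-frequency mask per source from a stereo mixture.

            Per time-frequency bin, the relative attenuation and delay between the
            two channels are estimated from the ratio of their STFTs; bins are then
            grouped in that 2-D space (k-means here, a weighted histogram in the
            original DUET) and each group defines a mask for one source."""
            eps = 1e-12
            f, t, X1 = stft(x1, fs=fs, nperseg=nperseg)
            _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
            ratio = (X2 + eps) / (X1 + eps)
            alpha = np.abs(ratio)                    # relative attenuation per bin
            sym_alpha = alpha - 1.0 / alpha          # symmetric attenuation
            omega = 2.0 * np.pi * f[:, None] + eps   # angular frequency of each bin
            delta = -np.angle(ratio) / omega         # relative delay per bin
            # Note: low-energy and very-low-frequency bins give noisy estimates;
            # the original algorithm weights bins by their power.
            feats = np.stack([sym_alpha.ravel(), delta.ravel()], axis=1)
            _, labels = kmeans2(feats, n_sources, minit='points')
            labels = labels.reshape(X1.shape)
            return [(labels == k) for k in range(n_sources)], X1

     Applying each mask to a mixture STFT and inverting it then gives one separated source estimate per cluster, which can be fed to the Aurora2 recognizer as in the table above. For the slide's synthetic mixtures, the two channels would be built as m1 = s1 + s2 + s3 and m2 = (1/1.3)·s1 + 1.3·s2 + (1.08/1.23)·s3.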
