METISS

METISS. Audio & speech processing. INRIA-Rennes. Modélisation et Expérimentation pour le Traitement des Informations et des Signaux Sonores. Scientific leader: Frédéric BIMBOT. Overview of activities 2002-2005. Introduction. Framework and foundations.


Presentation Transcript


  1. METISS: Audio & speech processing. INRIA-Rennes. Modélisation et Expérimentation pour le Traitement des Informations et des Signaux Sonores (Modelling and Experimentation for the Processing of Sound Information and Signals). Scientific leader: Frédéric BIMBOT. Overview of activities 2002-2005. Evaluation INRIA

  2. Introduction

  3. Framework and foundations • General framework: audio scene analysis, description and recognition, that is {analysis, processing; modelling, representation, description, decomposition; detection, classification; recognition} of {audio, speech, music, multimedia, …} {signals, recordings, streams, tracks, …} • Scientific foundations: probabilistic models and statistical estimation; redundant systems and adaptive representations

  4. Scientific objectives • to design generic, robust, fast and flexible approaches to a variety of problems in speech and audio segmentation, detection and classification, operating in the probabilistic framework • to investigate the theoretical properties and practical applications of adaptive representations and sparseness criteria, for advanced processing and structured description of audio signals • to extend and adapt approaches classically used in speech processing to other classes of signals and problems • to study the convergence between statistical approaches and adaptive decomposition within a common framework embedding signal representations and classification

  5. Application domain and focus • Applicative fields: • Security, verification, authentication, rights management • Rich audio transcription, content-based indexing, multi-purpose navigation, information retrieval and summarization • Advanced audio processing: segmentation, separation, spatialisation, sound object extraction, music modeling • Audio and audio-visual authoring, production and repurposing • Education and entertainment • Primary focuses: • Speaker characterisation • Audio structuring and indexing • Sparse representations: theory and applications • Audio source separation (under-determined case)

  6. Team composition, 2002-2005 • [Figure: timeline of team members by year, 2002 to 2005] • Names shown: BIMBOT, GRAVIER, GRIBONVAL, POREE, BETSER, KIJAK, KRSTULOVIC, GONON, BEN, MORARU, BLOUET, BENAROYA, MC DONAGH, BEN, COLLET, LESAGE, OZEROV, SALL, FORTHOFER, HUET, TENG, ARBERET, MAILHE • Permanent researchers (CR, CNRS or INRIA): 3 • Non-permanent staff (engineers, ATER, post-docs): 2 • PhD students (~50 % with METISS; 100 % with METISS): 2, 3 • + Marie-Noëlle Georgeault, administrative assistant (~25 %)

  7. Probabilistic modeling of audio signals

  8. Probabilistic modeling (1) • 1 audio class or 1 sound object → a variety of observations • 1 family of sounds → 1 probabilistic model: 1 probability density function, 1 likelihood function

  9. Probabilistic modeling (2) • Probabilistic modeling, statistical estimation, state-sequence decoding, Bayesian decision + « know-how » → detection, classification, verification, segmentation, … • Probabilistic models offer a well-understood, generic, inter-operable framework for the description and classification of audio and speech signals • Dominant position of Hidden Markov Models (HMM) and their variants • Highly competitive field in speech processing (research & industry) • More open in audio indexing (additional factors of complexity)
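The chain on slides 8 and 9 (one model per sound class, statistical estimation, likelihood, Bayesian decision) can be illustrated with a minimal sketch. It assumes a single diagonal Gaussian per class instead of the HMMs or mixtures the team actually used, and the class names, feature dimension and synthetic data are purely illustrative.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Maximum-likelihood estimate of a diagonal Gaussian from frames of shape (n, d)."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small floor to avoid zero variance
    return mean, var

def log_likelihood(frames, model):
    """Total log-likelihood of a segment under one class model (frames assumed independent)."""
    mean, var = model
    per_dim = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return per_dim.sum()

def classify(frames, models, log_priors):
    """Bayesian decision: choose the class maximising log-likelihood + log-prior."""
    scores = {name: log_likelihood(frames, m) + log_priors[name]
              for name, m in models.items()}
    return max(scores, key=scores.get)

# Synthetic training data for two hypothetical classes (assumption, not from the slides).
rng = np.random.default_rng(0)
train = {
    "speech": rng.normal(loc=0.0, scale=1.0, size=(500, 4)),
    "music": rng.normal(loc=3.0, scale=1.0, size=(500, 4)),
}
models = {name: fit_diag_gaussian(x) for name, x in train.items()}
log_priors = {name: np.log(0.5) for name in models}

# A test segment drawn from the "music" distribution.
test_segment = rng.normal(loc=3.0, scale=1.0, size=(50, 4))
decision = classify(test_segment, models, log_priors)
```

Detection and verification follow the same pattern, with the argmax replaced by a threshold on a likelihood ratio.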

  10. Challenges and positioning • Generalisation to wider classes of signals with an audio component: multiple scales, multiple sources, multiple structures, multiple sensors, multiple levels of underlying processes, heterogeneous streams (audio-visual), external sources of knowledge • Robustness: to unseen acoustic conditions, to scarce training data, to poorly representative samples, to missing observations, … • Implementability: size, speed, scalability, distribution, etc. • METISS positioning: robust training and test methods; compact distributed algorithms; versatility / migration of formalism; methodology and evaluation → speaker verification, audio segmentation, broad sound-class indexing (→ speech recognition)

  11. Adaptive representations

  12. Adaptive representations (1) • Audio signal: diversity of structures (time, frequency, statistics, …); superimposition of objects (notes, sources, tracks, …) • Redundant system (dictionary of atoms): large set of vectors with various scales, time structures, frequency structures, phases, statistical properties, … • Adaptive decomposition with selection of the « best » decomposition according to a given criterion: sparsity, perception criterion, separability, conditional entropy, …

  13. Adaptive representations (2): sparsity criteria • Decomposition: constraint [formula not preserved]; criterion [formula not preserved] • Exponent = 2: quadratic norm → maximizes dispersion • Exponent = 0: minimum number of non-zero coefficients → NP-complete • Exponent = 1: tractable « compromise » → Pursuit algorithms (Matching Pursuit)
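The constraint and criterion on slide 13 are commonly written as follows in the sparse-representation literature. This is a reconstruction under assumptions: the symbols $x$ (signal), $g_k$ (dictionary atoms), $c_k$ (coefficients) and $p$ (exponent) are notation chosen here, not necessarily the slide's.

```latex
\begin{aligned}
\text{Constraint:}\quad & x = \sum_{k} c_k\, g_k \\
\text{Criterion:}\quad  & \min_{c}\ \|c\|_p^p = \sum_{k} |c_k|^p ,
\qquad \|c\|_0 = \#\{k : c_k \neq 0\}
\end{aligned}
```

With $p = 2$ the minimiser spreads energy over many atoms, $p = 0$ counts non-zero coefficients and leads to a combinatorial search, and $p = 1$ is convex while still favouring sparse solutions.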

  14. Ongoing scientific issues • Optimality and convergence of adaptive decompositions • Dictionary design (knowledge-based, data-driven, …) • Deformable, stochastic, multi-dimensional, … atoms • Efficient decomposition algorithms and implementations • Application scope • Context: recent fast-growing field; high applicative potential; intense emerging competition • METISS positioning: theoretical results; concepts and methodologies; decomposition algorithms → audio source separation (under-determined case)

  15. Achievements 2002-2005 and selected results • Speaker characterisation • Audio structuring and indexing • Sparse representations: theory and applications • Audio source separation (under-determined case)

  16. Speaker characterisation • CART trees for scalable and distributable speaker verification • Model-based metrics and normalisations for speaker verification • Structural adaptation of speaker models (hierarchical Bayesian networks) • Methodology and algorithms for optimizing the coverage of a speaker database • Relative speaker space and metrics for efficient speaker indexing and retrieval [ongoing]

  17. CART-based speaker verification (Blouet, Bimbot, Gonon, et al.) • Direct score function assignment → CART trees used as a family of approximating functions • [Figure: binary decision tree with YES/NO branches and leaf scores such as -0.8, 0.7, 0.3, -0.4, 0.9 and -0.5] • + Extension to oblique trees • Complexity down 200×, error rate up by only 33 % • EU-IST INSPIRED project
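The idea on slide 17, a tree that maps each feature vector directly to a score, can be sketched as below. The tree layout, thresholds and the averaging of frame scores are hypothetical choices made for illustration; only the leaf values echo numbers visible on the slide.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    score: Optional[float] = None   # set on leaves only
    feature: int = 0                # index of the coefficient tested at this node
    threshold: float = 0.0
    yes: Optional["Node"] = None    # followed when x[feature] <= threshold
    no: Optional["Node"] = None     # followed otherwise

def tree_score(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return the score stored there."""
    while node.score is None:
        node = node.yes if x[node.feature] <= node.threshold else node.no
    return node.score

def verify(tree: Node, frames, decision_threshold: float = 0.0):
    """Average the per-frame scores (an assumption) and compare to a threshold."""
    avg = sum(tree_score(tree, f) for f in frames) / len(frames)
    return avg, avg >= decision_threshold

# Hypothetical two-level tree with leaf scores -0.8, 0.3 and 0.7.
tree = Node(
    feature=0, threshold=0.5,
    yes=Node(score=-0.8),
    no=Node(feature=1, threshold=1.0, yes=Node(score=0.3), no=Node(score=0.7)),
)
frames = [(0.9, 2.0), (0.8, 0.5)]
avg, accepted = verify(tree, frames)
```

Scoring costs only a handful of comparisons per frame, which is consistent with the large complexity reduction reported on the slide. An oblique tree would test a linear combination of coefficients at each node instead of a single one.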

  18. Speaker recognition in the model space (1) (Ben, Bimbot et al.) • Formal links between the log-likelihood ratio (LLR) and the Kullback-Leibler (KL) divergence + mean-only adaptation training procedure → likelihood ratio test ≈ Euclidean distance in the model space
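One standard way to see the link stated on slide 18 is sketched below. It assumes two Gaussian mixture models $p_1$ and $p_2$ adapted from the same background model by changing only the means, so that they share weights $w_i$ and covariances $\Sigma_i$. The notation is chosen here, and whether this is the authors' exact derivation is an assumption.

```latex
\begin{aligned}
\mathbb{E}_{x \sim p_1}\!\left[\log \frac{p_1(x)}{p_2(x)}\right]
  &= \mathrm{KL}(p_1 \,\|\, p_2) \\
  &\le \frac{1}{2} \sum_{i=1}^{M} w_i\,
      (\mu_{1,i} - \mu_{2,i})^{\top} \Sigma_i^{-1} (\mu_{1,i} - \mu_{2,i}) \\
  &= \frac{1}{2}\, \| m_1 - m_2 \|_2^2 ,
  \qquad
  m_s = \Big[ \sqrt{w_i}\; \Sigma_i^{-1/2} \mu_{s,i} \Big]_{i=1}^{M}
\end{aligned}
```

The expected log-likelihood ratio is therefore bounded by a squared Euclidean distance between normalised mean vectors, so each speaker model can be treated as a point $m_s$ in a vector space.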

  19. Speaker recognition in the model space (2) (Ben, Bimbot et al.) • Consequences: faster score computation procedure (at least -50 %); simpler normalization schemes (M-Norm), with no need for additional development data and no performance degradation • Tested successfully for speaker recognition in the NIST and ESTER campaigns

  20. Audio indexing • HMM-based audio and audio-visual structuring (applied to sports programmes) • Audio segmentation and tracking using probabilistic models and statistical tests • Detection of simultaneous events in audio tracks • Granular models of audio signals using deformable atoms • Comparison and evaluation of beam-search techniques and hypothesis rescoring using external sources of knowledge [ongoing] • Algebraic representations and statistical modeling of formal music [ongoing]

  21. Multi-stream HMM modeling (1) of a tennis match (Kijak et al., with TMM) • Multi-level state-sequence representation of a tennis match, inspired by and adapted from speech recognition paradigms → multi-stream audio-visual HMM

  22. Multi-stream HMM modeling (2) (Delakis, Gravier et al., with TexMex) • Segmental models → relaxed synchrony constraints • Video-only, shot-based: C = 77 % • Video + audio, shot-based + segmental: C = 85 %

  23. Sparse representations • Mathematical test for the optimality of a sparse representation • Matching pursuit made tractable (1 hour → 0.25 × RT, RT being real time) • Structured matching pursuit incorporating explicit signal family models • Adaptive computational strategies • Beyond sparsity: recovering structured representations… • Learning shift-invariant atoms (MoTIF algorithms) [ongoing]

  24. Sparse solutions to inverse linear problems (Gribonval et al.) • In the under-determined case: [formula not preserved] • BUT if: [formula not preserved] • If a sparse representation is sparse enough, then it is the sparsest one
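A well-known sufficient condition of the kind summarised on slide 24 uses the mutual coherence μ of the dictionary: a representation with fewer than (1 + 1/μ)/2 non-zero coefficients is the unique sparsest one. The slide's own formulas are missing, so treating this coherence bound as the condition shown there is an assumption. The sketch computes μ and the bound for a Dirac + Fourier dictionary chosen for illustration.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between two distinct unit-norm atoms (columns)."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    gram = np.abs(Dn.conj().T @ Dn)
    np.fill_diagonal(gram, 0.0)  # ignore each atom's product with itself
    return gram.max()

def uniqueness_bound(mu):
    """Representations with strictly fewer non-zeros than this are the sparsest."""
    return 0.5 * (1.0 + 1.0 / mu)

N = 64
dirac = np.eye(N)
fourier = np.fft.fft(np.eye(N)) / np.sqrt(N)   # orthonormal Fourier basis
D = np.hstack([dirac, fourier])                # redundant dictionary, N x 2N

mu = mutual_coherence(D)
bound = uniqueness_bound(mu)
```

For this dictionary μ equals 1/√N, so with N = 64 any representation using at most 4 atoms is guaranteed to be the sparsest possible.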

  25. Matching Pursuit made tractable (Gribonval, Krstulovic et al.) • MPTK: C++ toolkit, GPL licence • Flexible operation, reproducible results • For a 1 hour audio signal, processing time reduced from 20 h → 0.25 h • Usable in other fields: medical signals, seismology, etc.
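For reference, the basic Matching Pursuit iteration can be sketched in a few lines. This is a plain NumPy illustration of the textbook algorithm on a random dictionary; it does not use or reproduce the MPTK API, and it has none of the optimisations that make MPTK fast.

```python
import numpy as np

def matching_pursuit(x, D, n_iter):
    """Greedy decomposition of x over the unit-norm columns (atoms) of D."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        correlations = D.T @ residual            # inner product with every atom
        k = np.argmax(np.abs(correlations))      # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * D[:, k]    # remove its contribution
    return coeffs, residual

# Random dictionary and a signal built from two of its atoms (illustrative data).
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 10] - 1.5 * D[:, 200]

coeffs, residual = matching_pursuit(x, D, n_iter=20)
```

Each iteration recomputes all inner products, which is the cost that dominates on long signals when done naively. The invariant `D @ coeffs + residual == x` holds after every step, and the residual energy never increases.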

  26. Source separation (with primary focus on underdetermined problems) • Statistical schemes and adaptive training for single-channel separation • Source separation approaches using multi-channel Matching Pursuit in the underdetermined case • Contributions in evaluation methodology: task definition & performance measurements • Speech « denoising » using underdetermined source separation techniques • Dictionary design methods for source separation [ongoing] • DEMIX: a robust algorithm to estimate the number of sources using clustering techniques [ongoing]

  27. Single sensor audio source separation (Benaroya, Bimbot, Gribonval, Ozerov, with FTR&D) • Use of a factorial GMM to build a time-varying Wiener filter • [Figure: observed signal (voice + music) → factorial GMM (voice GMM, music GMM) → Wiener filter → estimated voice signal] • Article in IEEE Trans. SAP 2006 + new results to come • Innovative scheme for underdetermined source separation • Compatibility with speech processing state of the art • Strong links with sparse decomposition problems • Versatile and efficient for a range of audio description tasks
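The scheme on slide 27 can be illustrated for a single STFT frame as follows. The sketch assumes that each GMM component is a zero-mean Gaussian described by a power spectral density (PSD), that the mixture frame is explained by every (voice, music) component pair, and that the Wiener gains of all pairs are averaged with their posterior probabilities. These modelling details, the function name and the toy PSDs are assumptions, not taken from the slide.

```python
import numpy as np

def factorial_wiener_gain(power_frame, voice_psds, music_psds, voice_w, music_w):
    """Posterior-weighted Wiener gain for the voice in one STFT frame.

    power_frame: (F,) observed power |X(f)|^2
    voice_psds:  (Kv, F) PSD of each voice component
    music_psds:  (Km, F) PSD of each music component
    voice_w, music_w: component weights of the two GMMs
    """
    total = voice_psds[:, None, :] + music_psds[None, :, :]       # (Kv, Km, F)
    # Log-likelihood of the frame for each component pair (constants dropped).
    loglik = -(np.log(total) + power_frame / total).sum(axis=-1)  # (Kv, Km)
    logpost = loglik + np.log(voice_w)[:, None] + np.log(music_w)[None, :]
    post = np.exp(logpost - logpost.max())
    post /= post.sum()                                            # pair posteriors
    gains = voice_psds[:, None, :] / total                        # Wiener gain per pair
    return (post[:, :, None] * gains).sum(axis=(0, 1))            # (F,)

# Toy models: voice energy in the two low bins, music energy in the two high bins.
voice_psds = np.array([[10.0, 10.0, 0.1, 0.1], [5.0, 5.0, 0.1, 0.1]])
music_psds = np.array([[0.1, 0.1, 10.0, 10.0], [0.1, 0.1, 5.0, 5.0]])
voice_w = np.array([0.5, 0.5])
music_w = np.array([0.5, 0.5])
power_frame = np.array([9.0, 11.0, 0.2, 0.1])

gain = factorial_wiener_gain(power_frame, voice_psds, music_psds, voice_w, music_w)
```

Multiplying the complex STFT frame by `gain` gives the voice estimate for that frame. The filter is time-varying because the pair posteriors are recomputed for every frame.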

  28. Underdetermined stereophonic source separation using a sparse method (Lesage, Gribonval et al.) • [Figure: mixing matrix and separation results, for least squares and for sparsity] • Audio examples available
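The contrast on slide 28 between least squares and sparsity can be reproduced on a toy stereo mixture of three sources. The mixing directions and the signal are hypothetical. The sparse solver relies on the fact that, with two channels, a minimum-ℓ1 solution with at most two active sources always exists, so enumerating source pairs suffices. This is an illustrative shortcut, not the authors' algorithm.

```python
import numpy as np
from itertools import combinations

def min_l2(A, x):
    """Least-squares (minimum l2 norm) solution of A s = x."""
    return np.linalg.pinv(A) @ x

def min_l1_two_channel(A, x):
    """Minimum l1 norm solution for a 2 x N mixing matrix, by enumerating source pairs."""
    best = None
    for i, j in combinations(range(A.shape[1]), 2):
        sub = A[:, [i, j]]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue                      # skip (nearly) collinear directions
        s_pair = np.linalg.solve(sub, x)
        if best is None or np.abs(s_pair).sum() < np.abs(best[1]).sum():
            best = ((i, j), s_pair)
    s = np.zeros(A.shape[1])
    s[list(best[0])] = best[1]
    return s

# Three sources mixed onto two channels; only the second source is active.
angles = np.deg2rad([20.0, 60.0, 100.0])
A = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x 3 mixing matrix
x = A @ np.array([0.0, 1.0, 0.0])

s_l2 = min_l2(A, x)              # spreads energy over all three sources
s_l1 = min_l1_two_channel(A, x)  # concentrates it on the active source
```

Both solutions reproduce the mixture exactly. The least-squares one leaks the active source into the two silent ones, while the sparse one assigns all the energy to the source that was actually active.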

  29. Collaborations, dissemination and visibility • Privileged cooperation with the TEXMEX group at IRISA (+ VISTA) • Consistent network of academic and industrial partners outside IRISA • Regular participation in collaborative projects (EU-IST, RNRT, bilateral partnerships, …) • Strong involvement in concerted research actions (ESTER, MathSTIC, GDR-ISIS, NIST evaluations, …) • Visible participation in and production of free software: ELISA platform, AudioSeg, MPTK, SIROCCO, BSS-EVAL • Sustained effort in publishing and disseminating the group's research results • Additional visibility through responsibilities taken in scientific societies, workshop organisation and editorial boards

  30. Summary 2002-2005. Strategy and perspectives 2006-2010

  31. Achievements 2002-2005 (1) • solid contributions to the state of the art on several topics related to speaker and audio class modelling and recognition • key extension, experimentation and validation of the Hidden Markov Model framework for joint audio and video modelling and structuring • major theoretical and experimental progress in the field of sparse representations and adaptive decomposition • pioneering work in mono- and multi-channel source separation in the underdetermined case

  32. Achievements 2002-2005 (2) • strategic improvement in the efficiency of pursuit algorithms both in terms of search strategy and implementation • development of a usable know-how in keyword spotting and speech recognition • sustained activities in assessment methodology, resource distribution and evaluation campaigns • scientific objective #4 needs consolidation

  33. Strategy 2006-2010 • To keep our position in our initial field of expertise: models, algorithms and tools for automatic processing of audio and speech signals • To push our advantage in the field of sparse representations, from both the theoretical and the applicative viewpoint • To extend our scope towards more powerful approaches for the representation and modeling of audio and multi-modal signals with an audio component • To step in and progress in the area of compressing large-scale high-dimensional multi-modal data

  34. Scientific challenges • Probabilistic multi-level multi-stream dependency models for the representation of multiple sources and the integration of heterogeneous levels of knowledge in audio(-visual) streams → Bayesian networks • Data-driven representations, model discovery and self-structuring of information in audio and audio-visual streams and contents → theoretical consolidation • Experimental platforms and numerically efficient algorithms for large-scale data and near real-time processing → engineering work • Deeper understanding of the links between theoretical concepts of adaptive representation, sparse decomposition, multi-scale analysis and practical implications in terms of robustness, separability and adaptability → potential links with SVM • Compressing large-scale high-dimensional multimodal data for storage, description and classification → compressed sensing

  35. Questions
