Sound Source Separation using 3D Correlogram, Fuzzy Logic, and Neural Networks
Sound Source Separation using 3D Correlogram, Fuzzy Logic, and Neural Networks A RESEARCH PROJECT Eduardo Dias Trama
Table of Contents • INTRODUCTION • PROJECT OVERVIEW • THE PREPROCESSOR • THE LEARNING PROCESSOR • THE SEPARATION PROCESSOR • PROJECT EXPERIMENTS • CONCLUSION
INTRODUCTION • Overview of sound source separation • Sound separation methods • Related applications of sound separation
Overview of sound source separation • What is sound separation? • Psychoacoustic properties • Timbre • How can sound be modeled?
Sound separation methods • CASA (Computational Auditory Scene Analysis), Marrian approach • Spatial and periodicity-and-harmonicity approaches • CASA: 3D Correlogram analysis • Blind source separation and prediction-driven CASA
Related applications of sound separation • Sound and voice recognition • Noise removal • Compression
PROJECT OVERVIEW • Overview • Auditory model analysis • Sound data library and classification • Sound data matching • Complete sound separation system
Overview • What is a piano sound? • Memory • Clustering
Auditory model analysis • Properties • Grouping • Past knowledge • Correlation
Sound data library and classification • Sound memory • How much information is needed for later analysis? • Does it matter if audio data is compressed? • Structure of classification
THE PREPROCESSOR • The Cochlea Filter Model • Correlogram • 3-D Correlogram
The Cochlea Filter Model • Filtering: basilar membrane (BM) • Detection: inner hair cell (IHC) • Compression: automatic gain control (AGC) • Cochleagram
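The three stages above (filtering, detection, compression) can be illustrated with a deliberately simplified sketch. It is not the project's cochlea model. It assumes a two-pole resonator per channel as a stand-in for the basilar membrane, half-wave rectification plus a one-pole low-pass for inner-hair-cell detection, and division by a running average level as the AGC. All function names and constants are hypothetical.

```python
import math

def resonator(signal, fc, fs, r=0.98):
    """Two-pole resonator centred on fc: a crude stand-in for one BM place."""
    theta = 2 * math.pi * fc / fs
    a1, a2 = -2 * r * math.cos(theta), r * r
    gain = 1 - r  # rough normalization only, not exact unity peak gain
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x - a1 * y1 - a2 * y2
        y2, y1 = y1, y
        out.append(y)
    return out

def inner_hair_cell(channel, alpha=0.1):
    """Detection: half-wave rectification followed by a one-pole low-pass."""
    state = 0.0
    out = []
    for v in channel:
        state += alpha * (max(v, 0.0) - state)
        out.append(state)
    return out

def agc(channel, beta=0.01, eps=1e-6):
    """Compression: divide by a slowly tracking average level."""
    level = 0.0
    out = []
    for v in channel:
        level += beta * (v - level)
        out.append(v / (level + eps))
    return out

def cochleagram(signal, centre_freqs, fs):
    """One row per channel: BM filtering -> IHC detection -> AGC compression."""
    return [agc(inner_hair_cell(resonator(signal, fc, fs))) for fc in centre_freqs]
```

Before the AGC stage, a 440 Hz tone produces the most activity in the channel centred at 440 Hz; the AGC then levels the channels out, which is the compression the slide refers to.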
Correlogram • Short-time autocorrelations of the neural firing rates as a function of cochlear place (best frequency) versus time • Correlogram movie
Correlogram • Speech processing • Extract the formants of voiced and unvoiced sounds • Short duration • Autocorrelation window size
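A single Correlogram frame can be sketched as one short-time autocorrelation per cochlear channel, computed over a window of `window` samples starting at `start`. This minimal sketch assumes the cochleagram is a list of per-channel sample lists and uses an unnormalized autocorrelation; the names are illustrative, not taken from the project.

```python
def autocorrelation(x, max_lag):
    """Unnormalized short-time autocorrelation r[k] = sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag)]

def correlogram_frame(cochleagram, start, window, max_lag):
    """One Correlogram frame: rows are cochlear channels, columns are lags.

    `window` is the autocorrelation window size in samples and should be
    at least `max_lag`; `start` selects the point in time.
    """
    return [autocorrelation(channel[start:start + window], max_lag)
            for channel in cochleagram]
```

A longer window gives steadier estimates and room for longer lags (lower pitches) but smears rapid changes, which is why short windows suit speech.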
Correlogram Frame • Vertical axis shows low to high frequencies from bottom to top • Horizontal axis represents the lag or time delay
Correlogram Frame • Dark areas in the image show activity in the Correlogram frame • Vertical lines: cochlear channels firing in the same period
Correlogram Frame • Horizontal bands indicate large amounts of energy within a frequency band
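One common way to exploit the vertical lines described above (not necessarily the author's) is the summary correlogram: summing a frame over all channels reinforces lags at which many channels fire with the same period, so the strongest peak estimates the common period. The function names and the `min_lag` cut-off, which skips the trivial zero-lag peak, are assumptions.

```python
def summary_correlogram(frame):
    """Sum a Correlogram frame (channels x lags) over all channels."""
    return [sum(column) for column in zip(*frame)]

def common_period(frame, min_lag):
    """Lag of the strongest vertical line, ignoring lags below min_lag."""
    summary = summary_correlogram(frame)
    return max(range(min_lag, len(summary)), key=lambda k: summary[k])
```

A channel responding to the fundamental peaks at lags 20 and 40, while one responding to the second harmonic peaks every 10 lags; only lag 20 is shared strongly, so it wins.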
3-D Correlogram • A series of Correlograms over time • Frequency information comes from a cochlea filter bank • A finite time/frequency analysis • It depends on the initial time
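The 3-D Correlogram can be sketched as a stack of frames taken at a fixed hop through the cochleagram, indexed as `[time][channel][lag]`. The hop size and the `start` argument are assumptions; `start` makes the dependence on the initial time explicit.

```python
def autocorrelation(x, max_lag):
    """Unnormalized short-time autocorrelation r[k] = sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag)]

def correlogram_3d(cochleagram, window, hop, max_lag, start=0):
    """Stack Correlogram frames over time: result[t][channel][lag]."""
    length = min(len(channel) for channel in cochleagram)
    frames = []
    t = start
    while t + window <= length:  # only keep frames that fit entirely
        frames.append([autocorrelation(channel[t:t + window], max_lag)
                       for channel in cochleagram])
        t += hop
    return frames
```

Shifting `start` changes which samples fall into each frame, so two analyses of the same sound only line up when they share the same initial time.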
THE LEARNING PROCESSOR • Creating the network input • Classification • Artificial neural network fuzzy classification
Creating the network input • The learning processor is responsible for learning each Correlogram frame of a selected sound • The networks should be exposed to many small variations of the target (selected) sound • The total number of neural nets (NN) is NN = FB × CF
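The slide does not define FB and CF. Reading FB as the number of cochlear filter-bank channels and CF as the number of Correlogram frames (an assumption), the network count grows multiplicatively: 64 channels and 40 frames would already need 64 × 40 = 2560 networks. The sketch below computes that count and generates hypothetical training variations of a target sound using small random gain changes and time shifts; the variation scheme is illustrative, not the project's.

```python
import random

def network_count(filter_bank_channels, correlogram_frames):
    """NN = FB x CF under the assumed reading: one network per (channel, frame)."""
    return filter_bank_channels * correlogram_frames

def variations(signal, count, max_gain_db=1.0, max_shift=3, seed=0):
    """Hypothetical augmentation: small random gains (in dB) and delays (in samples)."""
    rng = random.Random(seed)
    out = []
    for _ in range(count):
        gain = 10 ** (rng.uniform(-max_gain_db, max_gain_db) / 20)
        shift = rng.randint(0, max_shift)
        shifted = [0.0] * shift + signal[:len(signal) - shift]
        out.append([gain * s for s in shifted])
    return out
```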
Classification • Class attributes: family, length, frequency range, number of Correlogram frames • Sufficient to classify one particular sound • Makes the matching process faster • Intensive parallel processing
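To see why these attributes speed up matching, a library entry can be modeled as a small record and used to pre-filter the library before any expensive Correlogram comparison. The record layout, the field names and the overlap test are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SoundClass:
    """Hypothetical library entry holding the class attributes listed above."""
    name: str
    family: str
    length_s: float
    freq_range_hz: tuple  # (low, high)
    n_frames: int

def prefilter(library, family, f_low, f_high):
    """Keep classes of the given family whose frequency range overlaps [f_low, f_high]."""
    return [c for c in library
            if c.family == family
            and c.freq_range_hz[0] <= f_high
            and c.freq_range_hz[1] >= f_low]
```

Only the surviving classes need to be passed to the (parallel) matching networks.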
Artificial neural network fuzzy classification • Fuzzy IF-THEN rules to describe a classifier • An adaptive-network-based fuzzy classifier to solve fuzzy classification problems • ANFIS (adaptive-network-based fuzzy inference system)
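The fuzzy IF-THEN structure that ANFIS tunes can be illustrated with the forward pass of a zero-order Sugeno system using Gaussian membership functions. This is a minimal sketch: the rule parameters are made up, and the ANFIS learning step (adjusting centres, widths and consequents from data) is omitted.

```python
import math

def gaussian(x, centre, sigma):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def sugeno_output(inputs, rules):
    """Zero-order Sugeno inference: IF x1 is A1 AND x2 is A2 ... THEN y = c.

    Each rule is (list of (centre, sigma) per input, consequent constant c).
    The output is the firing-strength-weighted average of the consequents.
    """
    strengths = []
    for antecedents, _ in rules:
        w = 1.0
        for x, (centre, sigma) in zip(inputs, antecedents):
            w *= gaussian(x, centre, sigma)  # product T-norm implements AND
        strengths.append(w)
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    return sum(w * c for w, (_, c) in zip(strengths, rules)) / total
```

With consequents 0 and 1, thresholding the output at 0.5 turns this into a two-class fuzzy classifier.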
THE SEPARATION PROCESSOR • Choosing a method for sound matching • The Matching Fuzzy Logic sound library • Sound separation
Choosing a method for sound matching • Preamble, search, matching and interpolation • Target and precision • Fuzzy clustering algorithms
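Fuzzy c-means is one standard fuzzy clustering algorithm; the slides do not say which one the project used, so this is only an illustration. Each point receives a membership in every cluster, and centres and memberships are updated alternately with fuzziness exponent `m`.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iterations=100, seed=0):
    """Fuzzy c-means: returns (centres, memberships[point][cluster])."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(iterations):
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centres]
            u[i] = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                              for k in range(c))
                    for j in range(c)]
    return centres, u
```

Unlike hard clustering, a sound frame near the boundary between two classes keeps a graded membership in both, which the later matching step can use.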
The Matching Fuzzy Logic sound library • A set of fuzzy sound elements (a fuzzy inference system, FIS) is used for matching • The initial search values must be determined by external inputs • ANFIS (Adaptive Neuro-Fuzzy Inference Systems)
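Matching against a library of fuzzy sound elements can be sketched as scoring each element by how well observed features fall inside its fuzzy sets, then taking the best score. This sketch assumes triangular membership functions, the min T-norm for AND, and a `candidates` argument standing in for the external inputs that seed the search; the element layout and feature choice are hypothetical.

```python
def membership(x, centre, width):
    """Triangular membership: 1 at the centre, falling to 0 at +/- width."""
    return max(0.0, 1.0 - abs(x - centre) / width)

def match_score(features, element):
    """Degree of match between observed features and one fuzzy sound element.

    `element` is a list of (centre, width) pairs, one per feature; all
    feature memberships are combined with the min T-norm (fuzzy AND).
    """
    return min(membership(x, c, w) for x, (c, w) in zip(features, element))

def best_match(features, library, candidates=None):
    """Search the library, optionally restricted to externally supplied names."""
    names = list(candidates) if candidates is not None else list(library)
    best = max(names, key=lambda name: match_score(features, library[name]))
    return best, match_score(features, library[best])
```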
Sound separation • Search, match and extract • Step 1: Input process • Step 2: Classification • Step 3: Choosing what to separate • Step 4: Dynamics and pitch extraction • Step 5: Re-synthesis
Step 1: Input process • Analog-to-digital conversion • Cochlea filter bank • Cochleagram • Correlogram frames • Neuro-fuzzy input matrix
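The final step of the input process can be sketched under one plausible layout (an assumption, since the slides do not describe the matrix): one input row per (frame, channel) pair, matching one network per channel and frame, with each lag vector divided by its zero-lag energy so that loudness differences do not dominate the match.

```python
def input_matrix(frames):
    """Flatten Correlogram frames[t][channel][lag] into classifier input rows.

    Hypothetical layout: (frame index, channel index, normalized lag vector).
    Silent channels (zero-lag value 0) are left unscaled.
    """
    rows = []
    for t, frame in enumerate(frames):
        for ch, lags in enumerate(frame):
            energy = lags[0] if lags[0] > 0 else 1.0
            rows.append((t, ch, [v / energy for v in lags]))
    return rows
```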
Step 3: Choosing what to separate • Rule 1: Assume that the human auditory system can recognize one or more sounds in the audio input mixture • Rule 2: One recognizable sound should be selected for separation • Rule 3: Assume that complete or partial information about the selected audio class must exist in the sound library
Step 5: Re-synthesis • Re-synthesis of the selected sound's Correlogram frames at unit pitch • Apply dynamics to each Correlogram frame • Correlogram frame inversion
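Applying dynamics involves one detail worth spelling out: autocorrelation is quadratic in the signal, so scaling a channel's amplitude by a gain g multiplies every lag value of its Correlogram row by g². The sketch below applies a per-frame amplitude envelope to unit-level frames on that basis; the envelope input and function name are hypothetical, and the inversion step back to audio is not shown.

```python
def apply_dynamics(frames, envelope):
    """Scale unit-level Correlogram frames[t][channel][lag] by amplitude gains.

    Because r[k] = sum x[n] x[n + k], a gain g on the signal scales r by g**2.
    """
    if len(frames) != len(envelope):
        raise ValueError("need one envelope value per frame")
    return [[[g * g * v for v in lags] for lags in frame]
            for frame, g in zip(frames, envelope)]
```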
PROJECT EXPERIMENTS • Experiment setup • Experiment procedures • Experiment results
Experiment procedures • Recorded wave data: 5 s at 44100 Hz sample rate, 16-bit resolution, two channels (stereo) • Down-sampled to 11025 Hz and converted to one channel (mono) • Mixed combinations without delay • Mixed combinations with 0.5 s delay
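The preparation steps involve some simple arithmetic: 5 s at 44100 Hz is 220500 samples per channel, down-sampling by a factor of 4 leaves 55125 samples at 11025 Hz, and a 0.5 s delay at 11025 Hz is 5512.5 samples, so it has to be rounded (Python's `round` gives 5512). The sketch below assumes a crude 4-sample average as the anti-aliasing filter and simple summation for mixing; the project's actual resampler and mixing gains are not stated.

```python
def to_mono(left, right):
    """Average the two stereo channels into one."""
    return [(a + b) / 2 for a, b in zip(left, right)]

def downsample_by_4(signal):
    """44100 Hz -> 11025 Hz: average each block of 4 samples, keep one value.

    The block mean is only a crude low-pass; a proper anti-aliasing filter
    (e.g. windowed-sinc) would normally be used.
    """
    return [sum(signal[i:i + 4]) / 4 for i in range(0, len(signal) - 3, 4)]

def mix(a, b, delay_s=0.0, fs=11025):
    """Sum two mono sources, delaying b by delay_s seconds (rounded to samples)."""
    d = int(round(delay_s * fs))
    out = [0.0] * max(len(a), len(b) + d)
    for i, v in enumerate(a):
        out[i] += v
    for i, v in enumerate(b):
        out[i + d] += v
    return out
```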
Experiment results • Single sound source • Two sound sources without delay • Two sound sources with delay • Modeling ANFIS for Correlogram frames • Correlogram frame channel training (classification) • Correlogram frame channel evaluation (matching)