
Human Auditory Cognition 2014



Presentation Transcript


  1. Human Auditory Cognition 2014. Alain de Cheveigné, Andre van Schaik, Chetan Singh Thakur, David Karpul, Dorothee Arzounian, Edmund Lalor, Ernst Niebur, Giovanni Di Liberto, Guillaume Garreau, James O’Sullivan, Jessica Thompson, John Foxe, Lakshmi Krishnan, Malcolm Slaney, Manu Rastogi, Marcela Mendoza, Psyche Loui, Shih-Chii Liu, Simon Kelly, Sio-Hoi Ieng, Thomas Murray, Tobi Delbruck, Victor Benichoux, Victor Minces, Vikram Ramanarayanan, Yves Boubenec

  2. Summary Telluride Experiments Wow!

  3. Imagined (Ghosts) • Expected (Priming) • Noise (Reconstructions) • Hardware • Changes (Texture)

  4. Ghosts

  5. Ghosts – Motivation Visual: Auditory: ???

  6. Ghost Simulation – Human Model. Design choices: Lure/Template choice; Domain: spectrogram or cochleagram; Comparison approach: Euclidean, cosine, or cross-correlation; Accumulation approach: spectrogram; Evaluation approach: spectrogram or cochleagram; Noise input: white, pink, babble, or cortical. Noise 1: better (subtract); Noise 2: worse.
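As a concrete illustration of the comparison step, here is a minimal sketch of a simulated listener that reports hearing the target whenever a noise spectrogram is sufficiently close (by cosine similarity) to the target template. The toy spectrograms, the `ghost_decision` helper, and the 0.5 threshold are assumptions for illustration, not the workshop's actual front end.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened spectrograms."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ghost_decision(noise_spec, template_spec, threshold=0.5):
    """Simulated listener: report 'yes' (target heard in the noise)
    when the noise spectrogram resembles the target template."""
    return cosine_similarity(noise_spec, template_spec) >= threshold

rng = np.random.default_rng(0)
template = rng.standard_normal((40, 100))                    # toy 40-band target template
noise_yes = template + 0.3 * rng.standard_normal((40, 100))  # noise resembling the target
noise_no = rng.standard_normal((40, 100))                    # unrelated noise
print(ghost_decision(noise_yes, template), ghost_decision(noise_no, template))
```

Swapping `cosine_similarity` for a Euclidean distance or cross-correlation reproduces the other comparison approaches listed on the slide.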

  7. Ghosts – Auditory Simulations Noise Samples Target

  8. Ghosts – Simulation Results

  9. Ghosts – Human Results Target/Lure (Superstition) Average Spectrogram of Positive Choices Output Inverse
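The "Average Spectrogram of Positive Choices" is a reverse-correlation (classification-image) analysis: averaging the noise on "yes" trials recovers the listener's internal template. A minimal sketch under toy assumptions (a random hidden template, and a simulated observer that says "yes" on the top quartile of template matches):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, shape = 4000, (20, 50)
template = rng.standard_normal(shape)     # listener's hidden internal template

# Pure-noise trials: the observer says "yes" when the noise happens to
# match the template well (the "superstition" effect).
noises = rng.standard_normal((n_trials,) + shape)
scores = noises.reshape(n_trials, -1) @ template.ravel()
yes = scores > np.quantile(scores, 0.75)  # top quartile of matches -> "yes"

# Classification image: mean "yes" noise minus mean "no" noise.
class_image = noises[yes].mean(axis=0) - noises[~yes].mean(axis=0)

# The recovered image correlates with the hidden template.
r = np.corrcoef(class_image.ravel(), template.ravel())[0, 1]
print(f"correlation with hidden template: {r:.2f}")
```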

  10. Ghosts – Humans vs. Simulations p < 1e-4 (Yes similarity) – (No similarity)

  11. Ghosts – EEG Time Course [plot: voltage vs. time (ms) for Word + noise, Noise yes, Noise no, and the Difference (Yes – No)]

  12. Ghosts – EEG Hypothesis: Stimulus → Prime → Estimate filter → Choice prediction. SUCCESS!!! (Not a guarantee)

  13. Ghosts – EEG Model. For each subject (1–3), an mTRF “superstition” filter is estimated from the EEG and the noise stimuli; the filter is then used for EEG prediction (r1, r2) and for user choice prediction. Overall choice prediction accuracy: 59% ± 1.5% (p < 0.05)
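The mTRF filter-estimation step can be sketched as ridge regression from time-lagged copies of the stimulus onto the EEG. Everything below (the toy kernel, lag count, ridge value) is illustrative, not the parameters used at the workshop:

```python
import numpy as np

def lagged(stim, n_lags):
    """Design matrix holding time-lagged copies of the stimulus."""
    X = np.zeros((len(stim), n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:len(stim) - k]
    return X

def fit_trf(stim, eeg, n_lags=20, ridge=1.0):
    """Ridge-regression estimate of a temporal response function
    (stimulus -> EEG), the core of an mTRF-style model."""
    X = lagged(stim, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ eeg)

# Toy data: "EEG" is the stimulus convolved with a hidden kernel, plus noise.
rng = np.random.default_rng(2)
stim = rng.standard_normal(5000)
kernel = np.exp(-np.arange(20) / 5.0)
eeg = lagged(stim, 20) @ kernel + 0.5 * rng.standard_normal(5000)

w = fit_trf(stim, eeg)                  # recovered filter ~ hidden kernel
r = np.corrcoef(lagged(stim, 20) @ w, eeg)[0, 1]
print(f"EEG prediction r = {r:.2f}")
```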

  14. Ghosts – Summary • Auditory convergence! • EEG shows distinct filter!!

  15. Textures

  16. Textures – Question • How do statistics affect the neural accumulation of evidence? • Difficulty of the task, unpredictability • Neural signature of accumulation of sensory evidence (O'Connell et al., 2012)

  17. Textures – Stimulus Difficulty

  18. Psychophysics [plot: performance by difficulty for early vs. late changes, as a function of timing of change (s)]

  19. [Plot: button-press responses by difficulty]

  20. [Plot: voltage averaged across subjects (n=3...)]

  21. [Plot: voltage]

  22. Priming

  23. Priming

  24. Valid prime → Target: cheesy, pretty, ready, cheesy → cheesy. Invalid prime → Target: cheesy, sunny, ready, sunny → cheesy.

  25. Valid prime → Target: cheesy → cheesy. Invalid prime → Target: sunny → cheesy. Performance on the same stimuli is different as a function of sensory context.

  26. Priming – EEG Analysis The same stimuli elicit different cortical responses from auditory cortex as a function of sensory context

  27. Priming Summary • Context changes selectivity of auditory cortex to modulate the responses to upcoming stimuli. • This is true for all 100 words! • Our ability to recognize the expected word is enhanced by this filter!

  28. Reconstructions Envelopes vs. Onsets CCA DBN/NMF

  29. History EEG But it misses all this:

  30. Our brain likes onsets

  31. EEG Predictions

  32. Reconstructions – Relating EEG to Speech. Goal: find a transform of speech and a transform of EEG such that they are maximally correlated. This allows us to match EEG to speech (as in attention monitoring). - Better than correlating speech with reconstructed speech: speech contains details that do not show up in the EEG, so reconstructed speech is poorly correlated with real speech. - Better than correlating EEG with predicted EEG: EEG contains speech-irrelevant activity, so predicted EEG is poorly correlated with real EEG. Pipeline: speech → cochlear filterbank; EEG → denoise, reduce dimensionality; CCA; measure correlation.
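A minimal numpy sketch of the core CCA step: whiten each view, then take the SVD of the cross-covariance to obtain the first canonical pair. The toy speech/EEG data (a single shared latent source) and the regularizer are assumptions for illustration:

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-6):
    """First canonical pair between data matrices X and Y:
    whiten each view via SVD, then SVD the cross-covariance."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Ux, sx, Vtx = np.linalg.svd(X, full_matrices=False)
    Uy, sy, Vty = np.linalg.svd(Y, full_matrices=False)
    U, s, Vt = np.linalg.svd(Ux.T @ Uy)
    a = (Vtx.T / (sx + reg)) @ U[:, 0]   # speech-side weights
    b = (Vty.T / (sy + reg)) @ Vt[0]     # EEG-side weights
    return a, b, s[0]                    # s[0]: first canonical correlation

# Toy data: a single shared latent source buried in both modalities.
rng = np.random.default_rng(3)
t = rng.standard_normal(2000)           # shared "speech-tracking" activity
speech = np.outer(t, rng.standard_normal(5)) + rng.standard_normal((2000, 5))
eeg = np.outer(t, rng.standard_normal(40)) + 3 * rng.standard_normal((2000, 40))

a, b, r = cca_first_pair(speech, eeg)
print(f"first canonical correlation r = {r:.2f}")
```

Projecting each view onto its weights (`speech @ a`, `eeg @ b`) gives the maximally correlated pair of time series used for attention monitoring.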

  33. Data: collected by Giovanni Di Liberto (Ed Lalor's lab). Stimulus is speech ("The Old Man and the Sea"), approx. 1.6 hours in 47 files. EEG recorded from 8 subjects, 130 channels, SR = 512 Hz, ~2–3 minute sessions.
  Audio preprocessing: FFT-based cochlear filterbank, 40 channels, range 100–8000 Hz, bandwidth = 1 ERB. Filter output instantaneous power is smoothed over ~30 ms, compressed by a 4th root, SR = 512 Hz → 40-channel spectrogram (time series of "instantaneous partial loudness").
  EEG preprocessing: detrend (10th-order polynomial), high-pass 0.1 Hz, low-pass (400 ms square window), denoise with DSS to remove activity widely different between files.
  Audio–EEG comparison: concatenate 10 files (~20 min); EEG: time shift 200 ms × [0, 1, 2, 3], PCA, keep ~40 PCs; spectrogram: PCA, keep ~5 PCs; CCA between EEG and spectrogram PCs; correlate 1st CCA component of EEG with 1st CCA component of spectrogram; test against surrogate data (spectrograms rotated by random amounts, 100 trials).
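The surrogate test at the end of this pipeline can be sketched as follows: circularly rotate one signal by random offsets, recompute the correlation each time, and ask how often the surrogate correlation exceeds the real one. The toy signals below stand in for the first CCA components:

```python
import numpy as np

def surrogate_test(x, y, n_surr=100, seed=0):
    """Correlation of x and y, compared against surrogates in which y is
    circularly rotated by a random offset (destroying alignment while
    preserving its spectrum and autocorrelation)."""
    rng = np.random.default_rng(seed)
    r_true = np.corrcoef(x, y)[0, 1]
    r_surr = np.array([
        np.corrcoef(x, np.roll(y, rng.integers(1, len(y))))[0, 1]
        for _ in range(n_surr)
    ])
    p = (np.sum(np.abs(r_surr) >= abs(r_true)) + 1) / (n_surr + 1)
    return r_true, r_surr, p

# Toy stand-ins for the first CCA components of spectrogram and EEG.
rng = np.random.default_rng(4)
shared = rng.standard_normal(3000)
x = shared + rng.standard_normal(3000)
y = shared + rng.standard_normal(3000)
r_true, r_surr, p = surrogate_test(x, y)
print(f"r = {r_true:.2f}, mean surrogate |r| = {np.abs(r_surr).mean():.3f}, p = {p:.3f}")
```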

  34. Reconstructions – Results. Correlation between EEG and audio transforms: r ≈ 0.35 (0.12 on surrogate data). The first CCA component from EEG is our best estimate of activity in the brain that cares about continuous speech.

  35. TOWARD FINDING A LOW-DIMENSIONAL REPRESENTATION OF PHONETIC INFORMATION FROM SPEECH AND EEG SIGNALS [diagram: speech and EEG → NMF / DBN → latent representation → phonemes]

  36. DATA PREPROCESSING. 128-channel EEG, band-split into 1–4 Hz, 4–7 Hz, 7–15 Hz, and 15–30 Hz. Steps: re-reference to mastoids; bad-channel rejection & interpolation; ICA stability analysis; equivalent current dipole estimation.

  37. WHY NMF vs. OTHER TECHNIQUES? NMF bases are more interpretable and part-based (since they combine additively); at the same time, not an over-approximation. [Figure: basis, activations, reconstructed images]
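The "parts combine additively" point can be seen in a minimal multiplicative-update NMF (Lee & Seung style); the toy data and iteration count are illustrative:

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Multiplicative-update NMF: V ~= W @ H with W, H >= 0, so every
    column of V is an additive, part-based combination of the bases."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H

# Toy non-negative data with genuinely additive rank-4 structure.
rng = np.random.default_rng(5)
V = rng.random((30, 4)) @ rng.random((4, 50))
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

The non-negativity constraint means no basis can cancel another, which is what makes the learned bases part-like.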

  38. THE VISION. A low-dimensional shared representation joins three stacks through an associative layer: a speech stack (MFCC features → speech visible layer → speech 1st…nth hidden layers), an EEG stack (EEG features → EEG visible layer → EEG 1st…nth hidden layers), and a phone-label stack (phone labels → phone 1st…nth hidden layers); each modality has its own transformation into the shared space.

  39. IN PRACTICE, SO FAR… Two systems, trained on continuous speech (audiobook): a DBN system (EEG features → EEG visible layer → hidden layers → activations (latent repn.) → SVM → phone labels) and an NMF system (EEG features → NMF → activations → phone labels); 80% – 10% – 10% train-dev-test split.

  40. Real-time multi-talker speech recognition using automated attention from the ITD information of a binaural silicon cochlea

  41. Attending to Conversations. Task: recognize the highest-valued (two-digit) numbers. [Grid of simultaneously spoken digits, e.g. 28 11 39 83 23 68 23 34 56 81] Where do I attend?

  42. Cocktail Party: Salience vs. Attention (2011) Cognition Male/Female Salience Recognized Digits Attention ITD Histogram ASR Binaural Receiver Novelty/ Salience

  43. Scene Analysis Engineering (2014). Cognition (Python) holds the state (direction to attend, digits recognized) and the task (switch attention based on recognition and saliency, e.g. Obama/Cameron voices, digits, salience). Components: speech recognition in Python (Sphinx), binaural cochlea (jAER), novelty detection (Python), connected over UDP (sound samples, ITD).

  44. Analog Scene Analysis – Things to solve • Saliency • Binaural onset • Online speech recognition • Difficulty of getting sounds into computer • Difficulty of interfacing to a real-time speech recognition toolbox • Cognition (held up by difficulty of the real-time speech recognition) • Two digit sentences, easy semantics • Sound Separation • Using delay-and-add to separate speakers
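The delay-and-add idea in the last bullet can be sketched with two toy speakers at opposite ITDs: delaying one ear's signal so the attended speaker's copies align reinforces that speaker and attenuates the other. Sample-level ITDs and white-noise "speech" are simplifying assumptions:

```python
import numpy as np

def delay_and_sum(left, right, delay):
    """Delay one ear's signal by an integer-sample ITD and add, so the
    speaker whose ITD matches `delay` adds coherently."""
    return (left + np.roll(right, delay)) / 2.0

rng = np.random.default_rng(6)
n = 4000
s1 = rng.standard_normal(n)               # speaker 1, ITD = +5 samples
s2 = rng.standard_normal(n)               # speaker 2, ITD = -5 samples
left = s1 + s2
right = np.roll(s1, -5) + np.roll(s2, 5)  # each speaker shifted at the right ear

beam = delay_and_sum(left, right, 5)      # steer the beam toward speaker 1
r_s1 = np.corrcoef(beam, s1)[0, 1]
r_s2 = np.corrcoef(beam, s2)[0, 1]
print(f"beam toward speaker 1: r(s1) = {r_s1:.2f}, r(s2) = {r_s2:.2f}")
```

With only two "ears" the rejection of the unattended speaker is partial (its two copies add incoherently rather than cancelling), which is why it is listed as a problem to solve.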

  45. FPGA Cochlea

  46. FPGA Results! • Implemented a real-time FPGA version of Dick Lyon's cochlea model. • Implemented Shamma's coherent sound-segregation task.

  47. FPGA Cochlea – Response to a Chirp Signal

  48. FPGA – Sound segregation problem • Temporal coherence -> Sound stream • Look for common modulation

  49. [Diagram: cochlea (basilar membrane and inner-hair-cell stages) produces per-channel envelopes; a channel-by-channel correlation matrix at time t is computed across modulation rates (2, 4, 8, 16 Hz); an attention signal selects the attended channel; a mask array gates the channels; the masked output reconstructs the attended tone.]
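The temporal-coherence masking step can be sketched as: correlate each channel's slow envelope with the attended channel's envelope and keep only channels that co-modulate. The toy envelopes and the 0.5 threshold are assumptions:

```python
import numpy as np

def coherence_mask(envelopes, attend_idx, threshold=0.5):
    """Keep channels whose slow envelopes co-modulate with the attended
    channel (temporal coherence); zero out the rest."""
    C = np.corrcoef(envelopes)                 # channel-by-channel correlation
    mask = (C[attend_idx] > threshold).astype(float)
    return mask * envelopes.T                  # masked cochleagram (time x channel)

rng = np.random.default_rng(7)
n_t = 1000
mod_a = np.abs(rng.standard_normal(n_t))       # modulator of source A
mod_b = np.abs(rng.standard_normal(n_t))       # independent modulator of source B
envelopes = np.stack([
    mod_a + 0.05 * np.abs(rng.standard_normal(n_t)),   # channels 0-1 follow A
    mod_a + 0.05 * np.abs(rng.standard_normal(n_t)),
    mod_b + 0.05 * np.abs(rng.standard_normal(n_t)),   # channels 2-3 follow B
    mod_b + 0.05 * np.abs(rng.standard_normal(n_t)),
])

masked = coherence_mask(envelopes, attend_idx=0)  # attend to a channel of source A
print("channels kept:", np.nonzero(masked.sum(0))[0])
```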

  50. FPGA – Cochleagram and Modulation Output
