
Human Auditory Cognition 2014



Presentation Transcript


  1. Human Auditory Cognition 2014. Alain de Cheveigné, Andre van Schaik, Chetan Singh Thakur, David Karpul, Dorothee Arzounian, Edmund Lalor, Ernst Niebur, Giovanni Di Liberto, Guillaume Garreau, James O’Sullivan, Jessica Thompson, John Foxe, Lakshmi Krishnan, Malcolm Slaney, Manu Rastogi, Marcela Mendoza, Psyche Loui, Shih-Chii Liu, Simon Kelly, Sio-Hoi Ieng, Thomas Murray, Tobi Delbruck, Victor Benichoux, Victor Minces, Vikram Ramanarayanan, Yves Boubenec

  2. Summary Telluride Experiments Wow!

  3. Imagined (Ghosts) • Expected (Priming) • Noise (Reconstructions) • Hardware • Changes (Texture)

  4. Ghosts

  5. Ghosts – Motivation Visual: Auditory: ???

  6. Ghost Simulation – Human Model. Design choices: Lure/Template choice; Domain: spectrogram or cochleagram; Comparison approach: Euclidean, cosine, or cross-correlation; Accumulation approach: spectrogram; Evaluation approach: spectrogram or cochleagram; Noise input: white, pink, babble, or cortical. Noise 1: better (subtract); Noise 2: worse.
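As a concrete illustration of the comparison step, here is a minimal sketch of a simulated listener that reports hearing the target whenever a noise spectrogram is sufficiently close (by cosine similarity) to the target template. The toy spectrograms, the `ghost_decision` helper, and the 0.5 threshold are assumptions for illustration, not the workshop's actual front end.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened spectrograms."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ghost_decision(noise_spec, template_spec, threshold=0.5):
    """Simulated listener: report 'yes' (target heard in the noise)
    when the noise spectrogram resembles the target template."""
    return cosine_similarity(noise_spec, template_spec) >= threshold

rng = np.random.default_rng(0)
template = rng.standard_normal((40, 100))                    # toy 40-band target template
noise_yes = template + 0.3 * rng.standard_normal((40, 100))  # noise resembling the target
noise_no = rng.standard_normal((40, 100))                    # unrelated noise
print(ghost_decision(noise_yes, template), ghost_decision(noise_no, template))
```

Swapping `cosine_similarity` for a Euclidean distance or cross-correlation reproduces the other comparison approaches listed on the slide.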

  7. Ghosts – Auditory Simulations Noise Samples Target

  8. Ghosts – Simulation Results

  9. Ghosts – Human Results Target/Lure (Superstition) Average Spectrogram of Positive Choices Output Inverse
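The "Average Spectrogram of Positive Choices" is a reverse-correlation (classification-image) analysis: averaging the noise on "yes" trials recovers the listener's internal template. A minimal sketch under toy assumptions (a random hidden template, and a simulated observer that says "yes" on the top quartile of template matches):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, shape = 4000, (20, 50)
template = rng.standard_normal(shape)     # listener's hidden internal template

# Pure-noise trials: the observer says "yes" when the noise happens to
# match the template well (the "superstition" effect).
noises = rng.standard_normal((n_trials,) + shape)
scores = noises.reshape(n_trials, -1) @ template.ravel()
yes = scores > np.quantile(scores, 0.75)  # top quartile of matches -> "yes"

# Classification image: mean "yes" noise minus mean "no" noise.
class_image = noises[yes].mean(axis=0) - noises[~yes].mean(axis=0)

# The recovered image correlates with the hidden template.
r = np.corrcoef(class_image.ravel(), template.ravel())[0, 1]
print(f"correlation with hidden template: {r:.2f}")
```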

  10. Ghosts – Humans vs. Simulations p < 1e-4 (Yes similarity) – (No similarity)

  11. Ghosts – EEG Time Course [plot: voltage vs. time (ms) for Word + noise, Noise yes, Noise no, and the Difference (Yes – No)]

  12. Ghosts – EEG Hypothesis: Stimulus → Prime → Estimate filter → Choice prediction. SUCCESS!!! (Not a guarantee)

  13. Ghosts – EEG Model. For each subject (1–3), an mTRF “superstition” filter is estimated from the EEG and the noise stimuli; the filter is then used for EEG prediction (r1, r2) and for user choice prediction. Overall choice prediction accuracy: 59% ± 1.5% (p < 0.05)
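The mTRF filter-estimation step can be sketched as ridge regression from time-lagged copies of the stimulus onto the EEG. Everything below (the toy kernel, lag count, ridge value) is illustrative, not the parameters used at the workshop:

```python
import numpy as np

def lagged(stim, n_lags):
    """Design matrix holding time-lagged copies of the stimulus."""
    X = np.zeros((len(stim), n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:len(stim) - k]
    return X

def fit_trf(stim, eeg, n_lags=20, ridge=1.0):
    """Ridge-regression estimate of a temporal response function
    (stimulus -> EEG), the core of an mTRF-style model."""
    X = lagged(stim, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ eeg)

# Toy data: "EEG" is the stimulus convolved with a hidden kernel, plus noise.
rng = np.random.default_rng(2)
stim = rng.standard_normal(5000)
kernel = np.exp(-np.arange(20) / 5.0)
eeg = lagged(stim, 20) @ kernel + 0.5 * rng.standard_normal(5000)

w = fit_trf(stim, eeg)                  # recovered filter ~ hidden kernel
r = np.corrcoef(lagged(stim, 20) @ w, eeg)[0, 1]
print(f"EEG prediction r = {r:.2f}")
```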

  14. Ghosts – Summary • Auditory convergence! • EEG shows distinct filter!!

  15. Textures

  16. Textures – Question • How do statistics affect the neural accumulation of evidence? • Difficulty of the task, unpredictability • Neural signature of accumulation of sensory evidence (O'Connell et al., 2012)

  17. Textures – Stimulus Difficulty

  18. Psychophysics [plot: performance by difficulty for early vs. late changes, as a function of timing of change (s)]

  19. [Plot: button-press responses by difficulty]

  20. [Plot: voltage averaged across subjects (n=3...)]

  21. [Plot: voltage]

  22. Priming

  23. Priming

  24. Valid prime → Target: cheesy, pretty, ready, cheesy → cheesy. Invalid prime → Target: cheesy, sunny, ready, sunny → cheesy.

  25. Valid prime → Target: cheesy → cheesy. Invalid prime → Target: sunny → cheesy. Performance on the same stimuli is different as a function of sensory context.

  26. Priming – EEG Analysis The same stimuli elicit different cortical responses from auditory cortex as a function of sensory context

  27. Priming Summary • Context changes selectivity of auditory cortex to modulate the responses to upcoming stimuli. • This is true for all 100 words! • Our ability to recognize the expected word is enhanced by this filter!

  28. Reconstructions Envelopes vs. Onsets CCA DBN/NMF

  29. History EEG But it misses all this:

  30. Our brain likes onsets

  31. EEG Predictions

  32. Reconstructions – Relating EEG to Speech. Goal: find a transform of speech and a transform of EEG such that they are maximally correlated. This allows us to match EEG to speech (as in attention monitoring). - Better than correlating speech with reconstructed speech: speech contains details that do not show up in the EEG, so reconstructed speech is poorly correlated with real speech. - Better than correlating EEG with predicted EEG: EEG contains speech-irrelevant activity, so predicted EEG is poorly correlated with real EEG. Pipeline: speech → cochlear filterbank; EEG → denoise, reduce dimensionality; CCA; measure correlation.
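A minimal numpy sketch of the core CCA step: whiten each view, then take the SVD of the cross-covariance to obtain the first canonical pair. The toy speech/EEG data (a single shared latent source) and the regularizer are assumptions for illustration:

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-6):
    """First canonical pair between data matrices X and Y:
    whiten each view via SVD, then SVD the cross-covariance."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Ux, sx, Vtx = np.linalg.svd(X, full_matrices=False)
    Uy, sy, Vty = np.linalg.svd(Y, full_matrices=False)
    U, s, Vt = np.linalg.svd(Ux.T @ Uy)
    a = (Vtx.T / (sx + reg)) @ U[:, 0]   # speech-side weights
    b = (Vty.T / (sy + reg)) @ Vt[0]     # EEG-side weights
    return a, b, s[0]                    # s[0]: first canonical correlation

# Toy data: a single shared latent source buried in both modalities.
rng = np.random.default_rng(3)
t = rng.standard_normal(2000)           # shared "speech-tracking" activity
speech = np.outer(t, rng.standard_normal(5)) + rng.standard_normal((2000, 5))
eeg = np.outer(t, rng.standard_normal(40)) + 3 * rng.standard_normal((2000, 40))

a, b, r = cca_first_pair(speech, eeg)
print(f"first canonical correlation r = {r:.2f}")
```

Projecting each view onto its weights (`speech @ a`, `eeg @ b`) gives the maximally correlated pair of time series used for attention monitoring.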

  33. Data: collected by Giovanni Di Liberto (Ed Lalor's lab). Stimulus is speech ("The Old Man and the Sea"), approx. 1.6 hours in 47 files. EEG recorded from 8 subjects, 130 channels, SR = 512 Hz, ~2–3 minute sessions.
  Audio preprocessing: FFT-based cochlear filterbank, 40 channels, range 100–8000 Hz, bandwidth = 1 ERB. Filter output instantaneous power is smoothed over ~30 ms, compressed by a 4th root, SR = 512 Hz → 40-channel spectrogram (time series of "instantaneous partial loudness").
  EEG preprocessing: detrend (10th-order polynomial), high-pass 0.1 Hz, low-pass (400 ms square window), denoise with DSS to remove activity widely different between files.
  Audio–EEG comparison: concatenate 10 files (~20 min); EEG: time shift 200 ms × [0, 1, 2, 3], PCA, keep ~40 PCs; spectrogram: PCA, keep ~5 PCs; CCA between EEG and spectrogram PCs; correlate 1st CCA component of EEG with 1st CCA component of spectrogram; test against surrogate data (spectrograms rotated by random amounts, 100 trials).
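The surrogate test at the end of this pipeline can be sketched as follows: circularly rotate one signal by random offsets, recompute the correlation each time, and ask how often the surrogate correlation exceeds the real one. The toy signals below stand in for the first CCA components:

```python
import numpy as np

def surrogate_test(x, y, n_surr=100, seed=0):
    """Correlation of x and y, compared against surrogates in which y is
    circularly rotated by a random offset (destroying alignment while
    preserving its spectrum and autocorrelation)."""
    rng = np.random.default_rng(seed)
    r_true = np.corrcoef(x, y)[0, 1]
    r_surr = np.array([
        np.corrcoef(x, np.roll(y, rng.integers(1, len(y))))[0, 1]
        for _ in range(n_surr)
    ])
    p = (np.sum(np.abs(r_surr) >= abs(r_true)) + 1) / (n_surr + 1)
    return r_true, r_surr, p

# Toy stand-ins for the first CCA components of spectrogram and EEG.
rng = np.random.default_rng(4)
shared = rng.standard_normal(3000)
x = shared + rng.standard_normal(3000)
y = shared + rng.standard_normal(3000)
r_true, r_surr, p = surrogate_test(x, y)
print(f"r = {r_true:.2f}, mean surrogate |r| = {np.abs(r_surr).mean():.3f}, p = {p:.3f}")
```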

  34. Reconstructions – Results. Correlation between EEG and audio transforms: r ≈ 0.35 (0.12 on surrogate data). The first CCA component from EEG is our best estimate of activity in the brain that cares about continuous speech.

  35. TOWARD FINDING A LOW-DIMENSIONAL REPRESENTATION OF PHONETIC INFORMATION FROM SPEECH AND EEG SIGNALS [diagram: speech and EEG → NMF / DBN → latent representation → phonemes]

  36. DATA PREPROCESSING. 128-channel EEG, band-split into 1–4 Hz, 4–7 Hz, 7–15 Hz, and 15–30 Hz. Steps: re-reference to mastoids; bad-channel rejection & interpolation; ICA stability analysis; equivalent current dipole estimation.

  37. WHY NMF vs. OTHER TECHNIQUES? NMF bases are more interpretable and part-based (since they combine additively); at the same time, not an over-approximation. [Figure: basis, activations, reconstructed images]
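The "parts combine additively" point can be seen in a minimal multiplicative-update NMF (Lee & Seung style); the toy data and iteration count are illustrative:

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Multiplicative-update NMF: V ~= W @ H with W, H >= 0, so every
    column of V is an additive, part-based combination of the bases."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H

# Toy non-negative data with genuinely additive rank-4 structure.
rng = np.random.default_rng(5)
V = rng.random((30, 4)) @ rng.random((4, 50))
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

The non-negativity constraint means no basis can cancel another, which is what makes the learned bases part-like.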

  38. THE VISION. A low-dimensional shared representation joins three stacks through an associative layer: a speech stack (MFCC features → speech visible layer → speech 1st…nth hidden layers), an EEG stack (EEG features → EEG visible layer → EEG 1st…nth hidden layers), and a phone-label stack (phone labels → phone 1st…nth hidden layers); each modality has its own transformation into the shared space.

  39. IN PRACTICE, SO FAR… Two systems, trained on continuous speech (audiobook): a DBN system (EEG features → EEG visible layer → hidden layers → activations (latent repn.) → SVM → phone labels) and an NMF system (EEG features → NMF → activations → phone labels); 80% – 10% – 10% train-dev-test split.

  40. Real-time multi-talker speech recognition using automated attention from the ITD information of a binaural silicon cochlea

  41. Attending to Conversations. Task: recognize the highest-valued (two-digit) numbers. [Grid of simultaneously spoken digits, e.g. 28 11 39 83 23 68 23 34 56 81] Where do I attend?

  42. Cocktail Party: Salience vs. Attention (2011) Cognition Male/Female Salience Recognized Digits Attention ITD Histogram ASR Binaural Receiver Novelty/ Salience

  43. Scene Analysis Engineering (2014). Cognition (Python) holds the state (direction to attend, digits recognized) and the task (switch attention based on recognition and saliency, e.g. Obama/Cameron voices, digits, salience). Components: speech recognition in Python (Sphinx), binaural cochlea (jAER), novelty detection (Python), connected over UDP (sound samples, ITD).

  44. Analog Scene Analysis – Things to solve • Saliency • Binaural onset • Online speech recognition • Difficulty of getting sounds into computer • Difficulty of interfacing to a real-time speech recognition toolbox • Cognition (held up by difficulty of the real-time speech recognition) • Two digit sentences, easy semantics • Sound Separation • Using delay-and-add to separate speakers
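The delay-and-add idea in the last bullet can be sketched with two toy speakers at opposite ITDs: delaying one ear's signal so the attended speaker's copies align reinforces that speaker and attenuates the other. Sample-level ITDs and white-noise "speech" are simplifying assumptions:

```python
import numpy as np

def delay_and_sum(left, right, delay):
    """Delay one ear's signal by an integer-sample ITD and add, so the
    speaker whose ITD matches `delay` adds coherently."""
    return (left + np.roll(right, delay)) / 2.0

rng = np.random.default_rng(6)
n = 4000
s1 = rng.standard_normal(n)               # speaker 1, ITD = +5 samples
s2 = rng.standard_normal(n)               # speaker 2, ITD = -5 samples
left = s1 + s2
right = np.roll(s1, -5) + np.roll(s2, 5)  # each speaker shifted at the right ear

beam = delay_and_sum(left, right, 5)      # steer the beam toward speaker 1
r_s1 = np.corrcoef(beam, s1)[0, 1]
r_s2 = np.corrcoef(beam, s2)[0, 1]
print(f"beam toward speaker 1: r(s1) = {r_s1:.2f}, r(s2) = {r_s2:.2f}")
```

With only two "ears" the rejection of the unattended speaker is partial (its two copies add incoherently rather than cancelling), which is why it is listed as a problem to solve.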

  45. FPGA Cochlea

  46. FPGA Results! • Implemented a real-time FPGA version of Dick Lyon's cochlea model. • Implemented Shamma's coherent sound-segregation task.

  47. FPGA Cochlea – Response to a Chirp Signal

  48. FPGA – Sound segregation problem • Temporal coherence -> Sound stream • Look for common modulation

  49. [Diagram: cochlea (basilar membrane and inner-hair-cell stages) produces per-channel envelopes; a channel-by-channel correlation matrix at time t is computed across modulation rates (2, 4, 8, 16 Hz); an attention signal selects the attended channel; a mask array gates the channels; the masked output reconstructs the attended tone.]
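The temporal-coherence masking step can be sketched as: correlate each channel's slow envelope with the attended channel's envelope and keep only channels that co-modulate. The toy envelopes and the 0.5 threshold are assumptions:

```python
import numpy as np

def coherence_mask(envelopes, attend_idx, threshold=0.5):
    """Keep channels whose slow envelopes co-modulate with the attended
    channel (temporal coherence); zero out the rest."""
    C = np.corrcoef(envelopes)                 # channel-by-channel correlation
    mask = (C[attend_idx] > threshold).astype(float)
    return mask * envelopes.T                  # masked cochleagram (time x channel)

rng = np.random.default_rng(7)
n_t = 1000
mod_a = np.abs(rng.standard_normal(n_t))       # modulator of source A
mod_b = np.abs(rng.standard_normal(n_t))       # independent modulator of source B
envelopes = np.stack([
    mod_a + 0.05 * np.abs(rng.standard_normal(n_t)),   # channels 0-1 follow A
    mod_a + 0.05 * np.abs(rng.standard_normal(n_t)),
    mod_b + 0.05 * np.abs(rng.standard_normal(n_t)),   # channels 2-3 follow B
    mod_b + 0.05 * np.abs(rng.standard_normal(n_t)),
])

masked = coherence_mask(envelopes, attend_idx=0)  # attend to a channel of source A
print("channels kept:", np.nonzero(masked.sum(0))[0])
```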

  50. FPGA – Cochleagram and Modulation Output
