Arindam Jati 1 , Paula G. Williams 2 , Brian Baucom 2 , Panayiotis Georgiou 1

TOWARDS PREDICTING PHYSIOLOGY FROM SPEECH DURING STRESSFUL CONVERSATIONS: HEART RATE AND RESPIRATORY SINUS ARRHYTHMIA
Arindam Jati 1, Paula G. Williams 2, Brian Baucom 2, Panayiotis Georgiou 1
1 University of Southern California, Department of Electrical Engineering, CA, USA
2 The University of Utah, Department of Psychology, UT, USA

Outline of the talk
• Relationship of stress, physiology and speech
• Prediction of physiology from speech
• Datasets
• Methodology
• Experiments and results
• Summary

Relationship of stress, physiology and speech
• Effects of mental stress on physiology:
  • Excessive stress can lead to physiological, psychological, and psychosomatic health conditions such as anxiety and depression
  • Mental stress causes significant changes in Heart Rate (HR) and HR Variability (HRV)
    • These signals can therefore be used to measure stress levels (Taelman et al.)
  • Stress can cause hyperventilation (over-breathing)
  • Respiratory Sinus Arrhythmia (RSA) can be used to assess stress
    • RSA: periodic alteration of heart rate in association with the phase of respiration (Grossman)
References:
• Joachim Taelman et al., “Influence of mental stress on heart rate and heart rate variability,” 4th European Conference of the International Federation for Medical and Biological Engineering. Springer, 2009.
• Paul Grossman, “Respiration, stress, and cardiovascular function,” Psychophysiology, vol. 20, no. 3, pp. 284–300, 1983.

Relationship of stress, physiology and speech
• Stress detection from physiology:
  • Electrodermal Activity (EDA), HR and HRV are used to detect stress
  • Problem: acquiring the physiological signals requires intrusive and, in some cases, invasive methods
• Stress detection from speech:
  • A substantial body of work uses the SUSAS dataset (Zhou et al.)
  • Nonlinear Teager Energy Operator (TEO) features appeared to be useful
  • Benefit: non-intrusive
• Multimodal detection of stress (using both speech and galvanic skin response)
• This work:
  • Predicts physiological signals indicative of stress directly from speech
References:
• Guojun Zhou, John H. L. Hansen, and James F. Kaiser, “Nonlinear feature based classification of speech under stress,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 201–216, 2001.

Prediction of physiology from speech
Effect of a psychological variable (mental stress) on two modalities:
Diagram: the effect of mental stress on physiology is well studied; the effect of mental stress on speech is well studied; the link between physiology and speech is not well explored.
• Goal:
  • Explore the relationship between physiology and speech (by studying correlations)
  • Predict physiology from speech during stressful conversations
• Insights to learn from this study:
  • How a psychological variable (the cause of stress) can lead to both physiological and vocal activations, and how these two are related
• The insights can help in developing future applications such as:
  • Multimodal stress detection systems
  • Higher-resolution quantitative metrics for the intensity of stress

Prediction of physiology from speech
• Previous work on predicting physiology from speech:
  • HR from the pronunciation of vowels (Skopin et al.)
  • Vowel pronunciation and reading a sentence aloud, with and without physical load (Schuller et al.)
  • A recent study (Tsiartas et al.) classifying the direction of HR change during conversation with an artificial dialog system
  • Their newer study (Smith et al.): regression analysis to predict HR from audio
References:
• D. Skopin and S. Baglikov, “Heartbeat feature extraction from vowel speech signal using 2D spectrum representation,” in Proc. of the 4th International Conference on Information Technology (ICIT), Amman, Jordan, 2009.
• Bjorn Schuller, Felix Friedmann, and Florian Eyben, “Automatic recognition of physiological parameters in the human voice: Heart rate and skin conductance,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 7219–7223.
• Andreas Tsiartas, Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti, and Adrian Willoughby, “Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
• Jennifer Smith, Andreas Tsiartas, Elizabeth Shriberg, Andreas Kathol, Adrian Willoughby, and Massimiliano de Zambotti, “Analysis and prediction of heart rate using speech features from natural speech,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 989–993.

What’s new?
• Study:
  • Investigates the relationship between acoustics and physiology (HR and RSA)
  • Analyzes correlation and regression performance
  • Analyzes stressful conversations between humans in two separate datasets
• Main novelties and importance of this work:
  • To our knowledge, the first attempt to explore the relationship between the two modalities for real conversations between humans
  • To our knowledge, the first study to predict RSA from speech
  • Two very distinct datasets; addresses issues related to the robustness of the acoustic features across different domains
  • Provides regression analysis on the actual value of the physiology; opens the possibility of building an audio-based automated real-time stress or physiology monitoring system

Datasets
† Number of unique males/females = 60
• For both datasets: multiple baseline physiology measurements (resting HR and RSA) for every participant
References:
• Paula G. Williams et al., “The effects of poor sleep on cognitive, affective, and physiological responses to a laboratory stressor,” Annals of Behavioral Medicine, vol. 46, no. 1, pp. 40–51, 2013.

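Methodology
Diagram: speech from a couples’ interaction in the CI dataset passes through the preprocessing blocks (voice activity detection (VAD), denoising, diarization, gender detection), yielding the female’s speech and the male’s speech. Session-level acoustic features are extracted from each speaker’s speech and paired with that speaker’s average physiology.
• For the female speaker:
  • Task 1: Analyze Pearson’s correlations between the individual acoustic features (i.e. elements of her session-level feature vector) and her average physiology.
  • Task 2: Predict her average physiology from the acoustic feature vector using a nonlinear regression model.
• Similarly, for the male speaker, do the same analysis between his session-level acoustic features and his average physiology.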
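Task 1 above can be illustrated with a minimal sketch. It computes Pearson’s correlation between each session-level feature and the average physiology, then picks the feature with the largest absolute correlation. The feature names and numbers are toy values invented for illustration, not data from the study, and the significance test (p-value) used on the slides is not included here.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: correlation is undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def best_correlated_feature(features, physiology):
    """features maps a feature name to its per-session values.
    Returns (name, r) for the feature with the largest |r|."""
    scores = {name: pearson_r(vals, physiology) for name, vals in features.items()}
    best = max(scores, key=lambda k: abs(scores[k]))
    return best, scores[best]

# Toy example: five sessions, two hypothetical features, average HR per session.
features = {
    "mean_pitch": [110.0, 120.0, 135.0, 150.0, 160.0],
    "loudness": [0.5, 0.4, 0.6, 0.5, 0.45],
}
mean_hr = [62.0, 66.0, 72.0, 78.0, 82.0]
name, r = best_correlated_feature(features, mean_hr)
print(name, round(r, 3))
```

In the toy data the HR values are an exact linear function of the hypothetical pitch feature, so that feature is selected.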

Methodology (contd.)
• Session-level acoustic features:
  • 88-dimensional eGeMAPS features (Eyben et al.) computed over the whole session using the openSMILE toolkit
  • SI dataset: from the participant’s speech over the entire session
  • CI dataset: extracted separately for husband and wife, and analyzed separately
• Examples from the eGeMAPS feature set (which also includes statistical functionals of some of these):
  • frequency-related parameters: pitch, jitter, and formant frequencies
  • energy/amplitude-related parameters: shimmer, loudness, and harmonics-to-noise ratio
  • spectral balance parameters: alpha ratio, Hammarberg index, and harmonic differences
  • temporal features: rate of loudness peaks, and mean length of voiced regions
  • cepstral features: MFCC and spectral flux
References:
• Florian Eyben, Klaus R. Scherer, Bjorn W. Schuller, Johan Sundberg, Elisabeth Andre, Carlos Busso, Laurence Y. Devillers, Julien Epps, Petri Laukka, Shrikanth S. Narayanan, et al., “The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for voice research and affective computing,” IEEE Transactions on Affective Computing, vol. 7, no. 2, pp. 190–202, 2016.

Methodology (contd.)
• Gender-specific models:
  • Experimentally found to be more useful
  • Illustrated in the t-SNE plot
• Regression model and parameters:
  • Estimating raw and normalized values of RSA and HR from acoustic features
  • Root Mean Squared Error (RMSE) loss
  • AdaBoost regressor with a decision tree regressor as base estimator
  • 5-fold stratified Cross Validation (CV), with no session overlap between train, dev and test sets
  • 3-fold cross validation on the train+dev set for model selection, based on minimum dev set RMSE
  • The whole 5-fold CV process is repeated 5 times to get a better estimate of the test error
• Figure: t-SNE plot of the acoustic features on both datasets
  • Clear clusters formed for males and females in both datasets
  • Possibly because of fundamental differences in some of the acoustic features (for example, pitch) between men and women
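The evaluation protocol above can be sketched as follows: a minimal, standard-library-only illustration of session-disjoint nested splits (5 outer folds, 3 inner folds, 5 repeats) plus the RMSE metric. It is an assumption-laden sketch, not the authors' code: the function names are hypothetical, the stratification mentioned on the slide is not implemented, and fitting the AdaBoost regressor inside each split is left out.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def session_folds(session_ids, k, rng):
    """Shuffle the unique session ids and deal them into k disjoint folds."""
    sessions = sorted(set(session_ids))
    rng.shuffle(sessions)
    return [sessions[i::k] for i in range(k)]

def nested_cv_splits(session_ids, outer_k=5, inner_k=3, repeats=5, seed=0):
    """Yield (train, dev, test) sets of session ids.

    Outer loop: each outer fold serves once as the test set.
    Inner loop: the remaining sessions are split again so that each inner
    fold serves once as the dev set for model selection.
    No session ever appears in more than one of train, dev and test.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        outer = session_folds(session_ids, outer_k, rng)
        for i, test in enumerate(outer):
            train_dev = [s for j, fold in enumerate(outer) if j != i for s in fold]
            inner = session_folds(train_dev, inner_k, rng)
            for d, dev in enumerate(inner):
                train = [s for j, fold in enumerate(inner) if j != d for s in fold]
                yield set(train), set(dev), set(test)

# Toy usage with 30 hypothetical session ids.
sessions = ["s%02d" % i for i in range(30)]
splits = list(nested_cv_splits(sessions))
print(len(splits))  # 5 repeats x 5 outer folds x 3 inner folds
```

Splitting by session id rather than by sample is what keeps both speakers of a session on the same side of every split.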

Experiments and results
• Analysis of the two datasets separately: Pearson’s correlations (all statistically significant, i.e. p < 0.05) between the physiological variables (HR or RSA) and the best-correlated feature
• Observation:
  • Different features dominate across datasets for the same physiological variable in the same gender group
  • Possible reason: the speakers perform different tasks in the two datasets (re-experienced stressors in the SI dataset vs. new stressors in the CI dataset)
  • Example: different levels of emotional or vocal arousal in the two tasks

Experiments and results (contd.)
• Combining the two datasets: Pearson’s correlation (all statistically significant) between raw or normalized physiological variables (HR or RSA) and the best-correlated feature
• Observation:
  • Raw physiology: the absolute correlation values drop by a large margin compared with the values obtained on the two datasets separately
    • Possible reason: the physiological signals are distributed differently in the two datasets because of the inherent difference between the tasks the speakers perform
  • Normalized physiology (baseline physiology subtracted): correlations improve in most cases
    • Exception: RSA for female speakers
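A small sketch can show why subtracting baseline physiology may help when datasets are pooled. All numbers below are synthetic toy values chosen to make the effect visible; they are not results from the study, and the names are hypothetical. Two groups of speakers share the same feature-to-HR slope but have different resting HR, which weakens the pooled raw correlation.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def normalize_by_baseline(session_values, baselines):
    """Subtract each speaker's resting (baseline) value from the session value."""
    return [v - b for v, b in zip(session_values, baselines)]

# Toy pooled data: the first four speakers stand in for one dataset
# (resting HR 60), the last four for the other (resting HR 80).
feature = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
baseline_hr = [60.0, 60.0, 60.0, 60.0, 80.0, 80.0, 80.0, 80.0]
raw_hr = [b + 2.0 * f for b, f in zip(baseline_hr, feature)]

r_raw = pearson_r(feature, raw_hr)
r_norm = pearson_r(feature, normalize_by_baseline(raw_hr, baseline_hr))
print(round(r_raw, 3), round(r_norm, 3))
```

In this constructed case the baseline offset between the two groups dominates the variance of the raw values, so removing it restores the underlying linear relation.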

Experiments and results (contd.)
• Analysis of the two datasets separately: RMSE for regressing raw physiological variables
• More data helps (the CI dataset is larger than the SI dataset)
• The observed correlations and RMSE values (for HR) align well with the previous study by Schuller et al., although the speaking tasks are very different

Experiments and results (contd.)
• RECAP: analysis of the two datasets separately
• Combining the two datasets: RMSE for regressing raw or normalized physiological variables (HR or RSA) in the combined dataset
• Overall better performance for predicting raw values (even though correlations degraded) than what we obtained on the SI dataset alone
• More data helps (CI + SI is larger than either SI or CI alone)

Summary
• A study of the relationship between acoustics and physiology during stressful conversations between humans
• Key findings:
  • Gender-specific models are more useful than gender-independent models
  • Different acoustic features correlate highly with physiology on different datasets, possibly because of the difference in stress types: re-experienced vs. new stressors
  • Correlations degrade in the combined dataset, but per-speaker normalization of physiology helps
  • When regressing the physiological variables separately on the two datasets, we observed overall better performance on the dataset with more participants

Conclusions
• Conclusion:
  • The significant correlations agree with the initial hypothesis that stress has an effect on both modalities, speech and physiology
  • The regression results support the need for more data
  • Normalization and gender-specific models perform better
    • This points to likely gains with individualized (or clustered) models
• Future plans:
  • Investigate better acoustic features
  • Use deep learning models to exploit temporal patterns in the speech signal that could help predict physiological responses more accurately
  • The connection between physiology and acoustics can be studied without any human labeling

THANK YOU
Signal Processing for Communication Understanding and Behavior Analysis Laboratory (SCUBA), University of Southern California (USC)
http://scuba.usc.edu/
