Automatic Detection of Pathological Voice Disorders

Automatic detection of pathological voice disorders
Ferenc Kazinczi1, Krisztina Mészáros2, and Klára Vicsi1
1 Laboratory of Speech Acoustics, Budapest University of Technology and Economics, Department of Telecommunications and Media Informatics
2 Department of Head and Neck Surgery of the National Institute of Oncology, Budapest
http://alpha.tmit.bme.hu/speech/ vicsi@tmit.bme.hu

Speech disorders:
Mutations of voice generation organs: There is a close connection between the mutations of voice generation organs (differences in size, in tissue flexibility, etc.) and the measurable acoustical parameters (pitch, sound pressure, spectrum, etc.) of the generated speech. Examples: laryngeal cancer, vocal cord polyp.
Defect in the coordination of voice generation organs: Speech production is a complicated process involving the coordination of several brain areas and peripheral muscle controls. This is why the acoustic-phonetic parameters of speech are sensitive to many neurological defects. The defects generally occur in the areas of phonation, articulation, prosody and the fluency of speech.

The aim:
• Automatically separate healthy speech from pathological speech (caused by vocal disorders) on the basis of continuous speech.
• Identify the origin of the voice disorder, for example diagnosed functional dysphonia or recurrent paresis (paralyzed vocal cord), etc.
• Objectively measure the quality of the voice in comparison with the semi-subjective RBH scale.
• The RBH scale is used by doctors to give the level of severity: R (Rauhigkeit, roughness), B (Behauchtheit, breathiness), H (Heiserkeit, hoarseness).
• 0 = normal voice quality, 3 = heavy huskiness; applied by clinicians in daily routine.

Pathological and healthy speech databases 1
Sound recordings: made in a consulting room at the Department of Head and Neck Surgery of the National Institute of Oncology, with a near-field microphone (Monacor ECM-100), at a 44,100 Hz sampling rate with 16-bit linear coding.
100 healthy and 120 pathological speech samples.
Text: the short story “The North Wind and the Sun”, read aloud.
Diseases: functional dysphonia, recurrent paresis, tumors at various places of the vocal tract, reflux disease, chronic inflammation of the larynx, bulbar paresis, its symptoms amyotrophic lateral sclerosis, leukoplakia, spasmodic dysphonia and glossectomy.

Pathological and healthy speech databases 2
• Annotation: the voice samples were marked with the RBH code,
• giving four classes on the basis of subjectively perceived parameters.
• Severity: 100 healthy (H0), 64 H1, 24 H2 and 32 H3.
• Diseases: 55 functional dysphonia (FD), 33 recurrent paresis (RP), 32 others.
• Segmentation and labelling: on phoneme level;
• all [e] sounds in the read text were extracted;
• in Hungarian this vowel is the most frequent one, and [e] is expected to occur 50 times when reading the story.

Processing method Feature selection and classification process

Preprocessing - feature extraction
• All [e] sounds in the read text were extracted.
• At the middle of each [e] sound, the following acoustic parameters were measured in a 40 ms time window:
Jitter (different calculation methods; the formulas are not preserved in this transcript)
Shimmer (different calculation methods; the formulas are not preserved in this transcript)
HNR (harmonics-to-noise ratio)
12 Mel-frequency Cepstral Coefficients (MFCC)
This gives 37 features overall per sound per patient.
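The slide's own jitter and shimmer formulas did not survive in the transcript, so the following is only a minimal sketch of how such measures are commonly computed. It assumes the widely used "local" variants (mean absolute difference between consecutive pitch periods or peak amplitudes, divided by their mean), which may differ from the calculation methods actually used in the study. The function names, the example segment times and the example period values are hypothetical.

```python
def middle_window(start_s, end_s, sample_rate=44100, window_s=0.040):
    """Return (first, last_exclusive) sample indices of a window centred on a segment.

    start_s / end_s are the segment boundaries in seconds (e.g. one labelled [e]).
    """
    centre = (start_s + end_s) / 2.0
    n = int(round(window_s * sample_rate))  # 40 ms at 44,100 Hz -> 1764 samples
    first = int(round(centre * sample_rate)) - n // 2
    return first, first + n


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Assumed 'local' jitter: cycle-to-cycle variation of the pitch period."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Assumed 'local' shimmer: cycle-to-cycle variation of the peak amplitude."""
    return local_perturbation(peak_amplitudes)


# Hypothetical example: an [e] labelled from 1.00 s to 1.10 s,
# with four measured pitch periods alternating between 10 ms and 11 ms.
first, last = middle_window(1.00, 1.10)
jitter = local_jitter([0.010, 0.011, 0.010, 0.011])
```

At typical speaking pitch a 40 ms window holds only a handful of glottal cycles, so each measure is averaged over few cycle pairs.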

Parameter selection by Principal Component Analysis (PCA)
Features with the highest eigenvector coefficients within the first 10 principal components were selected, in each class, for the sound [e]. FD: functional dysphonia; RP: recurrent paresis
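One possible reading of this selection rule can be sketched as follows: for each of the first 10 principal components, keep the feature with the largest absolute eigenvector coefficient. This is an illustration under assumptions, not the authors' procedure: standardising the features first, taking one feature per component, and the function name select_features_by_pca are all choices made here, and the data is random placeholder data.

```python
import numpy as np


def select_features_by_pca(X, n_components=10):
    """Return indices of features with the largest |loading| in the leading PCs.

    X has shape (n_samples, n_features). Features are standardised first
    (an assumption; the slides do not say how the data was scaled).
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - X.mean(axis=0)) / std

    # Rows of Vt are the principal directions (eigenvectors of the covariance),
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)

    k = min(n_components, Vt.shape[0])
    selected = []
    for component in Vt[:k]:
        idx = int(np.argmax(np.abs(component)))
        if idx not in selected:  # a feature may dominate several components
            selected.append(idx)
    return selected


# Placeholder data: 50 hypothetical [e] sounds with 37 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 37))
selected = select_features_by_pca(X)
```

Because one feature can dominate more than one component, the function may return fewer than 10 indices.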

Classification
2-class and multi-class Support Vector Machine (LibSVM and Matlab SVM functions): SVM with Radial Basis Function (RBF) kernel, box constraint 0.2 and σ = 1.
Basic training set: 100 healthy and 120 pathological speech samples.
Validation of the learning performance: Leave-One-Out Cross-Validation (LOOCV).
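The study used LibSVM and Matlab; the sketch below swaps in scikit-learn to show the same kind of setup, an RBF-kernel SVM evaluated with leave-one-out cross-validation. It assumes the kernel convention K(x, y) = exp(-||x - y||² / (2σ²)), so σ = 1 maps to gamma = 0.5, and it treats the box constraint as scikit-learn's C parameter. Both mappings are assumptions about how the original tools were configured, and the two-dimensional data is synthetic, not the speech database.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

SIGMA = 1.0           # kernel width from the slide
BOX_CONSTRAINT = 0.2  # box constraint from the slide, used here as C


def loocv_accuracy(X, y, sigma=SIGMA, box_constraint=BOX_CONSTRAINT):
    """Mean leave-one-out accuracy of an RBF-kernel SVM."""
    # Assumed convention: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    gamma = 1.0 / (2.0 * sigma ** 2)
    clf = SVC(kernel="rbf", C=box_constraint, gamma=gamma)
    # Each fold trains on all samples but one and tests on the held-out sample.
    scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
    return float(scores.mean())


# Synthetic stand-in for two classes (0 = healthy, 1 = pathological).
rng = np.random.default_rng(0)
healthy = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
pathological = rng.normal(loc=4.0, scale=0.5, size=(20, 2))
X = np.vstack([healthy, pathological])
y = np.array([0] * 20 + [1] * 20)

accuracy = loocv_accuracy(X, y)
```

With N samples, LOOCV fits the classifier N times, which is affordable for a database of a few hundred recordings.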

Classification of healthy and pathological voices based on RBH scale
For neighboring H classes, classification accuracy is below 70%. For non-neighboring H classes, accuracy is far higher, reaching 91%.

Accuracy of healthy and pathological classification for different training set sizes
The larger the database available for training the classifier, the better the classification accuracy.

Classification of voice disorder based on diseases
55 functional dysphonia (FD), 33 recurrent paresis (RP), 63 healthy
Two-class classification

Classification of voice disorder based on diseases
55 functional dysphonia (FD), 33 recurrent paresis (RP), 63 healthy
Multi-class classification
Confusion matrix of the jitter, shimmer and HNR based multi-class classification into H0, H1, H2 and H3, with the number of each prediction
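The confusion matrix itself is not preserved in the transcript. As a reminder of how such a matrix is counted, here is a minimal sketch; the labels in the example are made-up toy values, not results from the study, and the function names are hypothetical.

```python
CLASSES = ["H0", "H1", "H2", "H3"]


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Rows are true classes, columns are predicted classes, cells are counts."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted as their true class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels only (hypothetical), with confusions between neighboring classes.
true_labels = ["H0", "H0", "H1", "H1", "H2", "H3"]
predicted_labels = ["H0", "H1", "H1", "H1", "H3", "H3"]

matrix = confusion_matrix(true_labels, predicted_labels)
accuracy = overall_accuracy(matrix)
```

Off-diagonal counts next to the diagonal correspond to confusions between neighboring severity classes.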

Conclusion
It is possible
- to automatically separate healthy and pathological voices;
- to automatically evaluate the RBH status;
- to differentiate the origin of the disorders with relatively high accuracy.
The correct selection of the input features is essential and can strongly affect the performance of the machine learning. The size of the database is critical, and the accuracy of the SVM depends on the number of elements in a given class. In the future it therefore remains important not only to develop better preprocessing and machine learning methods, but also to collect more pathological data validated by professional phoniatrists.

Thank you for your attention! vicsi@tmit.bme.hu
