Ultrasound speech analysis: State of the art

Alan Wrench

Overview
• Machines
• Methods of recording image sequences and syncing with audio
• Probes
• Head stabilisation
• Contour tracking
• Parameterisation

Choosing an ultrasound: considerations
• Physical: size, weight, portability, fan noise – small and quiet is good
• Probe design – low frequency, microconvex, short handled is good
• Method of extracting images – ideally high quality, high frame rate, fast
• Method of synchronising audio – ideally fully automated hardware frame sync
• Cost – low cost, of course
There is no single perfect system.

Overview
• ~50 speech labs now using ultrasound
• Aloka SSD 1000 – 1 lab
• Aloka SSD 4000 – 1 lab
• Aloka SSD 5000 – 1 lab
• Aloka SSD 5500 – 1 lab
• Mindray DP2200 – 6 labs
• Mindray DP6600 – 15 labs
• Mindray DP6900 – 1 lab
• Mindray M5 – 1 lab
• Echoblaster 128 – 3 labs
• GE Logiq e – 3 labs
• GE Logiq alpha 100 – 1 lab
• Interson SeeMore – 2 labs and British Columbia health service
• Sonosite 180+ – 3 labs (being replaced)
• Sonosite Titan – 3 labs
• Terason T3000 – 7 labs
• Toshiba Famio 8 (SSA-530A) – 1 lab
• Ultrasonix RP, Tablet, Touch – 7 labs
• Zonare z.one – 1 lab

Recording ultrasound
Acquiring image data via the video port (NTSC or digital)
• Methods used
  • Frame grabber card and AAA software. Audio is captured separately via a soundcard and synced using a box that places a flash on the video and a tone on the audio. Automatic post-processing detects the flash and tone and aligns the two streams. Recording is fast but setting the sync parameters can be a bit tricky.
  • Canopus ADVC 110 video capture card. Provides integrated synchronous audio. Requires video editing software for capture, such as Sony Vegas, Apple Final Cut Pro and iMovie, or Avid Xpress DV.
  • Record to a DVD recorder, then transfer to PC offline.
  • Recording the screen using Snagit or Camtasia is an option for machines running under Windows, such as the Interson SeeMore. Although this does not use the video port, it results in a video file.
• If the data is not compressed, de-interlacing provides 60 frames (fields) per second. If it is compressed, de-interlacing may not be possible.
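The de-interlacing step mentioned above can be sketched as follows. An interlaced NTSC frame carries two fields captured at different times, one on the even rows and one on the odd rows, so splitting them doubles the temporal resolution to roughly 60 images per second at half the vertical resolution. This is only an illustrative sketch: the function names `deinterlace` and `field_times` are made up for this example, frames are assumed to be plain lists of pixel rows, and which field comes first in time depends on the capture device and must be checked.

```python
def deinterlace(frame):
    """Split one interlaced frame (a list of pixel rows) into its two fields.

    Assumption: rows 0, 2, 4, ... belong to one field and rows 1, 3, 5, ...
    to the other. Which of the two was captured first is device dependent.
    """
    even_field = frame[0::2]
    odd_field = frame[1::2]
    return even_field, odd_field


def field_times(n_frames, frame_rate=30000 / 1001):
    """Nominal timestamps in seconds for every field of n_frames frames.

    The default frame rate is the NTSC value of about 29.97 frames per
    second, which gives about 59.94 fields per second.
    """
    field_period = 1.0 / (2 * frame_rate)
    return [i * field_period for i in range(2 * n_frames)]


# Tiny example: a 4-row "frame" where each row holds its own row index.
frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
even_field, odd_field = deinterlace(frame)
```

Each field has half the rows of the original frame, so a real pipeline would normally interpolate the missing rows before contour tracking. That step is left out here.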

Things to look out for
(These factors can vary between individual models of ultrasound, even ones from the same manufacturer, or if settings are changed.)
• There may be a lag between the ultrasound and audio if the machine takes appreciable time to process the ultrasound signal.
• There may be duplicate frames.
• There may be blurring if frame averaging cannot be switched off.
• Video images may be “torn” when they are made up of parts of different sweeps.
• Careful selection of the ultrasound system can mitigate these problems.
• 25 labs (using Aloka SSD 1000 and 4000, Mindray DP2200, DP6600 and DP6900, GE Logiq e, and Toshiba Famio 8) use video port capture.
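One way to check a recording for the duplicate frames listed above is to compare each frame with the one before it. The sketch below flags frames whose mean absolute pixel difference from the previous frame falls under a threshold. The function names and the threshold value are assumptions for illustration only; captured analogue video carries noise, so a suitable threshold has to be found empirically for each setup.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def find_duplicates(frames, threshold=0.5):
    """Return indices of frames that are near-identical to their predecessor.

    threshold is a hypothetical value in grey-level units, not a
    recommended setting.
    """
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) < threshold
    ]


# Example: the second frame repeats the first, the third one differs.
f0 = [[10, 20], [30, 40]]
f1 = [[10, 20], [30, 40]]
f2 = [[12, 25], [30, 41]]
duplicates = find_duplicates([f0, f1, f2])
```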

Recording ultrasound
• Cineloop – direct access to ultrasound memory
• Advantages
  • No “torn” images.
  • Frame rates higher than 60 fps are possible.
• Disadvantages
  • Automatic audio synchronisation is not possible (with exceptions). Audio must be recorded separately, merged in video editing software and synchronised by manual observation of stop releases (or a flash/beep signal; ref. CHAUSA).
  • Cineloops have limited size, which limits record time (with exceptions). Sometimes this is a few seconds, sometimes it can be several minutes.
• This approach is used by 10 labs with
  • GE Logiq e
  • Zonare z.one
  • Mindray M5
  • Sonosite 180+
  • Sonosite Titan
  • Interson SeeMore

There are 4 systems in use which allow automated synchronisation of cineloop data and audio
• Aloka SSD 5500 (Haskins) – one-off modification, not generally available.
• Ultrasonix – both frame and scanline pulse sequences generated by hardware.
• Terason T3000 – hardware sync signal not generally available, so software sync is used. Ultraspeech software polls the system for new frames.
• Echoblaster 128 – TTL frame pulses.
• By recording these pulse signals on a second audio track alongside the microphone input, automated precise synchronisation is possible.
• 16 labs use this method, using either Terason/UltraSpeech or Ultrasonix/AAA or Matlab.
• With the exception of the Aloka, these systems also provide software programming toolkits so that bespoke speech applications can be written:
  • UltraSpeech
  • AAA
  • Matlab
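The pulse-based synchronisation described above can be illustrated with a short sketch. If the frame pulses are recorded on the second channel of the audio file, each rising edge marks the time of one ultrasound frame on the same clock as the microphone signal. The function name, the normalised sample values and the fixed threshold are all assumptions made for this example; it is not the method used by AAA or UltraSpeech, and real pulse trains may need filtering or a different threshold.

```python
def frame_times_from_pulses(pulse_channel, sample_rate, threshold=0.5):
    """Return the time in seconds of each rising edge in a pulse channel.

    pulse_channel: samples of the audio track carrying the frame pulses,
                   assumed here to be normalised so a pulse exceeds threshold.
    sample_rate:   audio sampling rate in Hz.
    A signal that is already high at the first sample counts as an edge
    at time zero.
    """
    times = []
    previous_high = False
    for index, sample in enumerate(pulse_channel):
        high = sample >= threshold
        if high and not previous_high:
            times.append(index / sample_rate)
        previous_high = high
    return times


# Example: two pulses in a 10-sample track recorded at 1000 Hz.
pulses = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
times = frame_times_from_pulses(pulses, 1000)
```

The n-th ultrasound frame in the cineloop is then assigned the n-th edge time, provided that no pulses were missed and the first recorded pulse corresponds to the first stored frame.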

Probes
Convex and particularly microconvex (<20 mm radius) probes are generally preferred for midsagittal tongue imaging.
Probes come in a range of frequencies from 2-12 MHz.
• Low frequency = good penetration = tongue image doesn’t disappear for high vowels and consonants
• Small radius means the transmitting array fits under the chin
• Large field of view means more of the tongue can be imaged
• Short handle

Aloka UST-9121 Multi Frequency Tight Convex Transducer
• Scan angle: 120°
• Radius: 14 mm
• Frequency range: 2.5-6 MHz
• Short handle
• Narrow cylindrical grip, ideal for a clamp

Probe specifications

Probe stabilisation
• Headset – 30+ labs
• Rest forehead against headrest with probe in fixed position – 2 labs
• Fixed head restraint and spring-loaded probe
• Fixed head restraint and fixed probe

Head movement correction
• Palatoglossatron, Peterotron https://github.com/jjberry/Autotrace/blob/master/old/ APIL wiki ??
• HOCUS http://www.psych.mcgill.ca/labs/mcl/pdf/HOCUS.pdf
• GIPSA – accelerometers and gyrometers

Contour tracking
• EdgeTrak – Maryland – standalone PC application – snakes http://vims.cis.udel.edu/~mli/research.htm
• AAA – QMU – integrated PC application – fan-based edge detection – similar performance to EdgeTrak within a recording and analysis GUI. Also a snakes-based contour fitting interface.
• Tonguetrack – Simon Fraser – Matlab – MRF energy minimisation http://tonguetrack.cs.sfu.ca/TongueTrackUserGuide.pdf
  L. Tang and G. Hamarneh. Graph-based tracking of the tongue contour in ultrasound sequences with adaptive temporal regularization. In Mathematical Methods for Biomedical Image Analysis (MMBIA), pages 1–8, 2010.
• GetContours – Haskins – Matlab – EdgeTrak with a GUI – available on request from Mark Tiede
• Ultramat – GIPSA – Matlab – Thomas Hueber
• Autotrace – Arizona – Python script – Jeff Berry https://github.com/jjberry/Autotrace
• Noname – Munich – Matlab – in progress – Phil Hoole
• UltraPraat – Arizona – in progress
• UltraCats – Toronto – manual contour drawing – Tim Bressmann
• Jacob – Rochester – speckle tracking – software not available
  Jacob, M., H. Lehnert-LeHouillier, S. Bora, S. McAleavey, D. Dalecki, J. McDonough. 2008. "Speckle Tracking for the Recovery of Displacement and Velocity Information from Sequences of Ultrasound Images of the Tongue". Proceedings of the 8th International Seminar on Speech Production, Strasbourg, France, 53-57.
• Roussos – UCL/Trier/Queen Mary – active appearance models – software not available
  A. Roussos, A. Katsamanis, and P. Maragos, “Tongue tracking in ultrasound images with active appearance models,” in Proc. IEEE Int’l Conf. on Image Processing, 2009.

Speckle tracking
• University of Rochester Biomedical Engineering
• It provides displacement estimates giving “virtual fleshpoints”.
• Works on clear vowel images.

Parameterisation
• Lingua – Quebec – Matlab – ISSP 2008 http://www.phonetique.uqam.ca/upload/files/anniebrasseur/menard%20et%20al%20issp2008.pdf
• Zharkova – QMU – Python
  Zharkova, N. (2013). A normative-speaker validation study of two indices developed to quantify tongue dorsum activity from midsagittal tongue shapes. Clinical Linguistics & Phonetics, 27, 484-496.
• Hueber – GIPSA – Matlab – EigenTongues – Ultraspeech tools www.ultraspeech.com
  Also Hoole – Munich – Matlab – principal components analysis; Mielke (NCSU, USA) and Richmond (Edinburgh)
• NYU – SSANOVA using the gss package in R.
• Haskins – shape analysis methods based on polynomial fitting and Procrustes comparison to a resting tongue shape.
• AAA – tongue averaging – pointwise t-tests.
• Surfaces – displays a sequence of contours as a time-motion display. Contour sequences can be averaged and compared numerically.
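As an illustration of the "tongue averaging – pointwise t-tests" idea listed above, the sketch below compares two sets of contours point by point. It assumes that every contour is stored as a list of radial distances measured along the same fan lines, so that the k-th value of every contour refers to the same direction from the probe. It computes Welch's t statistic at each point. This is an assumed variant chosen for the example, not a description of what AAA actually implements, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pointwise_t(contours_a, contours_b):
    """Welch's t statistic at each contour point for two sets of contours.

    contours_a, contours_b: lists of contours; each contour is a list of
    radial distances (e.g. in mm) sampled on the same fan lines.
    Each set needs at least two contours. Returns one t value per fan
    line, or NaN where both sets have zero variance.
    """
    t_values = []
    # zip(*contours) regroups the data so that each item holds all the
    # values measured on one fan line.
    for values_a, values_b in zip(zip(*contours_a), zip(*contours_b)):
        standard_error = sqrt(
            variance(values_a) / len(values_a)
            + variance(values_b) / len(values_b)
        )
        if standard_error == 0:
            t_values.append(float("nan"))
        else:
            t_values.append((mean(values_a) - mean(values_b)) / standard_error)
    return t_values


# Example: three repetitions per condition, two fan lines per contour.
condition_a = [[10, 20], [12, 20], [14, 21]]
condition_b = [[10, 30], [12, 31], [14, 32]]
t_per_point = pointwise_t(condition_a, condition_b)
```

Turning the t values into p-values needs the t distribution with Welch's degrees of freedom, which the Python standard library does not provide. Because one test is run per fan line, some correction for multiple comparisons would also usually be considered.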

Miscellaneous
• Ultrasonix 4D – Haskins
• GE Logiq – linear probe – laryngeal – Victoria
• EchoTools – a set of tools for analyzing Echo-Doppler tongue images https://github.com/jjberry/EchoTools
