
Noise and Echo Control for Immersive Voice Communication in Spacesuits

Presented as a keynote speech at the International Workshop on Acoustic Echo and Noise Control (IWAENC) in Tel Aviv, Israel, on September 2, 2010. Yiteng (Arden) Huang, WeVoice, Inc., Bridgewater, New Jersey, USA


Presentation Transcript


  1. Presented as a keynote speech at the International Workshop on Acoustic Echo and Noise Control (IWAENC) in Tel Aviv, Israel, on September 2, 2010. Noise and Echo Control for Immersive Voice Communication in Spacesuits. Yiteng (Arden) Huang, WeVoice, Inc., Bridgewater, New Jersey, USA. arden_huang@ieee.org. 9/2/2010

  2. About the Project • Financially sponsored by the NASA SBIR (Small Business Innovation Research) program • Phase I feasibility research: Jan. 2008 – July 2008 • Phase II prototype development: Jan. 2009 – Jan. 2011 • Other team members: • Jingdong Chen, WeVoice, Inc., Bridgewater, New Jersey, USA • Scott Sands, NASA Glenn Research Center (GRC), Cleveland, Ohio, USA • Jacob Benesty, University of Quebec, Montreal, Quebec, Canada

  3. Outline • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  4. Section 1 • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  5. Requirements of In-Suit Audio • Speech Quality and Intelligibility: • 90% word identification rate • Hearing Protection: • Limits total noise dose, hazard noise, and on-orbit continuous and impulse noise for waking and sleeping periods • Noise loads are very high during launch and orbital maneuvers. • Audio Control and Interfaces: • Provides manual silencing features and volume controls • Operation at Non-Standard Barometric Pressure Levels (BPLs): • Operates effectively between 30 kPa and 105 kPa

  29. Current In-Suit Audio System • Current solution: Communication Carrier Assembly (CCA) Audio System • Labeled components (figure): skullcap, perspiration absorption area, earpiece, helmet, helmet ring, chin cup, microphone module, microphone boom

  7. Extravehicular Mobility Unit (EMU) CCA • Labeled components (figure, source: O. Sands, NASA GRC): interconnect wiring, nylon/spandex top, Teflon sidepiece and pocket, ear cup, ear seal, electret microphones, interface cable and connector • For shuttle and International Space Station (ISS) operations • A large gain applied to the outbound speech for sufficient sound volume at low static pressure levels (30 kPa) leads to clipping and strong distortion during operations near sea-level BPL.

  8. Advanced Crew Escape Suit (ACES) CCA • Dynamic microphones (figure, source: O. Sands, NASA GRC) • For shuttle launch and entry operations • Hearing protection provided by the ACES CCA may not be sufficient.

  9. Developmental CCA • Figures (source: O. Sands, NASA GRC): CCA ear cups, noise-canceling microphones, active in-canal earpieces • The active earpieces will be used in conjunction with the CCA ear cups during launch and other high-noise events and can be removed for other suited operations. • The active earpieces alone nearly provide the required level of hearing protection.

  10. CCA Systems: Pros • High outbound speech intelligibility and quality, SNR near optimum • Use close-talking microphones • A high degree of acoustic isolation between the in-suit noise and the suit subject’s vocalizations • A high degree of acoustic isolation between the inbound and outbound signals • The human body does NOT transmit vibration-borne noise • Provide very good hearing protection.

  11. CCA Systems: Cons • The microphones need to be close to the mouth of a suited subject. • A number of recognized logistical issues and inconveniences: • Cannot adjust the cap and the microphone booms during EVA operations, which can last from 4 to 8 hours • The close-talking microphones interfere with the suited subject’s eating and drinking, and are susceptible to contamination. • The communication cap needs to fit well. Caps in a variety of different sizes need to be built and maintained, e.g., 5 sizes for EMU caps. • Wire fatigue for the microphone booms • These problems cannot be resolved with incremental improvements to the basic design of the CCA systems.

  12. Stakeholder Interviews • The CCA ear cups produce pressure points that cause discomfort. • Microphone arrays and helmet speakers were suggested as replacements. • Suit subject comfort should be maximized, provided the following constraints can still be met (or relaxed and traded off): • Clear two-way voice communications • Hearing protection from the fan noise in the life support system ventilation loop • Properly containing and managing hair and sweat inside the helmet • Adequate SNR for the potential use of automatic speech recognition for the suit's information system

  13. Two Alternative Architectural Options for In-Suit Audio • Integrated Audio (IA): Instead of being housed in a separate subassembly, both the microphones and the speakers (helmet speakers) are integrated into the suit/helmet. • Hybrid Approach: Employs the inbound portion of a CCA system with the outbound portion of an IA system.

  14. Section 2 • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  15. Noise from Outside the Spacesuit • During launch and entry, descent, and landing: • Impulse noise < 140 dBSPL; hazard noise < 105 dBA • On orbit: • Impulse noise < 140 dBSPL during waking hours and < 83 dBSPL during sleeping hours • Limits on continuous on-orbit noise levels are specified by frequency. • Remark: During EVA operations, ambient noise is at most a minor problem.

  16. Structure-Borne Noise Inside the Spacesuit • Four noise sources (Begault & Hieronymus 2007): • Airflow and air inlet hissing noise, as well as fan/pump noise due to required air supply and circulation • Arm, leg, and hip bearing noise • Suit-impact noise, e.g., footfall • Swishing-like noise due to air movement caused by walking (since the suits are closed pressure environments) • For CCA systems, since the suit subject’s body does not transmit bearing and impact noise, only airflow-related noise needs to be controlled. • For Integrated Audio (IA) systems, microphones are mounted directly on the suit structure and vibration noise is loud.

  17. Acoustic Challenges • Complicated noise field: • Temporal domain: Has both stationary and non-stationary noise • Spectral domain: Inherently wideband • Spatial domain: Near field; Possibly either directional or dispersive • Highly reverberant enclosure: • The helmet is made of highly reflective materials. • Strong reverberation dramatically reduces the intelligibility of speech uttered by the suit subject and degrades the performance of an automatic speech recognizer. • Strong reverberation leads to a more dispersive noise field, which makes beamforming less effective.

  18. Section 3 • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  19. Proposed Noise Control Scheme for IA/Hybrid Systems • A head motion tracker with head position calibration, together with acoustic source localization, provides the mouth range and incident angle with respect to the microphone array. • Processing chain (figure, stages 1–5): microphone array beamforming, multichannel noise reduction, single-channel noise reduction, adaptive noise cancellation with a noise reference, outbound speech.

  20. Current Research Focus • Stages 1–4 of the scheme (figure): microphone array beamforming, multichannel noise reduction, single-channel noise reduction, outbound speech.

  21. Beamforming: Far-Field vs. Near-Field • Far-field (figure): a far-field sound source of interest S(f, θ) and far-field noise V(f, ψ) arrive as plane waves at an N-element array with spacing d; the path difference across the aperture is (N−1)·d·cos(ψ); the microphone signals X1(f), X2(f), …, XN(f) are filtered by h1, h2, …, hN and summed to produce Y(f, ψ, θ). • Near-field (figure): a near-field sound source S(f, rs) at range rs, with far-field noise V(f, ψ); the same filter-and-sum structure produces Y(f, ψ, rs).
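In symbols, the two propagation models sketched in the figure can be written as follows (a standard formulation, not taken from the slide; c denotes the speed of sound and rn the distance from the near-field source to microphone n):

```latex
% Far field: plane wave at incidence angle \psi, element spacing d.
% The n-th microphone sees a pure delay of the source spectrum:
X_n(f) = S(f)\, e^{-\jmath 2\pi f (n-1) d \cos\psi / c},
\qquad n = 1, \dots, N
% ((N-1) d \cos\psi is the total path difference across the aperture)

% Near field: spherical wave from a source at range r_s; both the
% delay and the amplitude depend on the distance r_n:
X_n(f) = \frac{S(f)}{r_n}\, e^{-\jmath 2\pi f r_n / c}

% In both cases the beamformer output is a filter-and-sum:
Y(f) = \sum_{n=1}^{N} h_n^{*}(f)\, X_n(f)
```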

  22. Fixed Beamformer vs. Adaptive Beamformer • The choice of beamformer depends on the noise field: if it is stationary and known before the design (isotropic noise is generally assumed), use a fixed beamformer; if it is time-varying and unknown, use an adaptive beamformer. • Fixed beamformers (reverberation not a concern): • Delay-and-sum: simple, but non-uniform directional responses over a wide spectrum of frequencies • Filter-and-sum: complicated, but uniform directional responses over a wide spectrum of frequencies; good for wideband signals like speech • Adaptive beamformers (reverberation significant): • MVDR (Capon): only the TDOAs of the speech source of interest need to be known (simple requirements); reverberation causes the signal cancellation problem; time-domain or frequency-domain implementations • LCMV (Frost)/GSC: the impulse responses (IRs) from the source to the microphones have to be known or estimated; errors in the IRs lead to the signal cancellation problem
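As a concrete illustration of the simplest entry in this taxonomy, the sketch below simulates a broadside source with incoherent sensor noise and applies a delay-and-sum beamformer. This is a minimal Python/NumPy sketch, not from the slides; all parameter values (sample rate, tone frequency, noise level) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, N = 16000, 4                    # sample rate (Hz) and number of mics
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)     # desired signal, broadside (zero TDOA)

# Each microphone observes the same signal plus independent sensor noise,
# i.e., the incoherent-noise case discussed on the next slide.
x = s + 0.5 * rng.standard_normal((N, len(t)))

# Delay-and-sum for a broadside source: the steering delays are all zero,
# so the beamformer degenerates to a plain average across the array.
y = x.mean(axis=0)

def snr_db(reference, observed):
    err = observed - reference
    return 10 * np.log10(np.dot(reference, reference) / np.dot(err, err))

gain = snr_db(s, y) - snr_db(s, x[0])   # expected near 10*log10(N), ~6 dB
```

For incoherent noise the measured gain sits near 10·log10(N) dB, which for N = 4 is only about 6 dB.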

  23. Comments on Traditional Microphone Array Beamforming • For incoherent noise sources, the gain in SNR is low if the number of microphones is small. • For coherent noise sources whose directions are different from that of the speech source, a theoretically optimal gain in SNR can be high but is difficult to obtain due to a number of practical limitations: • Unavailability of precise a priori knowledge of the acoustic impulse responses from the speech sources to the microphones. • Inconsistent responses of the microphones across the array. • For coherent noise sources that are in the same direction as the speech source, beamforming (as a spatial filter) is ineffective.
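The first bullet can be quantified for the delay-and-sum case (a standard result, not from the slide): with N microphones, identical signal components, and independent equal-power sensor noise,

```latex
\mathrm{SNR}_{\mathrm{out}} = N \cdot \mathrm{SNR}_{\mathrm{in}},
\qquad
G_{\mathrm{dB}} = 10 \log_{10} N
% e.g., N = 4 microphones buy only about 6 dB, which is why the
% gain in SNR is low when the number of microphones is small.
```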

  24. Multichannel Noise Reduction (MCNR) • Signal model (figure): a speech source of interest s(k) and noise v(k) reach the N microphones through impulse responses g1, g2, …, gN, giving observations x1(k), x2(k), …, xN(k); beamforming uses knowledge related to the source position, while MCNR does not. • A conceptual comparison of beamforming and MCNR: • Beamformer: spatial filtering; performs dereverberation and denoising; array setup: calibration is necessary, possibly time/effort consuming • MCNR: statistical filtering; estimates x1,s(k), i.e., denoising only; array setup: no need to strictly demand a specific array geometry/pattern

  25. Frequency-Domain MVDR Filter for MCNR • The problem formulation and the resulting MVDR filter • A more practical implementation • Similar to traditional single-channel noise reduction methods, the noise PSD matrix is estimated during silent periods and the signal PSD matrix is estimated during speech periods.
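The equations on this slide were rendered as images; the following is a standard frequency-domain formulation consistent with the slide's text (notation is mine: Φ denotes a PSD matrix and u1 = [1, 0, …, 0]ᵀ selects the reference microphone):

```latex
% Signal model in the frequency domain (speech component at mic 1):
\mathbf{y}(f) = \mathbf{x}(f) + \mathbf{v}(f), \qquad
\Phi_{\mathbf{y}}(f) = \Phi_{\mathbf{x}}(f) + \Phi_{\mathbf{v}}(f)

% MVDR for MCNR: minimize the residual noise subject to no distortion
% of the speech component at the reference microphone:
\min_{\mathbf{h}(f)} \; \mathbf{h}^{H}(f)\,\Phi_{\mathbf{v}}(f)\,\mathbf{h}(f)
\quad \text{s.t.} \quad \mathbf{h}^{H}(f)\,\mathbf{x}(f) = X_1(f)

% A practical implementation needing only the observable PSD matrices:
\mathbf{h}_{\mathrm{MVDR}}(f) =
\frac{\Phi_{\mathbf{v}}^{-1}(f)\,\Phi_{\mathbf{x}}(f)}
     {\operatorname{tr}\!\left[\Phi_{\mathbf{v}}^{-1}(f)\,\Phi_{\mathbf{x}}(f)\right]}\,
\mathbf{u}_1,
\qquad \Phi_{\mathbf{x}}(f) = \Phi_{\mathbf{y}}(f) - \Phi_{\mathbf{v}}(f)
```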

  26. Comparison of the MVDR Filters for Beamforming and MCNR • MVDR for beamforming (BF): the acoustic impulse responses can at best be estimated up to a scale relative to the true response vector, which leads to speech distortion. • MVDR for MCNR: in the implementation of the MVDR-MCNR, the channel responses do not need to be known.

  27. Distortionless Multichannel Wiener Filter for MCNR • Use what we call spatial prediction. • Formulate an optimization problem whose solution is the distortionless multichannel Wiener (DW) filter for MCNR. • The optimal Wiener solution yields the non-causal spatial prediction filters. • It was found that the DW filter coincides with the MVDR filter for MCNR.

  28. Single-Channel Noise Reduction (SCNR) for Post-Filtering • Beamforming: the Wiener filter (the optimal solution in the MMSE sense) can be factorized as an MVDR beamformer followed by a Wiener filter for SCNR. • Note: For a complete and detailed development of this factorization, please refer to Eq. (3.19) of: M. Brandstein and D. Ward, eds., Microphone Arrays: Signal Processing Techniques and Applications, Berlin, Germany: Springer, 2001. • MCNR: again, the Wiener filter can be factorized as an MVDR filter for MCNR followed by a Wiener filter for SCNR. • Note: For a complete and detailed development of this factorization, please refer to Eq. (6.117) of: J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, Berlin, Germany: Springer, 2008.
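In symbols, the factorization described on this slide takes the following standard form (notation is mine; the referenced equations in the cited books give the full development):

```latex
% Multichannel Wiener filter = MVDR filter followed by a
% single-channel Wiener post-filter at the MVDR output:
\mathbf{h}_{\mathrm{W}}(f) = H_{\mathrm{W}}(f)\,\mathbf{h}_{\mathrm{MVDR}}(f),
\qquad
H_{\mathrm{W}}(f) = \frac{\phi_{X}(f)}{\phi_{X}(f) + \phi_{V}(f)}
% where \phi_X and \phi_V are the speech and residual-noise PSDs
% at the MVDR output.
```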

  29. Single-Channel Noise Reduction (SCNR) • The signal model • SCNR filter • Error signal • MSE cost function • The Wiener filter • Other SCNR methods: parametric Wiener filter, tradeoff filter • A well-known feature: noise reduction is achieved at the cost of adding speech distortion.
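The equations on this slide were images; the standard single-channel formulation they correspond to (notation is mine) is:

```latex
% Signal model and Wiener gain for single-channel noise reduction:
Y(f) = X(f) + V(f), \qquad \phi_Y(f) = \phi_X(f) + \phi_V(f)

H_{\mathrm{W}}(f)
= \frac{\phi_X(f)}{\phi_X(f) + \phi_V(f)}
= \frac{\mathrm{SNR}(f)}{1 + \mathrm{SNR}(f)}
= 1 - \frac{\phi_V(f)}{\phi_Y(f)}
% 0 <= H_W <= 1: attenuating noisy bins inevitably attenuates the
% speech in them too, hence the noise-reduction/distortion tradeoff.
```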

  30. New Idea for SCNR • A second-order complex circular random variable (CCRV) has a vanishing pseudo-variance, which implies that the variable and its conjugate are uncorrelated. • In general, speech is not a second-order CCRV: its spectrum and the conjugate of that spectrum are correlated but not completely coherent. • Noise is a second-order CCRV if stationary (uncorrelated with its conjugate), and not otherwise (coherent). • Examine the noisy spectrum together with its conjugate: this is similar to the signal model of a two-element microphone array, so there is a chance to reduce noise without adding any speech distortion.
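In symbols, the circularity condition and the stacked two-channel view described above can be written as (a standard formulation; notation is mine):

```latex
% Z is a second-order CCRV iff its pseudo-variance vanishes:
E\!\left[Z^2(f)\right] = 0
\quad \Longleftrightarrow \quad
Z(f) \text{ and } Z^{*}(f) \text{ are uncorrelated}

% Speech violates this, so Y(f) and Y^*(f) carry partially redundant
% information; stack them like a two-element "array":
\tilde{\mathbf{y}}(f) =
\begin{bmatrix} Y(f) \\ Y^{*}(f) \end{bmatrix}
= \tilde{\mathbf{x}}(f) + \tilde{\mathbf{v}}(f)
```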

  31. Widely Linear Wiener Filter • New filter for SCNR, its error signal, and the widely linear MSE • Then the widely linear Wiener filter or MVDR-type filters can be developed.
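The slide's equations were images; a standard widely linear formulation consistent with the surrounding text (notation is mine) is:

```latex
% Widely linear filtering: estimate the clean speech from Y and Y*:
Z(f) = h_1^{*}(f)\,Y(f) + h_2^{*}(f)\,Y^{*}(f)
     = \tilde{\mathbf{h}}^{H}(f)\,\tilde{\mathbf{y}}(f)

% Error signal and widely linear MSE:
\mathcal{E}(f) = Z(f) - X(f), \qquad
J\!\left[\tilde{\mathbf{h}}(f)\right] = E\!\left[\,|\mathcal{E}(f)|^{2}\right]

% Minimizing J gives the widely linear Wiener filter:
\tilde{\mathbf{h}}_{\mathrm{W}}(f)
= \Phi_{\tilde{\mathbf{y}}}^{-1}(f)\,
  E\!\left[\tilde{\mathbf{y}}(f)\,X^{*}(f)\right]
```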

  32. Section 4 • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  33. Computational Platform/Technology Selection • Three platforms under consideration: ASIC, DSP, and FPGA • Trade-off among performance, power consumption, size, and costs • Four competing factors: the count of transistors employed; the number of clock cycles required; the time taken to develop an application; nonrecurring engineering (NRE) costs • ASIC: low numbers of transistors and clock cycles; long development time and high NRE costs; effective in performance, power, and size, but not in cost • DSP: low development and NRE costs; low power consumption; more effort needed to convert the design to ASICs • FPGA: not suited to processing sequential conditional data flow, but efficient in concurrent applications; supports faster I/O than DSPs; one step closer to ASIC than DSP; high development cost due to performance optimization

  34. System Block Diagram • Analog input: 8 microphone capsules on XLR connectors (pin 2 HOT, pin 3 COLD, pin 1 GND), each with its own microphone powering circuit • Microphone preamps with gain-control jumpers, wired through DB25 male/female connectors to the FPGA board • FPGA board: 8-channel 24-bit 48 kHz ADC, Altera FPGA, SDRAM, Flash, power management IC, power jack, JTAG (male), and a USB 2.0 digital output interface

  35. FPGA Board Block Diagram • Altera Cyclone III EP3C55F484C8 FPGA • Analog front end: OPA1632 op-amps (1)–(8) feeding an ADS1278 ADC • Memory: two 16 MB SDRAMs (×32); EPCS16; 16 MB Flash (×16) • USB 2.0 (high speed), user LEDs/I/Os • Clocks and power: 50 MHz crystal, 24.576 MHz crystal, 3.3 V

  36. Prototype FPGA Board: the Top View • Labeled components (figure): phantom power feeding, USB 2.0 jack, user LEDs, user I/Os, FT2232H, GND, OPA1632, REF1004, ADS1278, EPCS16S, JTAG, 12 MHz crystal, TPS65053, Flash, DB25, DC power jack, power LED, mic preamp gain jumpers, analog power (DC 9 V and DC 5 V), Cyclone III FPGA, SDRAMs • Board size: 174.8 mm × 101 mm

  37. Prototype FPGA Board: the Bottom View • Labeled components (figure): OPA1632, 24.576 MHz clock oscillator (OSC1), 50 MHz clock oscillator (OSC2)

  38. System Development Flow Adopted in the Project • System on Programmable Chip (SoPC) + C/C++ programming (figure: FPGA containing a NIOS II CPU with I/O, ROM, RAM, UART, and DSP blocks): • Use SoPC Builder to construct a soft-core NIOS II processor embedded on the Altera FPGA • Develop software/DSP systems in C/C++ on the NIOS II processor • Drawbacks: poor efficiency and low performance; efficiency can be improved by identifying time-consuming functions (e.g., FFT and IFFT) and accelerating them with the C2H (C-to-Hardware) tool • Advantages: short development cycle/time; low cost; high reliability; reusability of intellectual property

  39. MEMS Microphone Array • 7 subarrays (1–7), each with 4 Analog Devices ADMP402 MEMS microphones (a, b, c, d; capsule size 2.5 mm × 3.35 mm) • Subarray spacing: 20 mm; element spacing within a subarray: 5 mm • XG-MPC-MEMS board with a Samsung 18-pin connector (pins 1–18)

  40. MEMS Microphone Array Box • WeVoice MEMS microphone array with 7 subarrays and a Samsung 18-pin connector (pins 1–18) • Box dimensions: 155 mm × 35 mm × 12.5 mm

  41. Section 5 • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  42. FPGA Program Flowchart • Data path: data in and preprocessing from the ADC → 4-channel FFT (FFT/IFFT processor) → MCNR+SCNR (Nios II soft core) → 1-channel IFFT → overlap-add → USB transfer to the host • Frames are pipelined on a 4 ms grid (t, t+4, t+8 ms); processing delay < 8 ms
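The frame-based pipeline above can be mimicked on a PC. The sketch below implements only the overlap-add skeleton (8 ms frames, 4 ms hop, matching the flowchart's timing at 48 kHz), with a unity spectral gain standing in for the MCNR+SCNR stage. This is an illustrative Python/NumPy sketch, not the project's Nios II code.

```python
import numpy as np

fs = 48000
frame = 384               # 8 ms analysis frame at 48 kHz
hop = frame // 2          # 4 ms hop: the t, t+4, t+8 grid on the slide
win = np.hanning(frame + 1)[:-1]   # periodic Hann: sums to 1 at 50% overlap

rng = np.random.default_rng(1)
x = rng.standard_normal(fs // 10)  # 100 ms of test input

y = np.zeros(len(x))
for start in range(0, len(x) - frame + 1, hop):
    X = np.fft.rfft(win * x[start:start + frame])
    gain = np.ones_like(X)         # unity mask; MCNR+SCNR gains would go here
    y[start:start + frame] += np.fft.irfft(gain * X)

# Away from the edges the overlapped windows sum to one, so with a unity
# mask the input is reconstructed exactly.
err = np.max(np.abs(y[frame:-frame] - x[frame:-frame]))
```

With a real spectral gain in place of the unity mask, each output frame is available one hop after its input frame completes, consistent with the slide's sub-8 ms processing delay.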

  43. IA System Windows Host Software • Programmed with Microsoft Visual C++ • DirectSound is used to play back audio (speech). Splash window of the program

  44. IA System Windows Host GUI: Multitrack View

  45. IA System Windows Host GUI: Single-Track View

  46. IA System Windows Host GUI: Playing Back

  47. Section 6 • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  48. The Portable, Real-Time Demo System • Components (figure): MEMS microphone array on the suited subject, FPGA board, PC, USB 2.0 cable, DB25 connectors, audio cable • Power supply: linear DC 12–20 V / 1 A

  49. Section 7 • Problem Identification and Research Motivation • Problem Analysis and Technical Challenges • Noise Control with Microphone Arrays • Hardware Development • Software Development • A Portable, Real-Time Demonstration System • Towards Immersive Voice Communication in Spacesuits

  50. What Is Immersive Communication and Why Do We Want It? • Telecommunication helps people collaborate and share information by cutting across the following three separations/constraints: long distance, real time, and physical boundaries. • Modern telecommunication technologies have so far succeeded in transcending the first two constraints, i.e., long distance and real time. • Immersive communication offers a feeling of being together and sharing a common environment during collaboration. • Immersive communication targets breaking the physical boundaries, which is the "last mile" problem in communication.
