Position Calibration of Acoustic Sensors and Actuators

Position Calibration of Acoustic Sensors and Actuators on Distributed General Purpose Computing Platforms • Vikas Chandrakant Raykar | University of Maryland, College Park

Motivation • Many emerging multimedia applications use multiple audio/video sensors and actuators. • [Diagram labels: Microphones, Cameras, Speakers, Displays; Distributed Capture, Number Crunching, Distributed Rendering; Current Thesis; Other Applications]

What can you do with multiple microphones… • Speaker localization and tracking. • Beamforming or spatial filtering.

Some Applications… • Speech recognition • Hands-free voice communication • Novel interactive audio-visual interfaces • Multichannel speech enhancement • Smart conference rooms • Audio/image-based rendering • Audio/video surveillance • Speaker localization and tracking • Multichannel echo cancellation • Source separation and dereverberation • Meeting recording

More Motivation… • Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform. • This requires dedicated infrastructure in terms of sensors, multi-channel interface cards, and computing power. • On the other hand, computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive. • Audio/video sensors on different laptops can be used to form a distributed network of sensors.

Common TIME and SPACE • Put the distributed audio/visual input/output capabilities of all the laptops into a common TIME and SPACE. • This thesis deals with common SPACE, i.e., estimating the 3D positions of the sensors and actuators. • Why common SPACE? • Most array processing algorithms require that the precise positions of the microphones be known. • Manual measurement is painful, tedious, and imprecise.

This thesis is about… [Figure: 3D coordinate system with X, Y, Z axes]

If we know the positions of speakers… • If distances are not exact • If we have more speakers • Solve in the least-squares sense. • [Figure: 2D sketch with X and Y axes; the unknown position is marked "?"]
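A minimal sketch of this least-squares step, assuming a 2D layout with four speakers at known positions and exact distances. The function name, the coordinates, and the linearization (subtracting the first range equation from the others) are illustrative choices, not taken from the slides.

```python
import numpy as np

def locate_from_known_speakers(speakers, distances):
    """Linearized least-squares position estimate from distances to known points."""
    s = np.asarray(speakers, dtype=float)
    d = np.asarray(distances, dtype=float)
    # Subtracting the first range equation from the others cancels ||x||^2:
    #   2 (s_j - s_0) . x = ||s_j||^2 - ||s_0||^2 - d_j^2 + d_0^2
    A = 2.0 * (s[1:] - s[0])
    b = np.sum(s[1:] ** 2, axis=1) - np.sum(s[0] ** 2) - d[1:] ** 2 + d[0] ** 2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Hypothetical layout: four speakers at known positions, one unknown microphone.
speakers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [4.0, 3.0]])
true_mic = np.array([1.0, 2.0])
distances = np.linalg.norm(speakers - true_mic, axis=1)
estimate = locate_from_known_speakers(speakers, distances)
print(estimate)
```

With more speakers than unknowns the system is overdetermined, so noisy distances are averaged out in the least-squares sense rather than fitted exactly.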

If positions of speakers unknown… • Consider M microphones and S speakers. • What can we measure? The distance between each speaker and all microphones, or the Time Of Flight (TOF) of a calibration signal. • This gives an M×S TOF matrix. • Assume the TOF is corrupted by Gaussian noise. • Can derive the ML estimate.

Nonlinear Least Squares… • More formally, we can derive the ML estimate using a Gaussian noise model. • Find the coordinates that minimize the sum of squared errors between the measured and the predicted TOFs.

Maximum Likelihood (ML) Estimate… • We can define a noise model and derive the ML estimate, i.e., maximize the likelihood function. • Gaussian noise. • If the noise is Gaussian and independent, ML is the same as least squares.
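As a sketch of why the two coincide, assume each measured TOF is the true TOF plus independent zero-mean Gaussian noise. The symbols below (measured TOF \hat{\tau}_{ij}, microphone position m_i, speaker position s_j, noise variance \sigma_{ij}^2, speed of sound c, parameter vector \Theta) are notation assumed here, not taken from the slides.

```latex
\hat{\tau}_{ij} = \frac{\lVert m_i - s_j \rVert}{c} + n_{ij},
\qquad n_{ij} \sim \mathcal{N}(0, \sigma_{ij}^2)

\ln p(\hat{\tau} \mid \Theta)
  = -\sum_{i=1}^{M} \sum_{j=1}^{S}
    \frac{\left( \hat{\tau}_{ij} - \lVert m_i - s_j \rVert / c \right)^2}{2 \sigma_{ij}^2}
    + \text{const}

\hat{\Theta}_{ML}
  = \arg\min_{\Theta} \sum_{i=1}^{M} \sum_{j=1}^{S}
    \frac{\left( \hat{\tau}_{ij} - \lVert m_i - s_j \rVert / c \right)^2}{\sigma_{ij}^2}
```

Maximizing the log-likelihood is therefore a weighted nonlinear least-squares problem, and with equal variances the weights drop out.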

Reference Coordinate System | Multiple Global Minima • In 2D: fix the origin, the X axis, and the positive Y axis. • Similarly in 3D: 1. Fix origin (0,0,0) 2. Fix X axis (x1,0,0) 3. Fix Y axis (x2,y2,0) 4. Fix positive Z axis • x1, x2, y2 > 0 • Which to choose? Later…
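The four steps can be sketched as a change of coordinates applied to any set of 3D points. This is an illustration under assumptions: the function name is hypothetical, the first three points are assumed not to be collinear, and the fourth point is used to pick the positive Z side.

```python
import numpy as np

def to_reference_frame(points):
    """Express points in the frame defined by the first four of them."""
    p = np.asarray(points, dtype=float)
    q = p - p[0]                        # 1. first point becomes the origin
    e1 = q[1] / np.linalg.norm(q[1])    # 2. second point lies on the +X axis
    v = q[2] - np.dot(q[2], e1) * e1    # 3. third point fixes the XY plane (+Y side)
    e2 = v / np.linalg.norm(v)
    e3 = np.cross(e1, e2)
    out = q @ np.column_stack([e1, e2, e3])
    if out.shape[0] > 3 and out[3, 2] < 0:
        out[:, 2] *= -1.0               # 4. mirror so the fourth point has z >= 0
    return out

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))        # hypothetical sensor/actuator positions
aligned = to_reference_frame(points)
print(np.round(aligned[:3], 3))
```

Translation, rotation, and reflection leave all pairwise distances unchanged, which is why the error function has multiple global minima until such a frame is fixed.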

On a synchronized platform all is well..

However, on a distributed system…

The journey of an audio sample… • [Diagram layers: Multimedia/multistream applications, Operating system, I/O bus, Audio/video I/O devices; Network] • This laptop wants to play a calibration signal on the other laptop. • A play command is issued in software. • When will the sound actually be played out from the loudspeaker?

On a Distributed system… [Timeline labels: Time Origin; Capture Started; Playback Started; Signal Emitted by source j; Signal Received by microphone i]

Joint Estimation… • MS TOF measurements (observations). • Microphone and speaker coordinates: 3(M+S)-6 • Microphone capture start times: M-1 (assume tm_1 = 0) • Speaker emission start times: S • In total 4M+4S-7 parameters to estimate from MS observations. • Can reduce the number of parameters.

Use Time Difference of Arrival (TDOA)… • Formulation same as above but with fewer parameters.
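A small numerical sketch of why TDOA needs fewer parameters. It assumes the measured TOF equals the true TOF plus the speaker emission start time minus the microphone capture start time; this timing model, the speed of sound, and all positions and offsets below are assumptions made for illustration.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s
rng = np.random.default_rng(0)
mics = rng.uniform(0.0, 4.0, size=(4, 3))       # hypothetical microphone positions
speakers = rng.uniform(0.0, 4.0, size=(3, 3))   # hypothetical speaker positions
tm = np.array([0.0, 0.02, -0.01, 0.03])         # capture start times, tm_1 = 0
ts = np.array([0.05, -0.04, 0.01])              # emission start times (unknown in practice)

true_tof = np.linalg.norm(mics[:, None, :] - speakers[None, :, :], axis=2) / C

def measured_tof(ts, tm):
    # Assumed model: TOF_ij + ts_j - tm_i
    return true_tof + ts[None, :] - tm[:, None]

def tdoa(tof):
    # Difference of every microphone row against microphone 1, per speaker
    return tof[1:, :] - tof[0:1, :]

with_offsets = tdoa(measured_tof(ts, tm))
without_ts = tdoa(measured_tof(np.zeros_like(ts), tm))
print(np.allclose(with_offsets, without_ts))
```

Each speaker's emission time adds the same offset to a whole column of the TOF matrix, so differencing two microphones for the same speaker cancels it and only the capture start times remain.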

Assuming M=S=K Minimum K required..
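A counting sketch for the TOF formulations, using the parameter counts stated on the joint estimation slide (3(M+S)-6 coordinates, 4M+4S-7 parameters in total, MS observations). Requiring at least as many observations as parameters is only a necessary condition, and the helper names are hypothetical.

```python
def coords_only(m, s):
    """Unknowns when only the 3D coordinates are estimated."""
    return 3 * (m + s) - 6

def joint(m, s):
    """Unknowns when capture and emission start times are also estimated."""
    return 4 * m + 4 * s - 7

def tof_observations(m, s):
    return m * s

def min_k(n_params, n_obs, k_start=2, k_max=100):
    """Smallest K = M = S with at least as many observations as parameters."""
    for k in range(k_start, k_max + 1):
        if n_obs(k, k) >= n_params(k, k):
            return k
    return None

k_sync = min_k(coords_only, tof_observations)
k_joint = min_k(joint, tof_observations)
print(k_sync, k_joint)
```

Under this counting, K = 5 is the first value with K² ≥ 6K-6 and K = 7 the first with K² ≥ 8K-7.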

Nonlinear least squares… • Levenberg-Marquardt method. • The error is a function of a large number of parameters. • Unless we have a good initial guess, it may not converge to the minimum. • An approximate initial guess is required.
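A compact sketch of a Levenberg-Marquardt style iteration on a toy version of the problem: a hypothetical synchronized 2D setup with noiseless distances and an initial guess a few centimetres off. The damping schedule, the finite-difference Jacobian, and every number below are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

M, S = 4, 4  # hypothetical 2D setup: 4 microphones, 4 speakers
mics = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
spks = np.array([[1.0, 0.5], [3.2, 1.0], [2.5, 2.6], [0.8, 2.0]])
true_x = np.vstack([mics, spks]).ravel()
measured = np.linalg.norm(mics[:, None, :] - spks[None, :, :], axis=2)

def residual(x):
    pts = x.reshape(M + S, 2)
    d = np.linalg.norm(pts[:M, None, :] - pts[None, M:, :], axis=2)
    return (d - measured).ravel()

def numerical_jacobian(f, x, eps=1e-6):
    J = np.empty((f(x).size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (f(x + dx) - f(x - dx)) / (2.0 * eps)  # central difference
    return J

def levenberg_marquardt(f, x0, n_iter=100, lam=1e-2, tol=1e-20):
    x = np.asarray(x0, dtype=float).copy()
    r = f(x)
    cost = r @ r
    for _ in range(n_iter):
        if cost < tol:
            break
        J = numerical_jacobian(f, x)
        # Damped normal equations: (J^T J + lam I) step = -J^T r
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -(J.T @ r))
        r_new = f(x + step)
        cost_new = r_new @ r_new
        if cost_new < cost:           # accept: behave more like Gauss-Newton
            x, r, cost = x + step, r_new, cost_new
            lam = max(lam / 10.0, 1e-9)
        else:                         # reject: behave more like gradient descent
            lam *= 10.0
    return x, cost

x0 = true_x + 0.05 * np.sin(np.arange(true_x.size))  # deterministic perturbation
initial_cost = float(residual(x0) @ residual(x0))
x_hat, final_cost = levenberg_marquardt(residual, x0)
print(initial_cost, final_cost)
```

No reference frame is fixed in this toy version, so the recovered coordinates may differ from the true ones by a rigid motion or reflection; only the fit to the distances is meaningful. The damping term also keeps the linear system solvable despite that rank deficiency.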

Closed-form solution… • Given all pairwise distances between N points, can we recover the coordinates?

Classical Metric Multidimensional Scaling • Dot product matrix B = X X^T: symmetric, positive semidefinite, rank 3. • Given B, can you get X? Use the Singular Value Decomposition. • Same as Principal Component Analysis. • But we can measure only the pairwise distance matrix.

How to get the dot product matrix from the pairwise distance matrix… [Figure: triangle with vertices i, j, k]

Centroid as the origin… • Later shift it to our original reference. • Slightly perturb the location of each GPC into two points to get the initial guess for the microphone and speaker coordinates.
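Classical MDS with the centroid as origin can be sketched in a few lines. This version uses the standard double-centering identity B = -1/2 J D² J and a symmetric eigendecomposition in place of the SVD (equivalent for a symmetric positive semidefinite B). The function name and test points are illustrative.

```python
import numpy as np

def classical_mds(D, dim=3):
    """Recover centred coordinates (up to rotation/reflection) from distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # dot product matrix, centroid at origin
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]          # keep the 'dim' largest
    w_top = np.clip(w[idx], 0.0, None)       # guard against tiny negative values
    return V[:, idx] * np.sqrt(w_top)

def pairwise(p):
    return np.linalg.norm(p[:, None, :] - p[None, :, :], axis=2)

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))             # hypothetical GPC locations
D = pairwise(points)
X = classical_mds(D, dim=3)
print(np.allclose(pairwise(X), D))
```

The recovered X reproduces the distances but lives in an arbitrary orientation, which is why it is shifted to the chosen reference frame afterwards.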

Example of MDS…

MDS is more general… • Instead of pairwise distances we can use pairwise “dissimilarities”. • When the distances are Euclidean, MDS is equivalent to PCA. • E.g., face recognition, wine tasting. • Can get the significant cognitive dimensions.

Can we use MDS? Two problems: 1. We do not have the complete pairwise distance matrix [figure: two blocks of the matrix are marked UNKNOWN]. 2. The measured distances include the effect of the lack of synchronization.

Clustering approximation…

Clustering approximation… [Figure: GPCs i and j]
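The formulas on these slides did not survive extraction, so the following is only one approximation consistent with the deck. It assumes the microphone and speaker of each GPC sit at the same point, and that the measured TOF is the true TOF plus the emission start time minus the capture start time. Under those assumptions the unknown start times cancel when the TOFs in both directions are combined with each GPC's TOF to itself. All names and numbers are hypothetical.

```python
import numpy as np

def gpc_distances(T, c=343.0):
    """Approximate GPC-to-GPC distances from an N x N measured TOF matrix.

    Assumes T[i, j] = d_ij / c + ts_j - tm_i and d_ii = 0, so that
    T[i, j] + T[j, i] - T[i, i] - T[j, j] = 2 d_ij / c.
    """
    T = np.asarray(T, dtype=float)
    diag = np.diag(T)
    return 0.5 * c * (T + T.T - diag[:, None] - diag[None, :])

# Synthetic check with co-located microphone and speaker on each GPC.
C = 343.0                                     # assumed speed of sound in m/s
rng = np.random.default_rng(0)
gpcs = rng.uniform(0.0, 4.0, size=(5, 3))     # hypothetical GPC locations
tm = rng.uniform(-0.05, 0.05, size=5)         # capture start times
ts = rng.uniform(-0.05, 0.05, size=5)         # emission start times
D_true = np.linalg.norm(gpcs[:, None, :] - gpcs[None, :, :], axis=2)
T = D_true / C + ts[None, :] - tm[:, None]
D_approx = gpc_distances(T, C)
print(np.allclose(D_approx, D_true))
```

On a real laptop the microphone and speaker are a few centimetres apart, so the result is only approximate. That is sufficient for an initial guess that the nonlinear minimization then refines.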

Finally the complete algorithm… • TOF matrix • Clustering approximation: approximate distance matrix between GPCs, approximate ts, approximate tm • Dot product matrix • MDS to get approximate GPC locations (dimension and coordinate system) • Perturb to get approximate microphone and speaker locations • TDOA-based nonlinear minimization • Output: microphone and speaker locations, tm

Sample result in 2D…

Algorithm Performance… • The performance of our algorithm depends on: • Noise variance in the estimated distances. • Number of microphones and speakers. • Microphone and speaker geometry. • One way to study the dependence is to run a large number of Monte Carlo simulations. • Alternatively, we can derive the covariance matrix and bias of the estimator. • The ML estimate is implicitly defined as the minimum of a certain error function. • We cannot get an exact analytical expression for its mean and variance. • Or, given a noise model, we can derive a bound on how well any unbiased estimator can perform: the Cramer-Rao bound.

Estimator Variance… • Can use the implicit function theorem and a Taylor series expansion to get approximate expressions for bias and variance. • J. A. Fessler. Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): Applications to tomography. IEEE Tr. Im. Proc., 5(3):493-506, 1996. • Amit Roy Chowdhury and Rama Chellappa, "Statistical Bias and the Accuracy of 3D Reconstruction from Video", submitted to International Journal of Computer Vision. • Using a first-order Taylor series expansion. • The Jacobian is rank deficient: remove the known parameters.

Cramer-Rao bound… • Gives the lower bound on the variance of any unbiased estimator. • Does not depend on the estimator, just on the data and the noise model. • Basically tells us to what extent the noise limits our performance, i.e., you cannot get a variance lower than the CR bound. • The Jacobian is rank deficient: remove the known parameters.
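A simplified illustration of the bound, assuming a single unknown 2D microphone position, speakers at known positions, and independent Gaussian noise of standard deviation sigma on each measured distance. In that case the bound is sigma² (JᵀJ)⁻¹, where the rows of the Jacobian J are unit vectors from each speaker to the microphone. The layouts and names are hypothetical, and the full calibration problem additionally needs the known parameters removed as noted above.

```python
import numpy as np

def crb_position(mic, speakers, sigma):
    """Cramer-Rao bound on the covariance of one position estimate."""
    mic = np.asarray(mic, dtype=float)
    speakers = np.asarray(speakers, dtype=float)
    diff = mic - speakers
    J = diff / np.linalg.norm(diff, axis=1, keepdims=True)  # d(distance_j)/d(mic)
    fisher = J.T @ J / sigma ** 2
    return np.linalg.inv(fisher)

sigma = 0.02                                   # assumed 2 cm distance noise
mic = [0.0, 0.0]
surrounding = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
one_sided = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [1.0, 0.2]]

crb_good = crb_position(mic, surrounding, sigma)
crb_bad = crb_position(mic, one_sided, sigma)
print(np.trace(crb_good), np.trace(crb_bad))
```

Speakers surrounding the microphone give a much smaller bound than speakers clustered on one side, a small instance of geometry affecting the achievable accuracy.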

Different Estimators..

Number of sensors matter…

Geometry also matters…

Calibration Signal…

Time Delay Estimation… • Compute the cross-correlation between the signals received at the two microphones. • The location of the peak in the cross-correlation gives an estimate of the delay. • The task is complicated by two factors: 1. Background noise. 2. Channel multi-path due to room reverberations. • Use Generalized Cross Correlation (GCC). • W(ω) is the weighting function. • PHAT (Phase Transform) weighting.
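A minimal GCC-PHAT sketch on synthetic data. It assumes a noise-free, integer-sample delay between two copies of a random signal, with the sampling rate and lengths chosen only for illustration. The PHAT weighting divides the cross-spectrum by its magnitude so that only the phase contributes to the peak.

```python
import numpy as np

def gcc_phat(x1, x2, fs=1.0):
    """Estimate the delay of x2 relative to x1 in seconds (positive if x2 lags)."""
    n = len(x1) + len(x2)                       # zero-pad to avoid circular wrap
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)                    # cross-power spectrum
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max..+max
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs

fs = 16000.0                                    # assumed sampling rate in Hz
rng = np.random.default_rng(0)
x1 = rng.normal(size=1024)                      # signal at microphone 1
true_lag = 25                                   # delay in samples (illustrative)
x2 = np.concatenate((np.zeros(true_lag), x1))   # delayed copy at microphone 2
estimated_delay = gcc_phat(x1, x2, fs)
print(estimated_delay * fs)
```

With background noise and reverberation the peak is less clean, which is where the choice of weighting function W(ω) matters.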

Time Delay Estimation…

Synchronized setup | bias 0.08 cm, sigma 3.8 cm • [Figure: room of length 4.22 m, width 2.55 m, and height 2.03 m containing Mic 1 to Mic 4 and Speaker 1 to Speaker 4]

Distributed Setup… • Nodes: Master, GPC 1, GPC 2, …, GPC M • Initialization phase: scan the network and find the number of GPCs and the UPnP services available. • Play ML sequence, e.g., GPC 1 (Speaker), GPC 2 (Mic) • Calibration signal parameters • Play calibration signal • TOA computation • TOA matrix • Position estimation
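Assuming "ML sequence" here means a maximum-length sequence, such a signal can be generated with a linear feedback shift register. The order (4 bits, period 15) and the taps (polynomial x⁴ + x³ + 1) are illustrative only; the actual calibration signal parameters are not given on the slides.

```python
def mls(taps=(4, 3), n_bits=4):
    """One period of a maximum-length sequence from a Fibonacci LFSR."""
    state = [1] + [0] * (n_bits - 1)      # any non-zero initial state works
    seq = []
    for _ in range(2 ** n_bits - 1):
        seq.append(state[-1])             # output the last register bit
        fb = 0
        for t in taps:                    # XOR of the tapped bits (1-indexed)
            fb ^= state[t - 1]
        state = [fb] + state[:-1]         # shift and insert the feedback bit
    return seq

bits = mls()
signal = [1 - 2 * b for b in bits]        # map {0, 1} to {+1, -1}

def circular_autocorrelation(x, shift):
    n = len(x)
    return sum(x[i] * x[(i + shift) % n] for i in range(n))

print(bits)
print([circular_autocorrelation(signal, k) for k in range(len(signal))])
```

The circular autocorrelation is 15 at zero shift and -1 at every other shift. That sharp peak is what makes such sequences convenient for time-of-arrival estimation by correlation.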

Experimental results using real data

Related Previous work… • J. M. Sachar, H. F. Silverman, and W. R. Patterson III. Position calibration of large-aperture microphone arrays. ICASSP 2002. • Y. Rockah and P. M. Schultheiss. Array shape calibration using sources in unknown locations Part II: Near-field sources and estimator implementation. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-35(6):724-735, June 1987. • J. Weiss and B. Friedlander. Array shape calibration using sources in unknown locations: a maximum-likelihood approach. IEEE Trans. Acoust., Speech, Signal Processing, 37(12):1958-1966, December 1989. • R. Moses, D. Krishnamurthy, and R. Patterson. A self-localization method for wireless sensor networks. Eurasip Journal on Applied Signal Processing Special Issue on Sensor Networks, 2003(4):348-358, March 2003.

Our Contributions… • Novel setup for array processing. • Position calibration in a distributed scenario. • Closed-form solution to initialize the nonlinear minimization routine. • Expressions for the mean and variance of the estimators. • Study of the effect of sensor geometry.
