ISR – Institute of Systems and Robotics University of Coimbra - Portugal

WP5: Behavior Learning and Recognition
Structure for Multi-modal Fusion, Part I

Relationship of WP3, WP4 and WP5
WP3 (Sensor modeling and multi-sensor fusion techniques), Task 3.3: Bayesian network structures for multi-modal fusion
WP4 (Localization and tracking techniques as applied to humans), Task 4.3: Online adaptation and learning
• Levels of the fusion (pixel, feature or decision level)
• Bayesian structures for the implementation of the scenarios of WP2
• Tracker results
• Events detected
• IDs in re-identification situations
WP5: Behavior learning and recognition

Proposal
Multi-layer Multi-modal Homography-based Occupancy Grid
Using data coming from stationary sensors (Structure):
• Image Data
• Range Data
• Sound Source Data

Inertial Compensated Homography
Projecting a world point onto a reference plane in two phases [Luiz2007]:
• First step: projecting the real-world point onto the virtual image plane
• Second step: projecting from the virtual image plane onto a common plane
(Figure: real camera and virtual camera, with axes X, Y, Z and the gravity direction)

Inertial Compensated Homography
Terms labeled on the slide [Luiz2007]:
• Camera calibration matrix
• Infinite homography
• Rotation between virtual and real camera (given by IMU)
• Homography between two planes
(Figure: real camera and virtual camera, with axes X, Y, Z and the gravity direction)
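
The first step (real camera to virtual camera) can be sketched with the standard infinite homography K R K^-1, built from the camera calibration matrix K and the rotation R between the two cameras. This is a minimal illustration under stated assumptions, not the implementation of [Luiz2007]: the intrinsics, the 10-degree rotation standing in for an IMU reading, and all function names are hypothetical.

```python
import numpy as np

def infinite_homography(K, R_virtual_from_real):
    """Map real-image pixels to virtual-image pixels: H = K R K^-1."""
    return K @ R_virtual_from_real @ np.linalg.inv(K)

def rotation_x(angle_rad):
    """Rotation about the camera X axis (stand-in for an IMU-derived rotation)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def warp_point(H, pixel):
    """Apply a homography to one pixel (u, v) and dehomogenize."""
    p = H @ np.array([pixel[0], pixel[1], 1.0])
    return p[:2] / p[2]

# Hypothetical intrinsics and camera tilt.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rotation_x(np.deg2rad(10.0))
H_inf = infinite_homography(K, R)
```

Because the two cameras share the same centre, the warp depends only on K and R, not on scene depth, which is why a rotation measured by the IMU is enough for this step.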

Image Registration [Luiz2007]
(Figure: axes X, Y, Z and the gravity direction)

Extending A Virtual Plane To More [Khan2007]
Plane-to-image homography, with the following terms:
• Vanishing points for the X, Y and Z directions
• Normalized vanishing line of the reference plane
• Scale factor
• Vanishing point of the reference direction
• Scale value encapsulating α and z
(Figure: axes X, Y, Z)

Relationship Between Different Planes In The Structure [Khan2007]
Homography between views i and j, induced by a plane parallel to the reference plane, given the homography of the reference plane. Terms:
• Scale value
• Homography of the reference plane
• Vanishing point of the reference direction
(Figure: axes X, Y, Z)
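
The relationship between parallel planes can be illustrated from a full projection matrix P = [p1 p2 p3 p4]: points on the plane Z = z project through the homography [p1 p2 z·p3 + p4], so the reference-plane homography only changes in its last column, by a multiple of the column associated with the reference (Z) direction. The sketch below assumes the columns of P play the roles of the vanishing points and the normalized vanishing line named on the slides. The cameras and helper names are hypothetical, and the view-to-view homography is computed by composing plane-to-image homographies rather than with the closed form of [Khan2007].

```python
import numpy as np

def plane_to_image_homography(P, z):
    """Homography from plane coordinates (X, Y, 1) on Z = z to image pixels."""
    return np.column_stack([P[:, 0], P[:, 1], z * P[:, 2] + P[:, 3]])

def view_to_view_homography(P_i, P_j, z):
    """Homography from view i to view j induced by the plane Z = z."""
    H_i = plane_to_image_homography(P_i, z)
    H_j = plane_to_image_homography(P_j, z)
    return H_j @ np.linalg.inv(H_i)

def camera(K, R, center):
    """Projection matrix K [R | -R C] for a camera centred at C."""
    return K @ np.column_stack([R, -R @ center])

def project(P, point_3d):
    """Project a 3D world point to pixel coordinates."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]

def transfer(H, pixel):
    """Apply a homography to one pixel and dehomogenize."""
    x = H @ np.array([pixel[0], pixel[1], 1.0])
    return x[:2] / x[2]

# Two hypothetical downward-looking cameras above the ground plane Z = 0.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R_down = np.diag([1.0, -1.0, -1.0])
c, s = np.cos(0.2), np.sin(0.2)
tilt = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
P_i = camera(K, R_down, np.array([0.0, 0.0, 5.0]))
P_j = camera(K, tilt @ R_down, np.array([1.0, 0.5, 6.0]))

# Homography between the two views for the layer one metre above the ground.
H_ij = view_to_view_homography(P_i, P_j, z=1.0)
```

Repeating the last line for a set of heights gives one homography per layer, which is how a stack of parallel planes can be warped into a common view.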

Image & Laser Geometric Registration [Luiz2007-Hadi2009]
(Figure: axes X, Y, Z and the gravity direction)

Registering LRF Data In a Multi-Camera Scenario [Hadi2009]
Setup: a set of cameras and a laser range finder (LRF).
• Camera projection matrix
• Transformation matrix between camera and LRF, obtained by calibration
• Projection of the points observed by the LRF onto the image plane
Result: image + range data, with the LRF data reprojected on the image (blue points).
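
The reprojection of LRF points onto an image can be sketched as a rigid transform followed by a pinhole projection. This is a minimal example, assuming a 4x4 camera-from-LRF transform obtained from extrinsic calibration and a 3x3 intrinsic matrix; the numeric values and the function name are hypothetical, and lens distortion is ignored.

```python
import numpy as np

def project_lrf_points(K, T_cam_from_lrf, points_lrf):
    """Project 3D LRF points (N x 3) into pixel coordinates.

    Returns the pixels of the points in front of the camera and a boolean
    mask telling which input points were kept.
    """
    pts = np.asarray(points_lrf, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))])   # N x 4
    cam = (T_cam_from_lrf @ homog.T)[:3]                # 3 x N, camera frame
    in_front = cam[2] > 0                               # positive depth only
    pix = K @ cam[:, in_front]
    return (pix[:2] / pix[2]).T, in_front

# Hypothetical calibration: LRF origin offset 10 cm along the camera X axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
T_cam_from_lrf = np.eye(4)
T_cam_from_lrf[0, 3] = 0.1

points_lrf = [[0.0, 0.0, 2.0],    # two metres in front of the sensor
              [0.0, 0.0, -1.0]]   # behind the camera, discarded
pixels, kept = project_lrf_points(K, T_cam_from_lrf, points_lrf)
```

With several cameras, the same function would be called once per camera with that camera's own K and transform.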

Image & Laser & Sound Geometric Registration [JFC2008, 2009]
(Figure: axes X, Y, Z, the gravity direction and the plane of arrival PA(θ))

Bayesian Binaural System for 3D Localisation
Binaural sensing:
• For sources within 2 meters range, binaural cues alone (interaural time and level differences: ITD τ, quasi frequency-independent, and ILDs ΔL(f_c^k)) can be used to fully localise the source in 3D space (i.e. a volume confined in azimuth θ, elevation φ and distance ρ).
• If the source is more than 2 meters away, it can only be localised to a volume (cone of confusion) in azimuth.
(Figure: binaural cue information along Z, with marks at 1 m and 2 m: distance ρ + elevation φ + azimuth θ at close range, azimuth θ only beyond)

Bayesian Binaural System for 3D Localisation Subset of [JFC2008, 2009]

Bayesian Binaural System for 3D Localisation
• Binary variable denoting “Cell C occupied by sound-source”, with cells indexed by distance, elevation and azimuth
• Direct Auditory Sensor Model (DASM): Bayesian learning through HRTF calibration using ITDs τ and ILDs ΔL
• Inverse Auditory Sensor Model (IASM): obtained from the DASM through Bayes' rule, giving the auditory saliency map
• Solution: cluster local saliency maxima points (i.e. cells with maximum probability of occupancy, 1 per sound-source)
• Front-to-back confusion effect avoided by considering only frontal hemisphere estimates

Bayesian Binaural System for Localisation in Azimuth
• Planes of arrival PA(θ), with azimuth θ from -90º to 90º
• Direct Auditory Sensor Model (DASM): Bayesian learning through HRTF calibration of interaural time differences (ITDs τ)
• Inverse Auditory Sensor Model (IASM): obtained from the DASM through Bayes' rule, giving the auditory saliency map
• Solution: cluster local saliency maxima planes of arrival (PA) per sound-source
• Front-to-back confusion effect avoided by considering only frontal hemisphere estimates
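
The inversion from a direct to an inverse sensor model can be sketched for the azimuth-only case. Everything specific below is an assumption made for illustration: the direct model is a Gaussian around a simple sine-law ITD (a stand-in for the HRTF-calibrated model of the slides), the ear spacing and noise level are hypothetical, the prior is uniform, and the posterior is normalized over azimuth as if a single source were present instead of using one binary occupancy variable per cell.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
EAR_DISTANCE = 0.18      # m, hypothetical microphone spacing

def expected_itd(azimuth_rad):
    """Stand-in direct model mean: sine-law ITD for a far-field source."""
    return EAR_DISTANCE / SPEED_OF_SOUND * np.sin(azimuth_rad)

def azimuth_posterior(itd_measured, azimuths_rad, sigma=2e-5):
    """Bayes' rule on an azimuth grid: posterior is likelihood times prior."""
    residual = itd_measured - expected_itd(azimuths_rad)
    likelihood = np.exp(-0.5 * (residual / sigma) ** 2)   # direct model
    prior = np.full_like(azimuths_rad, 1.0 / len(azimuths_rad))
    posterior = likelihood * prior
    return posterior / posterior.sum()                    # inverse model

# Frontal hemisphere only, which sidesteps front-to-back confusion.
azimuths = np.deg2rad(np.arange(-90, 91, 1.0))
measured_itd = expected_itd(np.deg2rad(30.0))   # simulated noiseless reading
posterior = azimuth_posterior(measured_itd, azimuths)
best_azimuth_deg = np.rad2deg(azimuths[np.argmax(posterior)])
```

The posterior array plays the role of a one-dimensional saliency map; with several sources, local maxima would be clustered rather than taking a single argmax.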

Demos on Bayesian Binaural System (Arrival Direction of Sound Source)
• Two talking persons
• A walking person

A Sample View Of .

Image & Laser & Sound Occupancy Grid
Fusion: Image, Range and Sound Occupancy Grid
(Figure: axes X, Y, Z, before and after fusion)
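
One common way to fuse per-sensor occupancy grids is to add their log-odds, which corresponds to Bayesian fusion under the assumption that the sensors are conditionally independent given the cell state. The slides do not state which fusion rule is used, so the sketch below is only an illustration of that generic technique; the grid values and the uniform 0.5 prior are hypothetical.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability, clipped to avoid infinities."""
    p = np.clip(p, 1e-6, 1.0 - 1e-6)
    return np.log(p / (1.0 - p))

def fuse_occupancy(grids, prior=0.5):
    """Fuse per-sensor occupancy probabilities cell by cell in log-odds."""
    total = logit(prior) + sum(logit(g) - logit(prior) for g in grids)
    return 1.0 / (1.0 + np.exp(-total))

# Hypothetical 1 x 2 grids from the three modalities.
image_grid = np.array([[0.8, 0.5]])
range_grid = np.array([[0.8, 0.5]])
sound_grid = np.array([[0.5, 0.2]])

fused = fuse_occupancy([image_grid, range_grid, sound_grid])
```

A cell at 0.5 carries no evidence and leaves the fused value unchanged, so a modality that does not observe a cell can simply report the prior there.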

Bibliography

• Franco, J. & Boyer, E. Fusion of Multi-View Silhouette Cues Using a Space Occupancy Grid. Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), 2005.
• Christophe Braillon, Kane Usher, C. P. J. L. C. & Laugier, C. Fusion of stereo and optical flow data using occupancy grids. Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, 2006.
• Saad M. Khan, P. Y. & Shah, M. A Homographic Framework for the Fusion of Multi-view Silhouettes. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, 2007.
• R. Eshel, Y. M. Homography Based Multiple Camera Detection and Tracking of People in a Dense Crowd. Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, 2008.
• D. Arsic, E. Hristov, N. L. Applying multi layer homography for multi camera person tracking. Distributed Smart Cameras, 2008. ICDSC 2008. Second ACM/IEEE International Conference on, 2008.
• Francois Fleuret, Jerome Berclaz, R. L. & Fua, P. Multi-Camera People Tracking with a Probabilistic Occupancy Map. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

• Sangho Park, M. M. T. Understanding human interactions with track and body synergies (TBS) captured from multiple views. Computer Vision and Image Understanding, 2008.
• Yuxin Jin, Linmi Tao, H. D. R. N. & Xu, G. Background modeling from a free-moving camera by Multi-Layer Homography Algorithm. Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, 2008.
• Luiz G. B. Mirisola, Jorge Dias, A. T. d. A. Trajectory Recovery and 3D Mapping from Rotation-Compensated Imagery for an Airship. Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, Oct 29 - Nov 2, 2007.
• Mirisola, L. G. B. & Dias, J. Tracking from a Moving Camera with Attitude Estimates. ICR08, 2008.
• Batista, J. P. Tracking Pedestrians Under Occlusion Using Multiple Cameras. Image Analysis and Recognition, Springer Berlin-Heidelberg, 2004, 3212/2004, 552-562.
• Joao Filipe Ferreira, Pierre Bessière, K. M. C. P. J. L. C. L. & Dias, J. Bayesian Models for Multimodal Perception of 3D Structure and Motion.
• C. Chen, C. Tay, K. M. & C. Laugier (INRIA, F. Dynamic environment modeling with gridmap: a multiple-object tracking application. 9th International Conference on Control, Automation, Robotics and Vision, 2006. ICARCV '06., 2006.

• J. F. Ferreira, P. Bessière, K. Mekhnacha, J. Lobo, J. Dias, and C. Laugier, “Bayesian Models for Multimodal Perception of 3D Structure and Motion,” in International Conference on Cognitive Systems (CogSys 2008), pp. 103-108, University of Karlsruhe, Karlsruhe, Germany, April 2008.
• C. Pinho, J. F. Ferreira, P. Bessière, and J. Dias, “A Bayesian Binaural System for 3D Sound-Source Localisation,” in International Conference on Cognitive Systems (CogSys 2008), pp. 109-114, University of Karlsruhe, Karlsruhe, Germany, April 2008.
• Ferreira, J. F., Pinho, C., and Dias, J., “Implementation and Calibration of a Bayesian Binaural System for 3D Localisation”, in 2008 IEEE International Conference on Robotics and Biomimetics (ROBIO 2008), Bangkok, Thailand, 2009.
• Hadi Aliakbarpour, Pedro Nunez, Jose Prado, Kamrad Khoshhal and Jorge Dias. An Efficient Algorithm for Extrinsic Calibration between a 3D Laser Range Finder and a Stereo Camera for Surveillance, ICAR2009.

Institute of Systems and Robotics http://paloma.isr.uc.pt
