Stavri Nikolov¹, Tim Dixon², John Lewis¹, Nishan Canagarajah¹, Dave Bull¹, Tom Troscianko², Jan Noyes² ¹Centre for Communications Research, University of Bristol, UK ²Department of Experimental Psychology, University of Bristol, UK How Multi-Modality Displays Affect Decision Making NATO ARW 2006, 21 - 25 October 2006, Velingrad, Bulgaria
Overview • Multi-Sensor Image Fusion • Multi-Modality Fused Image/Video Displays • Target Detection in Fused Images with Short Display Times (results) • Scanpath Assessment of Fused Videos • Multi-Modality Image Segmentation • Summary
How Does Image/Video Fusion Affect Decision Making? • Experiment 1: Target Detection in Fused Images with Short Display Times; Decision: is the target present or not? • Experiment 2: Target Tracking in Fused Videos (+ secondary task); Decision: where to look to follow the target? • Experiment 3: Image Segmentation (decomposing an image into meaningful regions/objects) in Fused Images; Decision: which objects to segment and how?
Multi-Sensor Image Fusion: Definition • the process by which several images coming from different sensors, or some of their features, are combined together to form a fused image • the aim of the fusion process is to create a single image (or visual representation) that will capture most of the important and complementary information in the input images and will resolve better any uncertainties, inconsistencies or ambiguities.
Multi-Sensor Image Fusion: Example [Figure: visible and IR input images and the fused image F] Visible and IR images courtesy of Octec Ltd, UK
Multi-Sensor Image Fusion: Applications • Many different applications of image fusion: • remote sensing • surveillance • defence • computer vision • robotics • medical imaging • microscopic imaging • art
Multi-Sensor Image Fusion: Applications • Image fusion is used in: • night vision systems • binocular vision • 3-D scene model building from multiple views • image/photo mosaics • digital cameras and microscopes to extend the effective depth of field by combining multi-focus images • target detection
Multi-Sensor Image Fusion: Different Levels • Image fusion can be performed at different levels of the information representation: • signal level • pixel level • feature / region level • object level • symbolic level
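As a rough illustration of pixel-level fusion, the sketch below averages two registered greyscale images pixel by pixel. It assumes the images are plain nested lists of equal size; the function name `fuse_average` and the sample values are hypothetical, and this is only the simplest possible scheme, not the pyramid or wavelet methods used elsewhere in these slides.

```python
def fuse_average(image_a, image_b):
    """Pixel-level fusion: average two registered greyscale images."""
    same_rows = len(image_a) == len(image_b)
    same_cols = all(len(ra) == len(rb) for ra, rb in zip(image_a, image_b))
    if not (same_rows and same_cols):
        raise ValueError("input images must have the same size")
    # Each fused pixel is the mean of the two corresponding input pixels.
    return [[(a + b) / 2.0 for a, b in zip(ra, rb)]
            for ra, rb in zip(image_a, image_b)]


# Hypothetical 2x2 inputs standing in for visible and IR images.
visible = [[10, 200], [30, 40]]
infrared = [[90, 100], [250, 40]]
fused = fuse_average(visible, infrared)
```

Averaging keeps the code trivial but tends to reduce contrast, which is one reason multi-resolution schemes are usually preferred.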
Multi-Modality Image Displays • Adjacent (side-by-side) displays (*) • Window displays • Fade in/out displays • Checkerboard displays (*) • Gaze-contingent multi-modality displays (*) • Hybrid fused displays (*) • Interleaved video displays
Adjacent and Checkerboard Displays Images from the Eden Project Multi-Sensor Data Set
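A checkerboard display can be sketched as alternating square tiles taken from two registered images. The code below is a minimal illustration under that assumption; the function name `checkerboard`, the `tile` parameter and the nested-list image format are hypothetical and not taken from the authors' software.

```python
def checkerboard(image_a, image_b, tile=1):
    """Alternate square tiles of side `tile` from two registered images."""
    rows, cols = len(image_a), len(image_a[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Tiles whose row and column indices sum to an even number
            # come from image_a; the others come from image_b.
            use_a = ((r // tile) + (c // tile)) % 2 == 0
            row.append(image_a[r][c] if use_a else image_b[r][c])
        out.append(row)
    return out


# "V" marks a visible-image pixel and "I" an IR-image pixel.
vis = [["V"] * 4 for _ in range(4)]
ir = [["I"] * 4 for _ in range(4)]
board = checkerboard(vis, ir, tile=2)
```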
Gaze-Contingent Multi-Modal Displays Demo of a gaze-contingent multi-modal display (GCMMD) using aerial photographs and maps of England (from Multimap.com). “Multi-Modality Gaze-Contingent Displays for Image Fusion", S. G. Nikolov, M. G. Jones, I. D. Gilchrist, D. R. Bull, C. N. Canagarajah, Proceedings of Fusion 2002
Hybrid Fused Image Displays (1.0,0.0) (0.8,0.2) (0.6,0.4) (0.4,0.6) (0.2,0.8) (0.0,1.0) “Hybrid Fused Displays: Between Pixel- and Region-Based Image Fusion", S. G. Nikolov, J. J. Lewis, R. J. O’Callaghan, D. R. Bull and C. N. Canagarajah, Proceedings of Fusion 2004
Fused Image Assessment • The results of image fusion are: • either used for presentation to a human observer for easier and enhanced interpretation • or subjected to further computer analysis or processing, e.g. target detection or tracking, with the aim of improved accuracy and more robust performance • Finding an optimal fused image is a very difficult problem since in most cases this is task and application dependent.
Which Fused Image is Better? Original Visible and IR “UN Camp” images courtesy of TNO Human Factors … it depends what we want to do with it, i.e. the task we have!
Input and Fused Image Metrics (IFIMs) Input Image Metrics (IIMs) Fused Image Metrics (FIMs) Categories of Fused Image Assessment Metrics A B input images FUSION F fused image
Fused Image Assessment Metrics • A number of image quality metrics have been proposed in the past, but all require a reference image • In practice an ideal fused image is rarely known and is application and task specific • Other metrics try to estimate what information is transferred from the input images to the fused image • Two such metrics, which we used in our study to assess the quality of the fused images, are Piella's image quality index (IQI) and Petrovic's edge-based Q^AB/F metric [00,03] (both of which are IFIMs)
Experiment 1: Target Detection in Fused Images Decision: Is the target present or not?
Experiment 1, Task 1: Objective Human Task Performance • Testing 3 fusion schemes: AVR, CP & DT-CWT, and 3 JPEG2000 compression rates: clean, low (0.3 bpp) and high (0.2 bpp). • Using a signal detection paradigm to assess participants' (Ps') ability to detect the presence of the soldier (target) in briefly displayed images. [Figure labels: Target absent, Target present; Average, Contrast Pyramid, DT-CWT; Clean, Low, High]
Task 1: Method • Fixation point ‘+’ shown for 750ms, an image presented for 15ms, followed by an inter-stimulus interval of 15ms, and a mask for 250ms.
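Results from a signal detection paradigm such as the one in Task 1 are commonly summarised with the sensitivity index d'. The sketch below shows one standard way to compute it; the slides do not say which statistic was actually reported, so the choice of d', the log-linear correction and the example counts are all assumptions.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption): add 0.5 to each count so
    # that rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant in one condition.
sensitivity = d_prime(hits=18, misses=2, false_alarms=3, correct_rejections=17)
```

A d' of zero means target-present and target-absent images were indistinguishable to the participant; larger values mean better detection.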
Experiment 1, Task 2: Subjective Image Assessment • Show pairs of images, ask Ps to rate both out of 5 (5 = Best quality, 1 = Worst quality). • Images paired: by fusion type and by compression level
Target Detection in Fused Images: Main Results • The results showed a significant effect for fusion but not compression in JPEG2000 images • Subjective ratings differed for JPEG2000 images, whilst metric results for both JPEG (different study) and JPEG2000 showed similar trends “Characterisation of Image Fusion Quality Metrics for Surveillance Applications over Bandlimited Channels", E. F. Canga, T. D. Dixon, S. G. Nikolov, D. R. Bull, C. N. Canagarajah, J. M. Noyes, T. Troscianko, Proceedings of Fusion 2005
Experiment 2: Target Tracking in Fused Videos Decision: Where to look to follow the target?
Experiment 2 • Applying an eye-tracking paradigm to the fused image assessment process. • Moving beyond still images: assessing participants’ ability to accurately track a figure. • Using footage taken recently at the Eden Project Biome. • Videos of a ‘soldier’ walking through thick foliage filmed in both visible light and IR, and at two natural luminance levels. • All videos registered using our Video Fusion Toolbox (VFT)
Original Videos Used • High Luminance (HL) • Low Luminance (LL) Videos from the Eden Project Multi-Sensor Data Set
Fused Videos Used • High Luminance: Fused Average, Fused DWT, Fused DT-CWT • Low Luminance: Fused Average, Fused DWT, Fused DT-CWT
Tasks + Methods • Participants asked to visually track the soldier as accurately as possible throughout the video sequence. • Tobii x50 Eye-Tracker used to record eye movements. • Participants also asked to press SPACE at specific points in the two sequences (when the soldier walked past features of the scene). • 10 Ps (5m, 5f): mean age = 27.1 (s.d. = 6.76). • Each shown 6 displays: Viz, IR, Viz+IR*, AVE, DWT, DT-CWT. • All Ps shown each condition in 3 separate sessions. • Half shown above order first, half reverse order. Order switched for the 2nd session and switched back for the 3rd. • Eye position and reaction times recorded.
Accuracy Results I • Eye position translated onto target box for each participant. • Calculated an accuracy ratio, hits:total views for each condition. • Also considered Tobii accuracy coding.
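The accuracy ratio (hits : total views) can be sketched as the fraction of valid gaze samples that fall inside the target box on the same frame. The data layout, the function name `accuracy_ratio` and the decision to skip invalid samples are assumptions made for illustration; the slides do not describe how invalid samples were handled.

```python
def accuracy_ratio(gaze_samples, target_boxes):
    """Fraction of valid gaze samples that land inside the target box.

    gaze_samples: one (x, y) tuple per frame, or None for an invalid sample.
    target_boxes: one (left, top, right, bottom) tuple per frame.
    """
    hits = 0
    total = 0
    for gaze, (left, top, right, bottom) in zip(gaze_samples, target_boxes):
        if gaze is None:
            continue  # assumption: invalid samples are not counted as views
        total += 1
        x, y = gaze
        if left <= x <= right and top <= y <= bottom:
            hits += 1
    return hits / total if total else 0.0


# Hypothetical four-frame sequence; the target box moves after frame 2.
gaze = [(5, 5), (20, 20), None, (12, 8)]
boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (10, 0, 20, 10), (10, 0, 20, 10)]
ratio = accuracy_ratio(gaze, boxes)
```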
Accuracy Results II Videos from the Eden Project Multi-Sensor Data Set
Results (High Luminance) • Accuracy Scores revealed: • Main effect of display modality (p = .001). • No main effect of session (p > .05). • No interaction (p > .05). • Post hoc tests revealed differences between Viz and: AVE, DWT, CWT. • IR and: AVE, DWT • RT Scores revealed: • No significant effects “Scanpath Analysis of Fused Multi-Sensor Images with Luminance Change", T.D. Dixon, S.G. Nikolov, J.J. Lewis, J. Li, E.F. Canga, J.M. Noyes, T. Troscianko, D.R. Bull and C.N. Canagarajah, Proceedings of Fusion 2006
Results (Low Luminance) • Accuracy Scores revealed: • Main effect of display modality (p < .001). • No main effect of session (p > .05). • No interaction (p > .05). • Post hoc tests revealed differences between Viz and: IR, AVE, DWT, CWT. • RT Scores revealed: • Main effect of fusion: IR significantly closer to ‘ideal’ timing.
Target Tracking in Fused Videos: Conclusions I • The current experimental results reveal two methods for differentiating between fusion schemes: the use of scanpath accuracy and RTs. • Fused videos with higher (perceived) quality do not necessarily lead to better tracking performance • The AVE and DWT fusion methods were found to perform best in the 2.1_i tracking task. From a subjective point of view, the DWT appeared to create a sequence that was much noisier and had more artefacts than the CWT method.
Target Tracking in Fused Videos: Conclusions II • All of the fusion methods performed significantly better than the inputs, highlighting the advantages of using a fused sequence even when luminance levels are high. • Results suggest that when luminance is low, any method of attaining additional information regarding the target location will significantly improve upon a visible light camera alone.
Experiment 3: Multi-Modal Image Segmentation Decision: Which objects to segment and how?
Multi-Modal Image Segmentation • Multi-modal sensors and multi-sensor systems produce sets of multi-modal images • Many applications need good segmentation • How best to segment a set of multi-modal images? • To study how fusion affects segmentation • Previous evaluation methods • Subjective • Based on ground truth • Need for objective measure of quality of segmentation techniques
Joint Vs. Uni-Modal Segmentation Two approaches investigated: • Uni-modal segmentation S1 = σ(I1),…, SN = σ(IN) • Each image segmented separately • Different segmentations for each image in the set • Joint segmentation Sjoint = σ(I1 …IN) • All images in the set contribute to a single segmentation • Segmentation accounts for all features from all input images
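The difference between the two approaches can be shown with a toy stand-in for σ. In the sketch below, `sigma` labels each pixel by the tuple of thresholded values across whichever images it is given, so calling it on one image gives a uni-modal result and calling it on the whole set gives a joint result. The thresholding rule, the threshold value and the sample images are assumptions chosen only to keep the example small; they are not the segmentation algorithm used in the study, and the labels group pixels by value class rather than by connected region.

```python
def sigma(images, threshold=128):
    """Toy segmentation: one label per distinct tuple of thresholded values."""
    rows, cols = len(images[0]), len(images[0][0])
    labels = {}  # maps a feature tuple to an integer region label
    segmentation = []
    for r in range(rows):
        row = []
        for c in range(cols):
            key = tuple(img[r][c] >= threshold for img in images)
            row.append(labels.setdefault(key, len(labels)))
        segmentation.append(row)
    return segmentation


# Hypothetical registered inputs: each modality sees a different object.
visible = [[200, 200, 10], [200, 200, 10]]
infrared = [[10, 250, 250], [10, 10, 10]]

s_visible = sigma([visible])          # uni-modal: S1 = sigma(I1)
s_infrared = sigma([infrared])        # uni-modal: S2 = sigma(I2)
s_joint = sigma([visible, infrared])  # joint: Sjoint = sigma(I1, I2)
```

Here each uni-modal result has two labels, while the joint result has four, because it separates pixels that differ in either modality.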
Uni-Modal and Joint Image Segmentation [Figure panels: Original IR image (in red); Original visible image (in green); Unimodal segmentation; Unimodal segmentation; Union of unimodal segmentations; Joint segmentation]
Multi-Sensor Image Segmentation Data Set • To enable objective comparison of different segmentation techniques • Need some method of finding a “ground truth” for natural images • The human visual system is good at segmenting images • The Berkeley Segmentation Database • 1000 natural images • 12000 human segmentations [Martin et al., A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics, ICCV, 2001]
Multi-Sensor Image Segmentation Data Set • 11 sets of multi-modal images • 14 IR and 11 grey scale images • 33 fused images from 3 pixel-based fusion algorithms • Contrast pyramid • Discrete wavelet transform • Dual-tree complex wavelet transform • All images have been segmented by the techniques described using the same “good” parameters across the whole data set
Image Data Set: Examples Images from the Multi-Sensor Image Segmentation Data Set
Experimental Setup • 63 subjects • The instructions were: “Divide each image into pieces, most important pieces first, where each piece represents a distinguished thing in the image. The number of things in each image is completely up to you. Something between 2 and 20 is usually reasonable. Take care and try to be as accurate as possible.” • Each subject segmented 5 images • Images pseudo-randomly distributed so that: • Each subject sees only one image from each set • They see at least one IR, one visible and one fused image • An image is not distributed a second time unless all images have been distributed once; etc.
The Segmentation Tool The Berkeley Segmentation Tool (SegTool)
The Human Segmentations • 315 human segmentations produced • ~20 rejected as obviously wrong • 5-6 segmentations for each image • 1 expert segmentation for each image The human segmentations are available to download from www.ImageFusion.org
Examples of Human Segmentations User 5 User 15 User 35 User 39 User 54 User 61 Human Segmentations of “UN Camp” CWT Fused Image
Segmentation Error Measure I We adopt the approach used with the Berkeley Segmentation Dataset • Precision, P, fraction of detections that are true positives rather than false positives • Recall, R, fraction of true positives that are detected rather than missed • F-measure is a weighted harmonic mean F = PR/(αR + (1 - α)P) • α = 0.5 used
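The three quantities on this slide can be computed directly from counts of true positives, false positives and misses. The sketch below follows the formula F = PR/(αR + (1 - α)P); the function names and the example counts are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (P, R) from detection counts, using 0.0 when undefined."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean F = PR / (alpha*R + (1 - alpha)*P)."""
    denominator = alpha * recall + (1.0 - alpha) * precision
    return precision * recall / denominator if denominator > 0 else 0.0


# Hypothetical counts: 8 matched boundary pixels, 2 spurious, 8 missed.
p, r = precision_recall(true_positives=8, false_positives=2, false_negatives=8)
f = f_measure(p, r)  # alpha = 0.5, as on the slide
```

With α = 0.5 the expression reduces to the familiar 2PR/(P + R), so precision and recall are weighted equally.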
Segmentation Error Measure II • Correspondences computed by • Comparing the segmentation to each human segmentation of that image • Correspondence computed as a minimum cost bipartite assignment problem • Scores averaged to give a single P, R and F value for each image • Tolerates localization errors • Finds explicit correspondences only
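The correspondence step can be illustrated on a very small scale. The sketch below solves the minimum cost one-to-one assignment between machine and human boundary points by brute force, charging a fixed outlier cost to any pair farther apart than a distance tolerance. The point-list format, the tolerance value, the outlier cost and the brute-force search are assumptions for illustration only; a real benchmark would use an efficient assignment solver on full boundary maps.

```python
from itertools import permutations
from math import hypot


def match_boundaries(machine_pts, human_pts, tolerance=2.0):
    """Min-cost one-to-one matching; returns (matched, precision, recall).

    Assumes tolerance > 0. Only practical for a handful of points.
    """
    small, large = sorted((machine_pts, human_pts), key=len)
    n = len(small)
    # An unmatched pair costs more than any full set of in-tolerance
    # matches, so the cheapest assignment also maximises the match count.
    outlier_cost = tolerance * (n + 1)

    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    best_cost, best_matches = None, 0
    for perm in permutations(range(len(large)), n):
        pairs = [dist(small[i], large[j]) for i, j in enumerate(perm)]
        cost = sum(d if d <= tolerance else outlier_cost for d in pairs)
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_matches = sum(1 for d in pairs if d <= tolerance)

    precision = best_matches / len(machine_pts) if machine_pts else 0.0
    recall = best_matches / len(human_pts) if human_pts else 0.0
    return best_matches, precision, recall


# Hypothetical boundary points: two agree within tolerance, one is spurious.
machine = [(0, 0), (5, 5), (20, 20)]
human = [(1, 0), (5, 6)]
matched, precision, recall = match_boundaries(machine, human)
```

The tolerance is what lets the measure forgive small localization errors, while the one-to-one constraint stops a single human boundary point from validating several machine detections.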
Examples of Automatic and Human Segmentations I Images from the Multi-Sensor Image Segmentation Data Set
Examples of Automatic and Human Segmentations II Images from the Multi-Sensor Image Segmentation Data Set