Abstract

We present an overview of our work on a practical prototype for natural 3D stereoscopic eye contact in teleconferencing. The two main pillars of our system are to 'immerse' the participants and to make the communication feel as natural as possible. This is realized mainly by synthesizing virtual camera viewpoints to restore eye contact, and by creating a sense of depth perception through feeding correctly generated stereo images directly to the users' eyes.

System Architecture

Our stereoscopic system consists of four main processing modules [1,2]:

• Preprocessing: Bayer demosaicing, radial lens distortion correction and segmentation, all performed on the GPU to improve data locality and arithmetic intensity [3].
• Vergence Control and Eye Tracking: vergence control dynamically adjusts the stereo base (disparities) and the convergence of the eyes to match the accommodation. The accommodation is determined by the distance from the user to the (virtual) screen. Concurrent eye tracking on the CPU allows the virtual camera to be positioned correctly.
• Stereoscopic Plane Sweep: two parallel sweeps synthesize the left and right virtual viewpoints, in four GPU phases:
  • View Interpolation: the eye-gaze corrected view (and its joint depth map) is interpolated using a plane-sweeping approach.
  • Depth Refinement: erroneous patches and speckle noise, caused by illumination changes, partially occluded areas and natural homogeneous texturing, are restored using a photometric outlier detection algorithm on the initial depth map.
  • Recoloring: the refined depth map is used to recolor the interpolated image with the confident-camera recoloring strategy [2], yielding a sharp, detailed (high-frequency) image. When confronted with blurry (low-frequency) images, the eye naturally tries to accommodate for the blur, but is unable to do so.
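A minimal sketch of such a plane sweep, assuming just two rectified input cameras, integer fronto-parallel disparity planes, and a per-pixel absolute-difference photo-consistency cost; the actual system uses N cameras, full homography warps and GPU kernels, none of which are shown here:

```python
import numpy as np

def plane_sweep_interpolate(left, right, disparities):
    """Winner-take-all plane sweep between two rectified views.

    For each candidate disparity (i.e. depth plane), the right image is
    shifted over the left one and a per-pixel photo-consistency cost
    (sum of absolute colour differences) is computed. The disparity with
    the lowest cost wins, and the virtual in-between view is synthesized
    by blending the two matched pixels.
    """
    h, w = left.shape[:2]
    best_cost = np.full((h, w), np.inf)
    best_disp = np.zeros((h, w), dtype=int)
    virtual = np.zeros_like(left, dtype=float)

    for d in disparities:
        # Warp the right view onto depth plane d (a horizontal shift
        # for rectified cameras) and measure photo-consistency.
        shifted = np.roll(right, d, axis=1)
        cost = np.abs(left.astype(float) - shifted).sum(axis=2)
        mask = cost < best_cost
        best_cost[mask] = cost[mask]
        best_disp[mask] = d
        # Blend the matched colours into the virtual viewpoint.
        virtual[mask] = 0.5 * (left[mask].astype(float) + shifted[mask])

    # Interpolated view plus its joint depth (disparity) map.
    return virtual.astype(left.dtype), best_disp
```

The depth-refinement and recoloring phases would then operate on `best_disp` and `virtual` respectively; they are omitted from this sketch.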
• Movement Analysis: the effectively scanned depth range is dynamically limited through a movement analysis on the normalized depth map histogram. This avoids heavy constraints on the user's movements, while increasing the plane-sweep quality (due to a reduced mismatch probability) and reducing the computational complexity. Furthermore, feedback can be provided to the vergence control to help restore the link between accommodation and convergence distance.
• Stereo Feed and Network: the stereo feed packs the synthesized images together, and allows optional red/cyan anaglyph filtering and compression for efficient network transmission. Cross-remote computations are performed to enforce data processing as close as possible to the input cameras. This minimizes the load on the network and provides real-time transmission.

Stereoscopic Communication

When the eyes fixate (converge) on a given person, they focus (accommodate) to create a highly detailed image, resulting in a strong link between fixating and focusing. When depth perception is created artificially, this link between convergence and accommodation is often destroyed, causing eye strain and visual discomfort.

• Vergence: simultaneously and symmetrically moving both eyes in opposite directions to converge on a specific point, which consequently has zero (angular) disparity.
• Horopter: the locus of points that exhibit zero disparity. Points located in front of (resp. beyond) the horopter exhibit crossed (resp. uncrossed) angular disparity, causing crossed (resp. uncrossed) diplopia (double vision) when the disparities become too large.
• Panum's fusional area: the locus where the eyes are still able to fuse the two input images.
• Accommodation: light cast from a fixated point into the iris is automatically focused by the eye lens to fall on the fovea, the most photoreceptive part of the retina.
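The vergence and disparity definitions above can be made concrete with a small geometric sketch. It assumes symmetric fixation straight ahead and an assumed average interpupillary distance; none of these numbers come from the system itself:

```python
import math

IPD = 0.063  # assumed average interpupillary distance, in metres

def vergence_angle(distance):
    """Vergence angle (radians) when both eyes fixate a point straight
    ahead at the given distance: theta = 2 * atan(IPD / (2 * distance))."""
    return 2.0 * math.atan(IPD / (2.0 * distance))

def angular_disparity(fixation_dist, point_dist):
    """Angular disparity of a point relative to the fixated horopter.

    Positive (crossed) for points in front of the horopter, negative
    (uncrossed) for points beyond it, and zero on the horopter itself.
    """
    return vergence_angle(point_dist) - vergence_angle(fixation_dist)
```

For example, when fixating a virtual screen roughly 4 m away, a point at 1 m yields a positive (crossed) disparity; when that disparity exceeds Panum's fusional area, double vision results.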
IMMERSIVE TELECONFERENCING WITH NATURAL 3D STEREOSCOPIC EYE CONTACT USING GPU COMPUTING

Maarten Dumont 1, Sammy Rogmans 1,2, Gauthier Lafruit 2, and Philippe Bekaert 1
1 Hasselt University – tUL – IBBT, Expertise centre for Digital Media, Wetenschapspark 2, 3590 Diepenbeek, Belgium. firstname.lastname@uhasselt.be
2 Multimedia Group, IMEC, Kapeldreef 75, 3001 Leuven, Belgium. firstname.lastname@imec.be

Introduction

Problem: traditional webcam video chat offers no eye contact, no depth perception and no extensive context information.
Our previous solution: peer-to-peer eye-gaze corrected video chat using N input images I1, …, IN fetched from N cameras C1, …, CN that are closely aligned along the screen. A virtual camera viewpoint is interpolated in real time on the GPU to restore eye contact [1,2].
Our current solution: 3D stereoscopic rendering to provide natural depth perception. Problem: how to maximize visual comfort and avoid eye strain?

Results

Our stereo teleconferencing system was tested with a head-mounted FLCOS-type display, which uses two LCD-like screens with lenses to create a 70-inch virtual screen at an accommodation distance of 13 feet. Because the physical screens and lenses are fixed and strapped to the head, the accommodation is constant, which simplifies the vergence control to a static setting (detected during initialization). The 2 × 800 × 600 stereoscopic feed can be produced from 6 input cameras in real time at 26 fps on an Intel Xeon 2.8 GHz equipped with an NVIDIA GeForce 8800 GTX, but is limited to 15 Hz by the cameras and FireWire controller hardware. Results were generated under moderately variable illumination conditions, with a fixed set of fine-tuned practical system parameters.

Towards Many-to-Many and Multi-Party

Our long-term goal is a fully immersive augmented environment where multiple participants, spread over different parties, can communicate and cooperate as if they were in the same room. The current implementation speed (42 fps mono / 26 fps stereo) leaves headroom for further image quality optimization by advancing the algorithmic and computational complexity. The advantages of N-camera and confident-camera recoloring can be combined in a camera blending field. The responsiveness of the movement analysis will be improved, and multiple objects will be intelligently distinguished (with collision detection) by combining silhouette information [4], enabling eye-gaze corrected many-to-many and multi-party video chat.

References

[1] Maarten Dumont, Steven Maesen, Sammy Rogmans, and Philippe Bekaert. A Prototype for Practical Eye-Gaze Corrected Video Chat on Graphics Hardware. In SIGMAP, pp. 236-243, Porto, Portugal, July 2008.
[2] Maarten Dumont, Sammy Rogmans, Steven Maesen, and Philippe Bekaert. Optimized Two-Party Video Chat with Restored Eye Contact Using Graphics Hardware. In Springer-Verlag CCIS 48, 2009.
[3] Sammy Rogmans, Maarten Dumont, Tom Cuypers, Gauthier Lafruit, and Philippe Bekaert. A High-Level Kernel Transformation Rule Set for Efficient Caching on Graphics Hardware. In SIGMAP, Milan, Italy, July 2009.
[4] Sammy Rogmans, Maarten Dumont, Tom Cuypers, Gauthier Lafruit, and Philippe Bekaert. Complexity Reduction of Real-Time Depth Scanning on Graphics Hardware. In VISAPP, pp. 547-550, Lisbon, Portugal, February 2009.
