

  1. EYE GAZE FUSION – Application of sensor fusion to the estimation of human gaze location

  2. The system – electrical wheelchair. Diagram components: back camera, display & eye tracker, Kinect, PLC, encoder, PC, motor.

  3. The device – eye tracker • It is a vision system able to tell where the user is looking • Using an infrared camera, it is possible to map the gaze of the user onto the screen by detecting the location of the pupil with respect to a known pattern of light

  4. The device – eye tracker • The pupil–corneal-reflection eye tracker: the plane ABCD is assumed to lie on the cornea. D. H. Yoo, J. H. Kim, B. R. Lee, M. J. Chung, “Non-contact eye gaze tracking system by mapping of corneal reflections”. In: Proc. of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 94-99. IEEE (2002)

  5. The device – eye tracker • Determine the cross ratio of A, B, M1, M2 • Determine the cross ratio of A', B', M'1, M'2 • The cross ratio is invariant in projective space, hence the two values can be equated to recover the x coordinate of P' • To obtain the y coordinate of P' the calculation is the same
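For reference, the cross ratio of four collinear points A, B, M1, M2 (signed distances along the line) can be written as

CR(A, B; M1, M2) = (AM1 · BM2) / (AM2 · BM1)

and its invariance under the projective mapping between the two planes gives

CR(A, B; M1, M2) = CR(A', B'; M'1, M'2),

a single equation in the unknown coordinate of P'. This is only the generic definition; the exact choice of points follows the construction in Yoo et al.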

  6. The device – eye tracker • Sample of the eye-tracker software interface (figure), showing the monitor, the IR camera, the IR illuminators and the resulting glints on the eye

  7. The device – eye tracker • Applications: eye-tracking technologies are currently employed in various contexts, including evaluation of attention, marketing and Human–Machine Interaction

  8. EyeAssist • In the experiments we have conducted we adopted the eye-tracking system developed by Xtensa • EyeAssist is a low-cost platform, which is mostly software-based and allows easy integration and customization to fit different applications and screen sizes

  9. Target application • We target the use of eye trackers for assistive robotics, in order to allow severely impaired users to move autonomously • To do so, their gaze is analysed and the observed point is mapped into real-world coordinates • Alternatively, the eye tracker can be used as a joystick • Unfortunately, eye-tracking repeatability is very poor: on a 1600 x 1200 LCD the error can reach 200 pixels, i.e. close to 20% of the screen size

  10. Kinect • Kinect is a line of motion-sensing input devices. It enables users to control and interact with their console/computer without the need for a game controller. Moreover, it provides full-body 3D motion capture, facial recognition and voice recognition capabilities.

  11. Logical steps • We want to determine the 3D point the user wants to reach, together with its covariance, starting from the point of fixation on the 2D image. The user selects a point on the image, which is the point he wants to reach. This point, with its uncertainty, is converted from the image reference system to the 3D world reference system. The wheelchair then moves towards the selected point. The Kinect acquires a new image closer to the target point, so the user can select the desired destination with higher accuracy. The idea is to fuse the previous data with the new ones: in this way we keep improving the estimate of the arrival point while approaching it, and thanks to the fusion the uncertainty decreases. The main steps are explained in the following slides.

  12. Logical steps • 1. Selection of the point on the image • The Kinect gives a 2D image of the environment. The user selects the point where he wants to go by looking at it on the image. • Given the performance of the instrument, this point has an uncertainty. • So the first piece of information is a point on the 2D image with an uncertainty. (Figure: point selected by the user with the eyes, with its uncertainty ellipse)

  13. Logical steps • 2. Find the 3D point corresponding to the point of fixation • From the data coming from the Kinect we can find the 3D point corresponding to the 2D point of fixation. In particular, we consider all the points inside the uncertainty ellipse and find the corresponding 3D data.

  14. Logical steps • 3. Find which of the 3D points are on the floor • This is obtained by simply eliminating all the points whose z coordinate is greater than a certain threshold. Note that we are assuming that the user looks at the point on the floor where he wants to arrive, so if he wants to approach the door he should not look at the door but at its base.

  15. Logical steps • 4. Uncertainty representation through the covariance ellipse (Cheeseman) • Once the points of interest are obtained, we need to calculate their mean and covariance.

  16. Logical steps • 5. Sensor fusion (Bayes) • The last step is to fuse the data coming from the different images into a single piece of information, that is, a weighted mean with its associated covariance. The weights are computed according to Bayes' theorem. (Figure: previous acquisition, new acquisition and the resulting fused covariance)
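For reference, treating the previous estimate (x_1, C_1) and the new acquisition (x_2, C_2) as independent Gaussian measurements, the standard fusion formulas are

C_F = (C_1^{-1} + C_2^{-1})^{-1}
x_F = C_F (C_1^{-1} x_1 + C_2^{-1} x_2)

so that each estimate is weighted by its inverse covariance, and the fused covariance C_F is never larger than either input. This is only a sketch of the relation the slide refers to; the lesson's own Bayes function is written later in the exercise.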

  17. Open the file “MainLesson.m”. In the first lines of code, select the folder containing all the data:
folderName = uigetdir;          % ask the user for the data folder
addpath(folderName);
dataFolder = dir(folderName);   % list of the files in the folder
From here on, the code lives inside a for loop that ends when the counter exceeds the available data:
for k = 1:(length(dataFolder)-2)/3

  18. The code
Data0 = importdata(dataFolder(3+3*(k-1)).name);
Data coming from the Kinect, a <217088 x 9> matrix with one row per point:
• columns 1-3: X, Y, Z (3D coordinates)
• columns 4-6: R, G, B (colours)
• columns 7-8: I, J (2D image coordinates)
• column 9: depth
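As a purely illustrative sketch (the variable names below are hypothetical and the lesson code may unpack the matrix differently), the individual fields could be extracted column by column:

x = Data0(:,1);        % 3D coordinates in the Kinect reference frame
y = Data0(:,2);
z = Data0(:,3);
rgb = Data0(:,4:6);    % R, G, B colour of each point
iImg = Data0(:,7);     % 2D image coordinates
jImg = Data0(:,8);
depth = Data0(:,9);    % depth value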

  19. Since we will use information coming from consecutive images, we have to align the point clouds, e.g. we can move the first one onto the second. We can determine the rotation and translation using ICP. Iterative Closest Point (ICP) is a matching algorithm used to minimize the difference between two point clouds. In the algorithm one point cloud is kept fixed, while the other is translated and rotated to obtain the best possible match with the first one; the algorithm iteratively minimizes the distance from the second point cloud to the first.
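As an illustration only (the lesson uses precomputed matrices, see the next slide), an ICP alignment can be run with the Computer Vision Toolbox; the point-cloud variables below are hypothetical:

ptFixed  = pointCloud([xRef yRef zRef]);   % cloud kept fixed
ptMoving = pointCloud([xNew yNew zNew]);   % cloud to be aligned
tform = pcregistericp(ptMoving, ptFixed);  % rigid roto-translation minimizing point-to-point distances
ptAligned = pctransform(ptMoving, tform);  % apply the estimated transformation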

  20. The roto-translation matrices between the images have already been calculated and are stored in the “RotoTranlationMatrix” folder, so we simply need to read and apply those values in the correct way.
DataRT(:,:,k) = importdata(RTFolder(2+k).name);   % precomputed roto-translation matrix
M = inv(DataRT(:,:,k));
R(:,:,k) = M(1:3,1:3);                            % rotation
Tr(:,k) = M(1:3,4);                               % translation
Told(:,:,k) = [x(:,k) y(:,k) z(:,k)];
TNew(:,:,k) = Align(TNew(:,:,1),Told(:,:,k)',R(:,:,k),Tr(:,k),1);   % align the cloud onto the first one

  21. Problem: the 3D coordinates are taken with respect to the Kinect mounted on the robot, hence they are not expressed in the ground reference system.
H(:,:,k) = GroundCalibrationKinect2(data);
Use this function to find the roto-translation matrix between the reference system attached to the Kinect and the ground reference system. The function uses the RANSAC (Random Sample Consensus) algorithm to fit a model to experimental data. With respect to other methods, such as least squares, RANSAC is capable of tolerating gross errors (outliers), which makes it particularly suitable for image analysis. To know more: Martin A. Fischler and Robert C. Bolles. 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6 (June 1981), 381-395. DOI=10.1145/358669.358692 http://doi.acm.org/10.1145/358669.358692
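GroundCalibrationKinect2 is provided with the lesson material; purely as an illustration of the idea, a RANSAC fit of the ground plane can be obtained with pcfitplane from the Computer Vision Toolbox (hypothetical usage, not the lesson's own implementation):

ptCloud = pointCloud([x(:,k) y(:,k) z(:,k)]);        % 3D points acquired by the Kinect
maxDistance = 0.02;                                   % inlier threshold for the plane fit
[planeModel, inlierIdx] = pcfitplane(ptCloud, maxDistance);
planeModel.Parameters                                 % [a b c d] such that a*x + b*y + c*z + d = 0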

  22. Apply the roto-translation matrix obtained above to the points:
PointsWrtGround(:,:,k) = KinectToGround(x(:,k),y(:,k),z(:,k),H(:,:,k),1);
PointsWrtGround is a <4 x N> matrix (where N is the number of valid points acquired by the Kinect). The function also plots the data before and after the transformation; an example is shown in the following slides.

  23. Points in the Kinect reference system

  24. Points in the ground reference system

  25. The image acquired by the Kinect is shown on the PC (or on the device on which the eye tracker is mounted). The user will select a point on this image by looking at it. For the moment, we select the point with the mouse using the Matlab function ginput. Moreover, we have to consider that the data coming from the eye tracker have a certain standard deviation, which strongly depends on the kind of device used; in this case we set ETUncertainty = 50 pixels. Eventually we have a point with a covariance ellipse on the image. It is useful to implement a loop in order to verify whether the point is not on the floor or lies outside the area seen by the Kinect: if this is the case, you have to discard the point and select another one.

  26.
I = imread('K_0_RGB.png','png');   % RGB image acquired by the Kinect
imshow(I); hold on;
[iI,jI] = ginput(1);               % point of fixation selected with the mouse
plot(iI,jI,'.b');
[h(:,k),iEll(:,k),jEll(:,k)] = myEllipse(ETUncertainty,ETUncertainty,0,iI,jI);   % uncertainty ellipse around the selected point
legend('Point of fixation'); hold off;

  27. The next step is to translate this information into 3D coordinates:
PointsKinect = ImageToKinect(i(:,k),j(:,k),iEll(:,k),jEll(:,k),PointsWrtGround(:,:,k),1);
Among all the points found in this way, we only keep those on the floor. To select them we exclude the points whose z coordinate is greater than a small threshold eps:
eps = 0.01;
PointsOnFloor = PointsKinect;                    % start from all the candidate points
PointsOnFloor(:,PointsOnFloor(3,:)>eps) = [];    % drop the points above the floor

  28. We end up with a certain number of points. From these points we can calculate the mean and the covariance. Hence we have converted the 2D information coming from the image into information in the 3D world. Note that, since the points are on the ground, the coordinates are still 2D.
[meanX(:,k),C(:,:,k)] = MeanAndCovariance(PointsOnFloor);
TO DO: write the function “MeanAndCovariance”. This function takes as input the previously calculated PointsOnFloor and returns as output the mean and the covariance of those points.
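A minimal sketch of one possible implementation, assuming the points are stored one per column with the ground-plane x and y coordinates in the first two rows (the expected layout may differ):

function [m, C] = MeanAndCovariance(points)
% Mean and covariance of the candidate arrival points (one point per column).
xy = points(1:2,:);   % keep the ground-plane coordinates
m  = mean(xy, 2);     % 2x1 mean
C  = cov(xy');        % 2x2 covariance (cov expects one observation per row)
end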

  29. The following function displays the mean and the covariance previously obtained on the 3D data collected by the Kinect:
plotMeanAndCovarianceOnGround(meanX(:,k),C(:,:,k),PointsWrtGround(:,:,k),1,'g');
To make this function work you have to write a function that translates the covariance into the ellipse parameters.
TO DO: write the function “Covariance2EllipseParameters”. This function takes as input the covariance matrix and a probability value and returns as output the ellipse parameters: the semi-minor axis, the semi-major axis and the angle.
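A minimal sketch of one possible implementation, assuming the second argument is the desired probability p of lying inside the ellipse (if the function is meant to receive the confidence value k directly, replace k2 accordingly):

function [aMin, aMaj, theta] = Covariance2EllipseParameters(C, p)
% Ellipse parameters of a 2D covariance matrix C at confidence level p.
k2 = -2*log(1 - p);                        % squared scale factor for a 2D Gaussian
[V, D] = eig(C);                           % eigen-decomposition of the covariance
[lambda, idx] = sort(diag(D), 'descend');  % largest eigenvalue first
aMaj = sqrt(k2*lambda(1));                 % semi-major axis
aMin = sqrt(k2*lambda(2));                 % semi-minor axis
vMaj = V(:, idx(1));                       % direction of the major axis
theta = atan2(vMaj(2), vMaj(1));           % orientation angle (rad)
end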

  30. It has been demonstrated by Smith and Cheeseman (*) that the confidence value k does not correspond to the same probability in the unidimensional and bidimensional cases. You should remember the values for the unidimensional case for k = 1, 2, 3; according to (*), the probability of a point being located inside the covariance ellipse is different (see below).
(*) Randall C. Smith and Peter Cheeseman. 1986. On the representation and estimation of spatial uncertainty. Int. J. Rob. Res. 5, 4 (December 1986), 56-68. DOI=10.1177/027836498600500404 http://dx.doi.org/10.1177/027836498600500404
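For reference, the standard Gaussian values the slide alludes to are:
• unidimensional case: P(|x − μ| ≤ kσ) ≈ 68.3%, 95.4%, 99.7% for k = 1, 2, 3;
• bidimensional case: the probability of falling inside the k-sigma covariance ellipse is 1 − exp(−k²/2) ≈ 39.3%, 86.5%, 98.9% for k = 1, 2, 3.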

  31. Covariance in 3D

  32. Finally, fuse the data obtained so far: the weighted mean and covariance calculated from all the previous images, and the new mean and covariance. The weight is assigned according to the variance, using a Bayes filter.
[xF(:,k),CF(:,:,k)] = Bayes(xF(:,k-1),CF(:,:,k-1),meanX(:,k),C(:,:,k));
TO DO: write the function “Bayes”. This function takes as input the previous mean and covariance and the new mean and covariance, and returns as output the weighted mean and the covariance. The last step is simply to plot the obtained results.
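A minimal sketch of one possible implementation, using the inverse-covariance weighting given earlier (the lesson may expect an equivalent formulation):

function [xF, CF] = Bayes(xPrev, CPrev, xNew, CNew)
% Fuse the previous Gaussian estimate with the new acquisition.
CF = inv(inv(CPrev) + inv(CNew));    % fused covariance, never larger than either input
xF = CF*(CPrev\xPrev + CNew\xNew);   % mean weighted by the inverse covariances
end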

  33. Summing up • What you have to do is to write the following functions: • MeanAndCovariance • Covariance2EllipseParameters • Bayes • In the code you can find a brief explanation of what you have to do preceded by dots.
