THESIS
2020
xvi, 109 pages : illustrations ; 30 cm
Abstract
This thesis investigates the use of eye gaze in human-computer interaction. One
of the biggest challenges of using gaze as an indicator of visual attention is the
"Midas Touch" problem: it is very difficult to distinguish between spontaneous
eye movements for gathering visual information and intentional eye movements
for selection. To avoid this problem, rather than directly using gaze positions
as input, we infer users' intent from their past gaze trajectories and then provide
appropriate assistance. Specifically, we propose a two-stage framework based on
hidden Markov models (HMMs) that models gaze trajectories and infers the users'
intended targets. Results on a 2D cursor control task and a hyperlink
inference task show that the model infers users' intended targets with high
accuracy. We then integrate this gaze model into two applications: a hybrid
gaze/electroencephalography (EEG) brain-computer interface (BCI), which combines
cues from eye gaze with cues from EEG to control a robot arm,
and a gaze-based web browser, which dynamically adjusts the dwell time
for which the user must fixate on a hyperlink in order to select it.
Our algorithm improves the overall performance of both systems in a natural
way without increasing the cognitive load of the user.
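
To make the inference step concrete, the following is a minimal, hypothetical
sketch of HMM-style intent inference from a gaze trajectory: the hidden state
is the intended target, gaze samples are treated as Gaussian emissions centered
on each candidate target, and a forward pass yields a posterior over targets.
The Gaussian emission model, the parameter values, and the collapse to a single
stage are illustrative assumptions, not the thesis's two-stage framework.

    import numpy as np

    def target_posterior(gaze_xy, targets, sigma=40.0, p_stay=0.95):
        """Posterior over candidate targets given a gaze trajectory.

        Illustrative single-stage HMM: the hidden state is the intended
        target, and each gaze sample is modeled as an isotropic Gaussian
        emission around that target (an assumed emission model).
        Assumes at least two candidate targets.
        """
        n = len(targets)
        # Intent mostly persists between samples, occasionally switches.
        trans = np.full((n, n), (1.0 - p_stay) / (n - 1))
        np.fill_diagonal(trans, p_stay)

        alpha = np.full(n, 1.0 / n)                  # uniform prior
        for g in gaze_xy:
            d2 = np.sum((targets - g) ** 2, axis=1)  # squared distances
            emit = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian likelihoods
            alpha = emit * (trans.T @ alpha)         # forward recursion
            alpha /= alpha.sum()                     # normalize (avoids underflow)
        return alpha

    # Hypothetical usage: two on-screen targets, gaze drifting to the second.
    targets = np.array([[100.0, 100.0], [400.0, 300.0]])
    gaze = np.array([[150.0, 140.0], [320.0, 260.0], [405.0, 295.0]])
    print(target_posterior(gaze, targets))  # posterior concentrates on target 2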
Moving forward, to support applications in which the user's movements are
relatively unconstrained, we propose a new deep neural network
for appearance-based gaze estimation. The network uses dilated convolutions,
which extract high-level features at high resolution from eye images, and gaze
decomposition, which expresses the line of sight as the sum of a subject-independent
term and a subject-dependent bias. We achieve state-of-the-art
subject-independent gaze estimation accuracy on the MPIIGaze and EYEDIAP datasets.
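
As a rough illustration of these two ideas, here is a minimal PyTorch-style
sketch; the layer sizes, depth, and names are assumptions, not the thesis
architecture. Dilated convolutions enlarge the receptive field without
downsampling, and the output decomposes as g = f(I) + b_s, a
subject-independent term plus a per-subject bias.

    import torch
    import torch.nn as nn

    class GazeNetSketch(nn.Module):
        """Illustrative gaze estimator, not the thesis architecture.

        Dilated convolutions grow the receptive field while keeping the
        feature maps at full resolution; the prediction is decomposed as
        g = f(I) + b_s (subject-independent term + per-subject bias).
        """
        def __init__(self, n_subjects):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=2, dilation=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(64, 2)             # f(I): yaw, pitch
            self.bias = nn.Embedding(n_subjects, 2)  # b_s, one per subject

        def forward(self, eye_img, subject_id):
            f = self.head(self.features(eye_img).flatten(1))
            return f + self.bias(subject_id)

For an unseen subject, f(I) alone gives the subject-independent estimate; the
bias b_s is what personal calibration (below) must recover.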
To further reduce the estimation error, we propose a personal calibration method
that works remarkably well on calibration sets of low complexity, i.e., when the
number of gaze targets used for calibration and/or the number of images per gaze
target is small. Our results show that the proposed calibration outperforms
the alternatives when the calibration set is of low complexity; a sketch of the
bias-estimation step follows this paragraph. We also collect
a large-scale dataset, NISLGaze, which contains large variations in head pose
and face location. We use NISLGaze to evaluate gaze estimation both with and
without calibration in a more realistic setting.
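
Under the gaze-decomposition view, calibration only has to recover the
low-dimensional bias b_s, which suggests why a few samples can suffice. The
sketch below estimates b_s as the mean residual over the calibration set;
whether the thesis uses exactly this estimator is an assumption.

    import numpy as np

    def estimate_bias(pred_gaze, true_gaze):
        """Estimate the subject-dependent bias b_s from a calibration set.

        With gaze decomposed as g = f(I) + b_s, the least-squares
        estimate of b_s is the mean residual between the ground-truth
        gaze and the subject-independent predictions f(I).
        """
        return np.mean(np.asarray(true_gaze) - np.asarray(pred_gaze), axis=0)

    # Hypothetical low-complexity calibration: one target, three images.
    pred = [[0.10, -0.02], [0.12, -0.01], [0.09, -0.03]]  # f(I) outputs
    true = [[0.15, 0.02]] * 3                             # known gaze target
    b_s = estimate_bias(pred, true)  # at test time, predict f(I) + b_s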