THESIS
2022
1 online resource (xv, 126 pages) : illustrations (some color)
Abstract
Current emotion recognition models suffer from data imperfection problems, leading
to relatively low performance and biased decisions. This thesis addresses two data
imperfections (imperfect annotations and imperfect samples), individually and
jointly, to improve the performance and fairness of facial emotion recognition models.
1. Imperfect Annotations. The lack of databases annotated with all three commonly
used emotion descriptors, i.e., facial action units, categorical emotions,
and valence-arousal, has hindered multi-task emotion model development. We
proposed two approaches to solve this problem: a data-driven approach and
a knowledge-aware approach. Both approaches outperformed previous single-task
and multi-task models, emphasizing the importance of learning the relationship
between tasks.
2. Imperfect Samples. We addressed two challenges in predicting valence and arousal
for videos under unconstrained light variations. First, varying illumination conditions
require a robust motion representation. Second, the dynamics of facial
expressions are difficult to capture. For the first challenge, we proposed to use
phase differences instead of optical flow as the motion features. For the second
challenge, we designed a two-stream network to learn the motion features from
two durations corresponding to micro- and macro-expressions. Experimental results
showed that phase differences are more robust than optical flow to illumination changes.
3. Imperfect Samples and Annotations. Many emotion datasets have two biases:
composition bias related to data distribution and annotation bias related to annotators’
prejudice. We proposed a two-stage training method to mitigate composition
bias in the first stage via disentanglement and mitigate annotation bias
in the second stage via a similarity constraint. Our method showed superior performance
to methods targeting only one type of bias.
Our proposed methods may also apply to other domains where inter-task
relationships, the robustness of motion features, and fair representations are of
concern. Future directions suggested by our work include emotion uncertainty prediction
and causal inference of emotions.
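
As a rough illustration of the robustness claim in item 2, the sketch below compares a phase-based motion cue with a plain intensity difference when the second frame is uniformly dimmed. It is not the thesis's method: it assumes a global 2-D Fourier phase (the abstract does not say how the phase differences are computed), a synthetic random frame, a one-pixel circular shift as the "motion", and a hypothetical brightness gain of 0.5. The function name `phase_difference` is likewise made up for this example.

```python
import numpy as np

def phase_difference(frame_a, frame_b):
    """Per-frequency Fourier phase difference between two frames, in (-pi, pi]."""
    fa = np.fft.fft2(frame_a)
    fb = np.fft.fft2(frame_b)
    # The angle of fb * conj(fa) equals phase(fb) - phase(fa), wrapped.
    return np.angle(fb * np.conj(fa))

rng = np.random.default_rng(0)
frame = rng.random((32, 32))               # synthetic stand-in for a face frame
shifted = np.roll(frame, shift=1, axis=1)  # simulated one-pixel horizontal motion
gain = 0.5                                 # hypothetical global illumination change

# Phase-based motion cue, with and without the illumination change.
pd_ref = phase_difference(frame, shifted)
pd_dim = phase_difference(frame, gain * shifted)

# Intensity-based cue (plain frame difference) under the same change.
diff_ref = shifted - frame
diff_dim = gain * shifted - frame

# Compare phases on the unit circle so that wrap-around at +/- pi is harmless.
phase_unchanged = np.allclose(np.exp(1j * pd_ref), np.exp(1j * pd_dim))
intensity_unchanged = np.allclose(diff_ref, diff_dim)
print("phase cue unchanged by dimming:", phase_unchanged)
print("intensity cue unchanged by dimming:", intensity_unchanged)
```

Under these assumptions the phase comparison should hold while the intensity comparison should not, because a positive global gain rescales Fourier magnitudes but leaves phases untouched. Real illumination changes are rarely a uniform gain, so this shows only the simplest case.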