THESIS
2017
xiii, 78 pages : color illustrations ; 30 cm
Abstract
In this thesis, we apply deep networks to facial expression analysis, addressing the problems of expression recognition and expression generation. In particular, we tackle the insufficiency of facial expression data for training deep networks through transfer learning and novel architecture design.
For the recognition problem, we transfer and adapt deep networks pre-trained on ImageNet to the tasks of facial expression classification and action unit (AU) intensity estimation. First, we propose two atomic feature-map selection schemes for features at the higher convolutional layers for facial expression classification: Facial-Occupancy Selection and AU-Selectivity Selection. We then describe a Region of Interest (ROI)-based selection scheme for smile intensity estimation. Our results suggest that a substantial number of feature maps inside deep networks are selective to AUs, and that feature selection makes the system more robust and improves generalization. Second, we study the dynamics of AU recognition by proposing a spatio-temporal model that uses a Long Short-Term Memory (LSTM) network for smile (AU12) intensity estimation; incorporating temporal information greatly improves performance. Third, we address multi-pose AU intensity estimation with a multi-task network (sketched below): we take the bottom layers of VGG16 pre-trained on ImageNet and fine-tune the overall multi-task structure to learn a representation shared between pose estimation and pose-dependent AU intensity estimation. This approach won the AU intensity estimation sub-challenge of FERA 2017.
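To make the multi-task structure concrete, the following is a minimal PyTorch sketch of the idea: shared convolutional features from an ImageNet-pre-trained VGG16 feed one head for pose estimation and one AU-intensity head per pose. The pose and AU counts, the pooling, and the head designs here are illustrative assumptions, not the exact architecture from the thesis.

# Minimal sketch of the multi-task idea: shared VGG16 features, one pose
# head, and one pose-dependent AU-intensity head per pose. All sizes and
# head designs are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.models import vgg16

NUM_POSES, NUM_AUS = 9, 7  # hypothetical counts, not from the thesis

class MultiTaskAUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # "Bottom layers" of VGG16: reuse the ImageNet-pre-trained
        # convolutional feature extractor as the shared backbone.
        self.backbone = vgg16(pretrained=True).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # One head predicts head pose...
        self.pose_head = nn.Linear(512, NUM_POSES)
        # ...and one head per pose predicts the AU intensities for that pose.
        self.au_heads = nn.ModuleList(
            [nn.Linear(512, NUM_AUS) for _ in range(NUM_POSES)]
        )

    def forward(self, x):
        feat = self.pool(self.backbone(x)).flatten(1)  # shared representation
        pose_logits = self.pose_head(feat)
        # Per-pose AU predictions stacked to shape (batch, NUM_POSES, NUM_AUS).
        au_intensities = torch.stack([h(feat) for h in self.au_heads], dim=1)
        return pose_logits, au_intensities

At test time one would select the AU predictions corresponding to the estimated (or known) pose; fine-tuning the whole structure jointly is what allows the backbone to learn the representation shared by both tasks.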
For the generation problem, we propose the Conditional Difference Adversarial Autoencoder (CDAAE) for photo-realistic facial expression synthesis. The CDAAE takes a facial image of a previously unseen person and generates an image of that person's face with a target facial expression. Despite a paucity of training data, the CDAAE can disentangle changes due to identity from changes due to facial expression. It achieves this by adding a feedforward path to the autoencoder structure, connecting low-level features at the encoder to features at the corresponding level of the decoder (illustrated in the sketch below). Our results demonstrate that the CDAAE preserves identity information better than previous approaches when generating facial expressions for unseen subjects. We also show that the CDAAE can be used for facial expression interpolation and novel expression generation.
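The role of the feedforward path can be illustrated with a stripped-down sketch: a conditional autoencoder with a single skip connection from low-level encoder features to the corresponding decoder level. The adversarial training on the latent space, the difference-based conditioning, and all layer sizes are omitted or assumed here, so this is not the exact CDAAE.

# Stripped-down sketch of the skip-path idea: the decoder receives both
# the latent code (concatenated with the target-expression label) and
# low-level encoder features via a direct feedforward connection.
# Sizes and the single-skip design are assumptions for illustration.
import torch
import torch.nn as nn

LATENT, NUM_EXPR = 64, 6  # hypothetical latent size and expression count

class SkipConditionalAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU())   # 64 -> 32
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())  # 32 -> 16
        self.to_z = nn.Linear(64 * 16 * 16, LATENT)
        self.from_z = nn.Linear(LATENT + NUM_EXPR, 64 * 16 * 16)
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU())
        # The last decoder stage sees its own features plus the skip
        # connection carrying low-level, identity-bearing encoder features.
        self.dec1 = nn.ConvTranspose2d(32 + 32, 1, 4, 2, 1)

    def forward(self, x, target_expr_onehot):
        low = self.enc1(x)                  # low-level features (skip source)
        h = self.enc2(low)
        z = self.to_z(h.flatten(1))
        zc = torch.cat([z, target_expr_onehot], dim=1)  # condition on target expression
        d = self.from_z(zc).view(-1, 64, 16, 16)
        d = self.dec2(d)
        return torch.sigmoid(self.dec1(torch.cat([d, low], dim=1)))

Because the skip path carries identity-bearing low-level detail around the latent bottleneck, the latent code and expression label only need to encode the expression change, which is the disentanglement described above.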