THESIS
2012
xvii, 103 p. : ill. ; 30 cm
Abstract
In building computer vision systems, the most popular architecture is a flat parallel structure where tasks are considered independently and each task is solved by cascading a feature extraction stage with a machine learning classifier. In this work, we propose an efficient hierarchical multi-task vision system that integrates stereo and texture cues to accomplish automatic multi-view face detection and head pose estimation.
This hierarchical structure is inspired by the hierarchical signal processing in the primate visual cortex, where different perceptual tasks share the same early visual representations and more complex features are extracted from simpler ones. The visual cortices of different animal species appear to use normalized Gabor features as early visual representations. We demonstrate that the same bank of normalized four-orientation Gabor features improves face detection, disparity detection, and head pose estimation. The multi-view face detector based on discrete normalized Gabor features achieves state-of-the-art performance. Integrating disparity detectors, which are based on disparity energy features extracted from the normalized Gabor features, improves both the efficiency and the accuracy of the face detector. Disparity information enables us to filter out 90% of image locations as being less likely to contain faces. Accuracy improves because this filtering rejects 32% of the false detections made by a comparable monocular detector operating at the same recall rate.
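As a rough illustration of what a bank of normalized four-orientation Gabor features can look like, the following sketch computes Gabor energies at 0, 45, 90, and 135 degrees for a single square patch and divides each by their sum. The kernel size, wavelength, envelope width, and the divisive normalization across orientations are all assumptions chosen for illustration; the abstract does not specify the thesis's exact filter parameters or normalization scheme, and the function names are hypothetical.

```python
import math


def gabor_kernel(theta, size=9, wavelength=4.0, sigma=2.0):
    """Build the even (cosine) and odd (sine) parts of a Gabor kernel.

    All parameter values are illustrative assumptions, not thesis settings.
    """
    half = size // 2
    even, odd = [], []
    for y in range(-half, half + 1):
        row_even, row_odd = [], []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the carrier oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            phase = 2.0 * math.pi * xr / wavelength
            row_even.append(envelope * math.cos(phase))
            row_odd.append(envelope * math.sin(phase))
        even.append(row_even)
        odd.append(row_odd)
    return even, odd


def gabor_energy(patch, theta):
    """Magnitude of the complex Gabor response at the centre of a square patch."""
    even, odd = gabor_kernel(theta, size=len(patch))
    re = sum(p * k for pr, kr in zip(patch, even) for p, k in zip(pr, kr))
    im = sum(p * k for pr, kr in zip(patch, odd) for p, k in zip(pr, kr))
    return math.hypot(re, im)


def normalized_gabor_features(patch, eps=1e-6):
    """Four orientation energies, divisively normalized by their sum (assumed scheme)."""
    thetas = [k * math.pi / 4.0 for k in range(4)]
    energies = [gabor_energy(patch, t) for t in thetas]
    total = sum(energies) + eps
    return [e / total for e in energies]
```

Under this assumed normalization, scaling the patch contrast by a constant leaves the features almost unchanged, which is one plausible reason normalized features are robust to illumination changes.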
The same normalized Gabor features are also a robust representation for pose estimation. In particular, faces with similar poses lie closer together in the normalized Gabor feature space than in other feature spaces. Pose estimation with these features, using nonlinear regression based on Weighted K Nearest-Neighbor (WKNN), performs better than previously reported approaches on the same database under more complex illumination conditions. Combining the multi-view face detector and the pose estimator, we build an efficient automatic head pose estimator. We further improve the efficiency of pose estimation with a local linear regression method that combines multi-class classification with linear regression. This method achieves estimation accuracy similar to that of the WKNN estimator, but its computation time is just 5% of that of the WKNN estimator.
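The idea behind WKNN regression can be sketched in a few lines: the pose of a query face is estimated as a weighted average of the poses of its K nearest training faces in feature space. The inverse-distance weighting, the Euclidean metric, the scalar pose angle, and the function name below are assumptions made for illustration; the abstract does not state which weighting or distance the thesis uses.

```python
import math


def wknn_regress(query, train_x, train_y, k=3, eps=1e-9):
    """Weighted k-nearest-neighbour regression with inverse-distance weights.

    query   : feature vector of the face whose pose is wanted
    train_x : list of training feature vectors
    train_y : list of pose angles (one scalar per training vector)
    """
    # Pair each training pose with its Euclidean distance to the query,
    # then keep the k closest neighbours.
    neighbours = sorted(
        (math.dist(query, x), y) for x, y in zip(train_x, train_y)
    )[:k]
    # Closer neighbours receive larger weights; eps avoids division by zero.
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / total
```

For example, with training features `[(0.0,), (1.0,), (2.0,)]` and yaw angles `[0.0, 10.0, 20.0]`, a query at `(0.5,)` with `k=2` is equidistant from its two neighbours and therefore receives their plain average.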
The system is very efficient. Our implementation, running on a PC equipped with a 2.66 GHz i5 CPU and an Nvidia GTX 465 graphics card, takes only 42.0 ms to detect faces in a 640 x 480 stereo image pair, and only an additional 0.13 ms to estimate the pose of each detected face.