Most existing research on 3D facial expression recognition has used static 3D meshes. 3D videos of a face are believed to carry more information about facial dynamics, which are critical for expression recognition. This paper presents a fully automatic framework that exploits the dynamics of textured 3D videos to recognize six discrete facial expressions. Local video-patches of variable lengths are extracted from numerous locations in the training videos and represented as points on the Grassmannian manifold. An efficient graph-based spectral clustering algorithm clusters these points separately for each expression class. Using a valid Grassmannian kernel function, the resulting cluster centers are embedded into a Reproducing Kernel Hilbert Space (RKHS), where six binary SVM models are learnt. Given a query video, we extract video-patches from it, represent them as points on the manifold and classify these points with the learnt SVM models; a voting strategy then decides the class of the query video. The proposed framework is also applied in parallel to 2D videos, and score-level fusion of the 2D and 3D results is performed to improve the performance of the system. Experimental results on the BU4DFE data set show that the system achieves very high classification accuracy for facial expression recognition from 3D videos.
- 3D videos
- Facial expression recognition
- Grassmannian manifold
- Spectral clustering
- SVM on the Grassmannian manifold
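The abstract does not state how a video-patch is mapped to the manifold, which Grassmannian kernel is used, or how patch-level decisions are combined. The following is a minimal NumPy sketch under three assumptions that are not taken from the paper: (i) each patch is a d × n_frames matrix of vectorised frames, and its dominant `dim`-dimensional subspace (from a thin SVD) is the Grassmann point; (ii) the projection kernel k(X, Y) = ‖XᵀY‖²_F serves as the valid Grassmannian kernel; (iii) each patch votes for the class whose one-vs-rest SVM gives the highest decision value, and the video takes the majority. All function names are hypothetical. Because only `dim` and the spatial size d are fixed, patches of different lengths still land on the same Grassmannian G(dim, d).

```python
import numpy as np


def grassmann_point(patch: np.ndarray, dim: int) -> np.ndarray:
    """Map a video-patch to a point on the Grassmannian G(dim, d).

    `patch` is a (d, n_frames) matrix, one vectorised frame per column.
    n_frames may differ between patches; only `dim` must be shared.
    """
    if patch.shape[0] < dim or patch.shape[1] < dim:
        raise ValueError("patch must have at least `dim` rows and frames")
    # Thin SVD: the leading `dim` left singular vectors form an orthonormal
    # basis of the patch's dominant subspace (assumed representation).
    u, _, _ = np.linalg.svd(patch, full_matrices=False)
    return u[:, :dim]


def projection_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Projection kernel k(X, Y) = ||X^T Y||_F^2 (assumed kernel choice)."""
    return float(np.linalg.norm(x.T @ y, ord="fro") ** 2)


def gram_matrix(points_a, points_b) -> np.ndarray:
    """Kernel matrix between two lists of Grassmann points."""
    return np.array([[projection_kernel(a, b) for b in points_b] for a in points_a])


def vote(decision_scores: np.ndarray) -> int:
    """Label a query video from per-patch SVM decision values.

    `decision_scores` has shape (n_patches, n_classes), one column per
    binary one-vs-rest SVM. Each patch votes for its highest-scoring
    class; the video gets the majority vote (ties go to the lower index).
    """
    votes = np.argmax(decision_scores, axis=1)
    counts = np.bincount(votes, minlength=decision_scores.shape[1])
    return int(np.argmax(counts))


# Usage: two hypothetical patches with the same spatial size (d = 100)
# but different lengths (12 and 20 frames).
rng = np.random.default_rng(0)
p1 = grassmann_point(rng.standard_normal((100, 12)), dim=5)
p2 = grassmann_point(rng.standard_normal((100, 20)), dim=5)
K = gram_matrix([p1, p2], [p1, p2])
# The diagonal equals dim, since ||X^T X||_F^2 = ||I_dim||_F^2 = dim.
print(np.round(np.diag(K), 6))  # → [5. 5.]
```

A Gram matrix computed this way, between training cluster centers and query patches, can be passed to any SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), which is one way to realise the RKHS embedding the abstract describes without an explicit feature map.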