Local spatiotemporal detectors and descriptors have recently become very popular for video analysis in many applications. They do not require any preprocessing steps and are invariant to spatial and temporal scale. Despite their computational simplicity, they have not yet been evaluated for video analysis of facial data. This paper considers two space-time detectors and four descriptors and uses a bag-of-features framework for human facial expression recognition on the BU-4DFE data set. It also compares local spatiotemporal features with published non-spatiotemporal techniques on the same data set. Unlike spatiotemporal features, these techniques involve time-consuming and computationally intensive preprocessing steps such as manual initialization and tracking of facial points. Our results show that, despite being fully automatic and requiring no user intervention, local space-time features provide promising performance, comparable to that of these techniques, for facial expression recognition on the BU-4DFE data set.