For many human-robot interaction applications, accurate localization of the human body, and in particular of its endpoints (head, hands, and feet), is crucial. In this paper, we propose a new Local Shape Context Descriptor designed specifically to describe the shape features of endpoint body parts. The descriptor is computed from edge images obtained from depth data generated by a time-of-flight sensor, and it encodes the distance from a reference point to the nearest edge in each of a set of uniformly sampled radial directions. Based on this descriptor, we define a new type of interest point and develop a hierarchical algorithm for searching for good interest points. The interest points are then classified as head, feet, hands, or other based on learned models. The system is computationally efficient and can handle large variations in translation, rotation, scaling, and deformation of the body parts. It is tested on videos containing a variety of motions from a publicly available dataset, and is shown to detect and identify endpoint body parts accurately at very high speed.
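The core idea of the descriptor, the distance from a reference point to the nearest edge along uniformly sampled radial directions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `radial_edge_distances`, the parameters `n_dirs` and `max_radius`, and the simple pixel-stepping ray march are all assumptions made for illustration, and the input is assumed to be a binary edge image already extracted from the depth data.

```python
import numpy as np


def radial_edge_distances(edges, ref, n_dirs=16, max_radius=50):
    """Distance from `ref` (row, col) to the nearest edge pixel along each ray.

    Hypothetical helper: `edges` is a 2-D boolean (or 0/1) edge image.
    Rays that leave the image or find no edge within `max_radius` pixels
    keep the value `max_radius`.
    """
    h, w = edges.shape
    r0, c0 = ref
    # Uniformly sampled directions over the full circle.
    angles = 2.0 * np.pi * np.arange(n_dirs) / n_dirs
    dists = np.full(n_dirs, float(max_radius))
    for i, theta in enumerate(angles):
        dr, dc = np.sin(theta), np.cos(theta)
        # March outward one pixel at a time until an edge is hit.
        for step in range(1, max_radius + 1):
            r = int(round(r0 + step * dr))
            c = int(round(c0 + step * dc))
            if not (0 <= r < h and 0 <= c < w):
                break  # ray left the image without meeting an edge
            if edges[r, c]:
                dists[i] = step
                break
    return dists
```

Calling `radial_edge_distances(edge_image, (row, col))` returns one distance per direction, giving a fixed-length vector for that reference point. Dividing the vector by its maximum would be one possible way to reduce sensitivity to scale, though the abstract does not say how the paper handles this.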