TY - JOUR
T1 - Keypoint based weakly supervised human parsing
AU - Wu, Zhonghua
AU - Lin, Guosheng
AU - Cai, Jianfei
PY - 2019/11
Y1 - 2019/11
N2 - Fully convolutional networks (FCNs) have achieved great success in human parsing in recent years. Conventional human parsing requires pixel-level labels to guide training, which usually involves enormous human labeling effort. To ease this effort, we propose a novel weakly supervised human parsing method that requires only simple object keypoint annotations for learning. We develop an iterative learning method to generate pseudo part segmentation masks from keypoint labels. With these pseudo masks, we train an FCN to output pixel-level human parsing predictions. Furthermore, we develop a correlation network that jointly predicts part and object segmentation masks and improves segmentation performance. Experimental results show that our weakly supervised method achieves very competitive human parsing results. Although our method uses only simple keypoint annotations for learning, it achieves performance comparable to fully supervised methods that use expensive pixel-level annotations.
AB - Fully convolutional networks (FCNs) have achieved great success in human parsing in recent years. Conventional human parsing requires pixel-level labels to guide training, which usually involves enormous human labeling effort. To ease this effort, we propose a novel weakly supervised human parsing method that requires only simple object keypoint annotations for learning. We develop an iterative learning method to generate pseudo part segmentation masks from keypoint labels. With these pseudo masks, we train an FCN to output pixel-level human parsing predictions. Furthermore, we develop a correlation network that jointly predicts part and object segmentation masks and improves segmentation performance. Experimental results show that our weakly supervised method achieves very competitive human parsing results. Although our method uses only simple keypoint annotations for learning, it achieves performance comparable to fully supervised methods that use expensive pixel-level annotations.
KW - Correlation network
KW - Human parsing
KW - Iterative refinement
KW - Keypoint
KW - Skeleton
KW - Weakly supervised
UR - http://www.scopus.com/inward/record.url?scp=85073947029&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2019.08.005
DO - 10.1016/j.imavis.2019.08.005
M3 - Article
AN - SCOPUS:85073947029
SN - 0262-8856
VL - 91
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 0103801
ER -