TY - JOUR
T1 - Hierarchical context features embedding for object detection
AU - Qiu, Heqian
AU - Li, Hongliang
AU - Wu, Qingbo
AU - Meng, Fanman
AU - Xu, Linfeng
AU - Ngan, King Ngi
AU - Shi, Hengcan
N1 - Funding Information:
Manuscript received July 29, 2019; revised November 29, 2019 and January 11, 2020; accepted January 26, 2020. Date of publication February 3, 2020; date of current version November 18, 2020. This work was supported in part by the National Natural Science Foundation of China under Grants 61831005 and 61525102. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Zhu Liu. (Corresponding author: Hongliang Li.) The authors are with the School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
Publisher Copyright:
© 1999-2012 IEEE.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/12
Y1 - 2020/12
N2 - Pixel-level segmentation has been widely used to improve object detection. Most of the existing methods refine detection features by adding the constraint of the segmentation branch or by simply embedding high-level segmentation features into detection features within the local receptive field. However, noisy segmentation features are unavoidable in real-world applications and can easily cause false positives. To address this problem, we propose a novel hierarchical context embedding module to effectively embed segmentation features into detection features. The idea of this module is to capture hierarchical context information that includes local objects or parts and nonlocal context features by learning multiple attention maps, and subsequently utilize interdependencies between features to recalibrate noisy segmentation features. Furthermore, we use this module in the proposed gated encoder-decoder network that adaptively aggregates feature maps of different resolutions based on the gate mechanism so that we can embed multiscale segmentation feature maps into detection features for more accurate detection of objects of all sizes. Experimental results demonstrate the effectiveness of the proposed method on the Pascal VOC 2012Seg dataset, the Pascal VOC dataset, and the MS COCO dataset.
AB - Pixel-level segmentation has been widely used to improve object detection. Most of the existing methods refine detection features by adding the constraint of the segmentation branch or by simply embedding high-level segmentation features into detection features within the local receptive field. However, noisy segmentation features are unavoidable in real-world applications and can easily cause false positives. To address this problem, we propose a novel hierarchical context embedding module to effectively embed segmentation features into detection features. The idea of this module is to capture hierarchical context information that includes local objects or parts and nonlocal context features by learning multiple attention maps, and subsequently utilize interdependencies between features to recalibrate noisy segmentation features. Furthermore, we use this module in the proposed gated encoder-decoder network that adaptively aggregates feature maps of different resolutions based on the gate mechanism so that we can embed multiscale segmentation feature maps into detection features for more accurate detection of objects of all sizes. Experimental results demonstrate the effectiveness of the proposed method on the Pascal VOC 2012Seg dataset, the Pascal VOC dataset, and the MS COCO dataset.
KW - gated encoder-decoder network
KW - hierarchical context embedding module
KW - object detection
KW - Segmentation features
UR - https://www.scopus.com/pages/publications/85096239152
U2 - 10.1109/TMM.2020.2971175
DO - 10.1109/TMM.2020.2971175
M3 - Article
AN - SCOPUS:85096239152
SN - 1520-9210
VL - 22
SP - 3039
EP - 3050
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 12
ER -