Real-time joint semantic segmentation and depth estimation using asymmetric annotations

Vladimir Nekrasov, Thanuja Dharmasiri, Andrew Spek, Tom Drummond, Chunhua Shen, Ian Reid

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards. Here, we address three of its most prominent hurdles, namely, i) the adaptation of a single model to perform multiple tasks at once (in this work, we consider depth estimation and semantic segmentation crucial for acquiring geometric and semantic understanding of the scene), while ii) doing it in real-time, and iii) using asymmetric datasets with uneven numbers of annotations per each modality. To overcome the first two issues, we adapt a recently proposed real-time semantic segmentation network, making changes to further reduce the number of floating point operations. To approach the third issue, we embrace a simple solution based on hard knowledge distillation under the assumption of having access to a powerful 'teacher' network. We showcase how our system can be easily extended to handle more tasks, and more datasets, all at once, performing depth estimation and segmentation both indoors and outdoors with a single model. Quantitatively, we achieve results equivalent to (or better than) current state-of-the-art approaches with one forward pass costing just 13 ms and 6.5 GFLOPs on 640×480 inputs. This efficiency allows us to directly incorporate the raw predictions of our network into the SemanticFusion framework [1] for dense 3D semantic reconstruction of the scene. The models are available here: https://github.com/drsleep/multi-task-refinenet.
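The abstract names two key ingredients: a single network with two lightweight task heads producing segmentation and depth in one forward pass, and hard knowledge distillation in which a frozen 'teacher' fills in whichever annotation a training image lacks. The sketch below shows one way these could fit together in PyTorch; it is a minimal illustration under assumptions, not the authors' released implementation (which is available at the repository linked above). All identifiers (MultiTaskNet, inv_huber, training_step, NUM_CLASSES), the berHu depth loss, and the unit loss weighting are illustrative choices.

# Minimal PyTorch-style sketch (illustrative assumptions, not the released code):
# a shared encoder with two lightweight heads, trained on asymmetric batches where
# missing annotations are replaced by hard pseudo-labels from a frozen teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 40  # e.g. an NYUDv2-style indoor label set (assumption)


class MultiTaskNet(nn.Module):
    """One backbone, two cheap heads: segmentation logits and per-pixel depth."""

    def __init__(self, encoder, feat_ch=256):
        super().__init__()
        self.encoder = encoder  # any lightweight feature extractor / decoder
        self.segm_head = nn.Conv2d(feat_ch, NUM_CLASSES, 3, padding=1)
        self.depth_head = nn.Conv2d(feat_ch, 1, 3, padding=1)

    def forward(self, x):
        feats = self.encoder(x)            # shared computation, single forward pass
        return self.segm_head(feats), self.depth_head(feats)


def inv_huber(pred, target):
    """berHu (inverse Huber) loss, a common choice for dense depth regression."""
    diff = (pred - target).abs()
    c = 0.2 * diff.max().detach()
    quad = (diff ** 2 + c ** 2) / (2 * c + 1e-12)
    return torch.where(diff <= c, diff, quad).mean()


def training_step(student, teacher, images, segm_gt, depth_gt):
    """segm_gt uses -1 where no segmentation label exists; depth_gt uses 0 where
    no depth label exists. The frozen teacher supplies hard pseudo-labels there."""
    segm_logits, depth_pred = student(images)

    with torch.no_grad():                  # 'hard' distillation: the teacher is fixed,
        t_segm, t_depth = teacher(images)  # and only its argmax / point estimates
        pseudo_segm = t_segm.argmax(1)     # are used, not soft probabilities

    segm_target = torch.where(segm_gt >= 0, segm_gt, pseudo_segm)
    depth_target = torch.where(depth_gt > 0, depth_gt, t_depth)

    loss_segm = F.cross_entropy(segm_logits, segm_target)
    loss_depth = inv_huber(depth_pred, depth_target)
    return loss_segm + loss_depth          # equal weighting is an assumption

In practice the heads would emit predictions at a reduced resolution and be bilinearly upsampled to the input size; that detail, the exact backbone, and the loss weights should be taken from the released models rather than this sketch.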

Original language: English
Title of host publication: 2019 International Conference on Robotics and Automation (ICRA)
Editors: Jaydev P. Desai
Place of Publication: Danvers MA USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 7101-7107
Number of pages: 7
ISBN (Electronic): 9781538660263
DOIs: https://doi.org/10.1109/ICRA.2019.8794220
Publication status: Published - 2019
Event: IEEE International Conference on Robotics and Automation 2019 - Montreal, Canada
Duration: 20 May 2019 - 24 May 2019

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISSN (Print): 1050-4729

Conference

Conference: IEEE International Conference on Robotics and Automation 2019
Abbreviated title: ICRA 2019
Country: Canada
City: Montreal
Period: 20/05/19 - 24/05/19

Cite this

Nekrasov, V., Dharmasiri, T., Spek, A., Drummond, T., Shen, C., & Reid, I. (2019). Real-time joint semantic segmentation and depth estimation using asymmetric annotations. In J. P. Desai (Ed.), 2019 International Conference on Robotics and Automation (ICRA) (pp. 7101-7107). (Proceedings - IEEE International Conference on Robotics and Automation). Danvers MA USA: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICRA.2019.8794220
Nekrasov, Vladimir ; Dharmasiri, Thanuja ; Spek, Andrew ; Drummond, Tom ; Shen, Chunhua ; Reid, Ian. / Real-time joint semantic segmentation and depth estimation using asymmetric annotations. 2019 International Conference on Robotics and Automation (ICRA). editor / Jaydev P. Desai. Danvers MA USA : IEEE, Institute of Electrical and Electronics Engineers, 2019. pp. 7101-7107 (Proceedings - IEEE International Conference on Robotics and Automation).
@inproceedings{64c0124dd74145f69c48ccaafca408b7,
title = "Real-time joint semantic segmentation and depth estimation using asymmetric annotations",
abstract = "Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards. Here, we address three of its most prominent hurdles, namely, i) the adaptation of a single model to perform multiple tasks at once (in this work, we consider depth estimation and semantic segmentation crucial for acquiring geometric and semantic understanding of the scene), while ii) doing it in real-time, and iii) using asymmetric datasets with uneven numbers of annotations per each modality. To overcome the first two issues, we adapt a recently proposed real-time semantic segmentation network, making changes to further reduce the number of floating point operations. To approach the third issue, we embrace a simple solution based on hard knowledge distillation under the assumption of having access to a powerful 'teacher' network. We showcase how our system can be easily extended to handle more tasks, and more datasets, all at once, performing depth estimation and segmentation both indoors and outdoors with a single model. Quantitatively, we achieve results equivalent to (or better than) current state-of-the-art approaches with one forward pass costing just 13ms and 6.5 GFLOPs on 640×480 inputs. This efficiency allows us to directly incorporate the raw predictions of our network into the SemanticFusion framework [1] for dense 3D semantic reconstruction of the scene.33The models are available here: https://github.com/drsleep/ multi-task-refinenet.",
author = "Vladimir Nekrasov and Thanuja Dharmasiri and Andrew Spek and Tom Drummond and Chunhua Shen and Ian Reid",
year = "2019",
doi = "10.1109/ICRA.2019.8794220",
language = "English",
series = "Proceedings - IEEE International Conference on Robotics and Automation",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
pages = "7101--7107",
editor = "Desai, {Jaydev P.}",
booktitle = "2019 International Conference on Robotics and Automation (ICRA)",
address = "United States of America",

}

Nekrasov, V, Dharmasiri, T, Spek, A, Drummond, T, Shen, C & Reid, I 2019, Real-time joint semantic segmentation and depth estimation using asymmetric annotations. in JP Desai (ed.), 2019 International Conference on Robotics and Automation (ICRA). Proceedings - IEEE International Conference on Robotics and Automation, IEEE, Institute of Electrical and Electronics Engineers, Danvers MA USA, pp. 7101-7107, IEEE International Conference on Robotics and Automation 2019, Montreal, Canada, 20/05/19. https://doi.org/10.1109/ICRA.2019.8794220

Real-time joint semantic segmentation and depth estimation using asymmetric annotations. / Nekrasov, Vladimir; Dharmasiri, Thanuja; Spek, Andrew; Drummond, Tom; Shen, Chunhua; Reid, Ian.

2019 International Conference on Robotics and Automation (ICRA). ed. / Jaydev P. Desai. Danvers MA USA : IEEE, Institute of Electrical and Electronics Engineers, 2019. p. 7101-7107 (Proceedings - IEEE International Conference on Robotics and Automation).


TY - GEN

T1 - Real-time joint semantic segmentation and depth estimation using asymmetric annotations

AU - Nekrasov, Vladimir

AU - Dharmasiri, Thanuja

AU - Spek, Andrew

AU - Drummond, Tom

AU - Shen, Chunhua

AU - Reid, Ian

PY - 2019

Y1 - 2019

N2 - Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards. Here, we address three of its most prominent hurdles, namely, i) the adaptation of a single model to perform multiple tasks at once (in this work, we consider depth estimation and semantic segmentation crucial for acquiring geometric and semantic understanding of the scene), while ii) doing it in real-time, and iii) using asymmetric datasets with uneven numbers of annotations per each modality. To overcome the first two issues, we adapt a recently proposed real-time semantic segmentation network, making changes to further reduce the number of floating point operations. To approach the third issue, we embrace a simple solution based on hard knowledge distillation under the assumption of having access to a powerful 'teacher' network. We showcase how our system can be easily extended to handle more tasks, and more datasets, all at once, performing depth estimation and segmentation both indoors and outdoors with a single model. Quantitatively, we achieve results equivalent to (or better than) current state-of-the-art approaches with one forward pass costing just 13 ms and 6.5 GFLOPs on 640×480 inputs. This efficiency allows us to directly incorporate the raw predictions of our network into the SemanticFusion framework [1] for dense 3D semantic reconstruction of the scene. The models are available here: https://github.com/drsleep/multi-task-refinenet.

AB - Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards. Here, we address three of its most prominent hurdles, namely, i) the adaptation of a single model to perform multiple tasks at once (in this work, we consider depth estimation and semantic segmentation crucial for acquiring geometric and semantic understanding of the scene), while ii) doing it in real-time, and iii) using asymmetric datasets with uneven numbers of annotations per each modality. To overcome the first two issues, we adapt a recently proposed real-time semantic segmentation network, making changes to further reduce the number of floating point operations. To approach the third issue, we embrace a simple solution based on hard knowledge distillation under the assumption of having access to a powerful 'teacher' network. We showcase how our system can be easily extended to handle more tasks, and more datasets, all at once, performing depth estimation and segmentation both indoors and outdoors with a single model. Quantitatively, we achieve results equivalent to (or better than) current state-of-the-art approaches with one forward pass costing just 13 ms and 6.5 GFLOPs on 640×480 inputs. This efficiency allows us to directly incorporate the raw predictions of our network into the SemanticFusion framework [1] for dense 3D semantic reconstruction of the scene. The models are available here: https://github.com/drsleep/multi-task-refinenet.

UR - http://www.scopus.com/inward/record.url?scp=85071514699&partnerID=8YFLogxK

U2 - 10.1109/ICRA.2019.8794220

DO - 10.1109/ICRA.2019.8794220

M3 - Conference Paper

AN - SCOPUS:85071514699

T3 - Proceedings - IEEE International Conference on Robotics and Automation

SP - 7101

EP - 7107

BT - 2019 International Conference on Robotics and Automation (ICRA)

A2 - Desai, Jaydev P.

PB - IEEE, Institute of Electrical and Electronics Engineers

CY - Danvers MA USA

ER -

Nekrasov V, Dharmasiri T, Spek A, Drummond T, Shen C, Reid I. Real-time joint semantic segmentation and depth estimation using asymmetric annotations. In Desai JP, editor, 2019 International Conference on Robotics and Automation (ICRA). Danvers MA USA: IEEE, Institute of Electrical and Electronics Engineers. 2019. p. 7101-7107. (Proceedings - IEEE International Conference on Robotics and Automation). https://doi.org/10.1109/ICRA.2019.8794220