Real-time monocular object instance 6D pose estimation

Thanh-Toan Do, Trung Pham, Ming Cai, Ian Reid

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

20 Citations (Scopus)


In this work, we present LieNet, a novel deep learning framework that simultaneously detects and segments multiple object instances and estimates their 6D poses from a single RGB image, without requiring additional post-processing. Our system is accurate and fast (∼10 fps), making it well suited for real-time applications. In particular, LieNet detects and segments object instances in the image in a manner analogous to modern instance segmentation networks such as Mask R-CNN, but adds a novel sub-network for 6D pose estimation. LieNet estimates the rotation matrix of an object by regressing a Lie algebra-based rotation representation, and estimates the translation vector by predicting the distance of the object to the camera center. Experiments on two standard pose benchmark datasets show that LieNet greatly outperforms other recent CNN-based pose prediction methods when those methods are used with monocular images and without post-refinement.
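The two pose outputs described above can be illustrated with a minimal sketch. It assumes the regressed rotation is an axis-angle vector in the Lie algebra so(3), mapped to a rotation matrix by the exponential map (Rodrigues' formula). It also assumes the predicted distance is the Euclidean distance from the camera center to the object center, and that the object center's image projection (u, v) and the camera intrinsics fx, fy, cx, cy are known. The function names `so3_exp` and `translation_from_distance`, and these parameterizations, are illustrative assumptions, not the paper's code.

```python
import math


def so3_exp(omega):
    """Map an axis-angle vector omega in so(3) (a 3-vector) to a 3x3 rotation
    matrix with Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2,
    where theta = |omega| and K is the skew-symmetric matrix of omega / theta.
    (Illustrative sketch; the exact representation used by LieNet is assumed.)"""
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if theta < 1e-12:
        # The exponential of the zero vector is the identity rotation.
        return identity
    kx, ky, kz = wx / theta, wy / theta, wz / theta
    # Skew-symmetric (cross-product) matrix of the unit rotation axis.
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]


def translation_from_distance(u, v, distance, fx, fy, cx, cy):
    """Recover the translation vector from a predicted Euclidean distance to the
    camera center, assuming the object center projects to pixel (u, v) under a
    pinhole camera with intrinsics fx, fy, cx, cy (hypothetical inputs)."""
    # Viewing ray through (u, v) in camera coordinates, with unit depth.
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]
    norm = math.sqrt(sum(r * r for r in ray))
    # Scale the unit-length ray so its length equals the predicted distance.
    return [distance * r / norm for r in ray]


# Usage: a 90-degree rotation about the camera z axis, and an object 2 units
# away whose center projects onto the principal point.
R = so3_exp([0.0, 0.0, math.pi / 2])
t = translation_from_distance(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
print(t)  # [0.0, 0.0, 2.0]
```

If a network instead predicted depth Z along the optical axis, the translation would be ((u - cx) Z / fx, (v - cy) Z / fy, Z), with no ray normalization.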

Original language: English
Title of host publication: 29th British Machine Vision Conference, BMVC 2018
Editors: Hubert P. H. Shum, Timothy Hospedales
Place of publication: London, UK
Publisher: British Machine Vision Association
Number of pages: 12
Publication status: Published - 2018
Externally published: Yes
Event: British Machine Vision Conference 2018 - Newcastle, United Kingdom
Duration: 3 Sept 2018 to 6 Sept 2018
Conference number: 29th


Conference: British Machine Vision Conference 2018
Abbreviated title: BMVC 2018
Country/Territory: United Kingdom