ENG: End-to-End Neural Geometry for Robust Depth and Pose Estimation Using CNNs

Thanuja Dharmasiri, Andrew Spek, Tom Drummond

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

2 Citations (Scopus)


Recovering structure and motion parameters from an image pair or a sequence of images is a well-studied problem in computer vision. It is typically addressed with Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM) algorithms, chosen according to real-time requirements. More recently, with the advent of Convolutional Neural Networks (CNNs), researchers have explored machine learning techniques that reconstruct the 3D structure of a scene and jointly predict the camera pose. In this work, we present a framework that achieves state-of-the-art performance on single-image depth prediction for both indoor and outdoor scenes. The depth prediction system is then extended to predict optical flow and, ultimately, the camera pose, and the whole pipeline is trained end-to-end. Our framework outperforms previous deep-learning-based motion prediction approaches, and we also demonstrate that the state-of-the-art metric depths can be further improved using knowledge of the pose.
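The abstract describes a cascaded architecture: single-image depth prediction feeds an optical-flow stage, which in turn feeds pose regression, all trained end-to-end. The sketch below illustrates only that data flow with placeholder functions and assumed tensor shapes; the function names, shapes, and outputs are illustrative assumptions, not the authors' actual network.

```python
import numpy as np

def predict_depth(image):
    """Stand-in for a single-image depth CNN: one depth value per pixel."""
    h, w, _ = image.shape
    return np.ones((h, w))  # placeholder: unit depth everywhere

def predict_flow(image_a, image_b, depth_a):
    """Stand-in for a flow network that also consumes predicted depth."""
    h, w = depth_a.shape
    return np.zeros((h, w, 2))  # placeholder: zero per-pixel motion

def predict_pose(flow, depth_a):
    """Stand-in for pose regression: a 6-DoF vector (translation + rotation)."""
    return np.zeros(6)  # placeholder: identity motion

# End-to-end cascade over a pair of frames (shapes are assumptions)
frame_a = np.random.rand(8, 8, 3)
frame_b = np.random.rand(8, 8, 3)

depth = predict_depth(frame_a)                 # (8, 8)
flow = predict_flow(frame_a, frame_b, depth)   # (8, 8, 2)
pose = predict_pose(flow, depth)               # (6,)
```

In the paper's setting each stage would be a CNN and the gradients from the pose loss would flow back through the flow and depth stages, which is what "trained end-to-end" refers to.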

Original language: English
Title of host publication: Computer Vision – ACCV 2018
Subtitle of host publication: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part I
Editors: C.V. Jawahar, Hongdong Li, Greg Mori, Konrad Schindler
Place of publication: Cham, Switzerland
Number of pages: 18
ISBN (Electronic): 9783030208875
ISBN (Print): 9783030208868
Publication status: Published - 2019
Event: Asian Conference on Computer Vision 2018 - Perth, Australia
Duration: 2 Dec 2018 – 6 Dec 2018
Conference number: 14th
https://link.springer.com/book/10.1007/978-3-030-20887-5 (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: Asian Conference on Computer Vision 2018
Abbreviated title: ACCV 2018


Keywords

  • Depth
  • Indoor and outdoor datasets
  • Optical flow
  • Pose prediction
