ENG: end-to-end neural geometry for robust depth and pose estimation using CNNs

Thanuja Dharmasiri, Andrew Spek, Tom Drummond

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Recovering structure and motion parameters given an image pair or a sequence of images is a well-studied problem in computer vision. This is often achieved by employing Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM) algorithms based on the real-time requirements. Recently, with the advent of Convolutional Neural Networks (CNNs), researchers have explored the possibility of using machine learning techniques to reconstruct the 3D structure of a scene and jointly predict the camera pose. In this work, we present a framework that achieves state-of-the-art performance on single image depth prediction for both indoor and outdoor scenes. The depth prediction system is then extended to predict optical flow and ultimately the camera pose and trained end-to-end. Our framework outperforms previous deep-learning-based motion prediction approaches, and we also demonstrate that the state-of-the-art metric depths can be further improved using the knowledge of pose.

Original language: English
Title of host publication: Computer Vision – ACCV 2018
Subtitle of host publication: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part I
Editors: C.V. Jawahar, Hongdong Li, Greg Mori, Konrad Schindler
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 625-642
Number of pages: 18
ISBN (Electronic): 9783030208875
ISBN (Print): 9783030208868
DOIs: https://doi.org/10.1007/978-3-030-20887-5_39
Publication status: Published - 2019
Event: Asian Conference on Computer Vision 2018 - Perth, Australia
Duration: 2 Dec 2018 – 6 Dec 2018
Conference number: 14th
https://link-springer-com.ezproxy.lib.monash.edu.au/book/10.1007/978-3-030-20873-8

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11361
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Asian Conference on Computer Vision 2018
Abbreviated title: ACCV 2018
Country: Australia
City: Perth
Period: 2/12/18 – 6/12/18

Keywords

  • Depth
  • Indoor and outdoor datasets
  • Optical flow
  • Pose prediction

Cite this

Dharmasiri, T., Spek, A., & Drummond, T. (2019). ENG: end-to-end neural geometry for robust depth and pose estimation using CNNs. In C. V. Jawahar, H. Li, G. Mori, & K. Schindler (Eds.), Computer Vision – ACCV 2018: 14th Asian Conference on Computer Vision Perth, Australia, December 2–6, 2018 Revised Selected Papers, Part I (pp. 625-642). (Lecture Notes in Computer Science; Vol. 11361). Cham Switzerland: Springer. https://doi.org/10.1007/978-3-030-20887-5_39
@inproceedings{4d3c2c629b2447b69572855bab256ea6,
title = "ENG: end-to-end neural geometry for robust depth and pose estimation using CNNs",
abstract = "Recovering structure and motion parameters given an image pair or a sequence of images is a well-studied problem in computer vision. This is often achieved by employing Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM) algorithms based on the real-time requirements. Recently, with the advent of Convolutional Neural Networks (CNNs), researchers have explored the possibility of using machine learning techniques to reconstruct the 3D structure of a scene and jointly predict the camera pose. In this work, we present a framework that achieves state-of-the-art performance on single image depth prediction for both indoor and outdoor scenes. The depth prediction system is then extended to predict optical flow and ultimately the camera pose and trained end-to-end. Our framework outperforms previous deep-learning-based motion prediction approaches, and we also demonstrate that the state-of-the-art metric depths can be further improved using the knowledge of pose.",
keywords = "Depth, Indoor and outdoor datasets, Optical flow, Pose prediction",
author = "Thanuja Dharmasiri and Andrew Spek and Tom Drummond",
year = "2019",
doi = "10.1007/978-3-030-20887-5_39",
language = "English",
isbn = "9783030208868",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "625--642",
editor = "C.V. Jawahar and Hongdong Li and Greg Mori and Konrad Schindler",
booktitle = "Computer Vision – ACCV 2018",

}



TY - GEN
T1 - ENG
T2 - end-to-end neural geometry for robust depth and pose estimation using CNNs
AU - Dharmasiri, Thanuja
AU - Spek, Andrew
AU - Drummond, Tom
PY - 2019
Y1 - 2019
AB - Recovering structure and motion parameters given an image pair or a sequence of images is a well-studied problem in computer vision. This is often achieved by employing Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM) algorithms based on the real-time requirements. Recently, with the advent of Convolutional Neural Networks (CNNs), researchers have explored the possibility of using machine learning techniques to reconstruct the 3D structure of a scene and jointly predict the camera pose. In this work, we present a framework that achieves state-of-the-art performance on single image depth prediction for both indoor and outdoor scenes. The depth prediction system is then extended to predict optical flow and ultimately the camera pose and trained end-to-end. Our framework outperforms previous deep-learning-based motion prediction approaches, and we also demonstrate that the state-of-the-art metric depths can be further improved using the knowledge of pose.
KW - Depth
KW - Indoor and outdoor datasets
KW - Optical flow
KW - Pose prediction
UR - http://www.scopus.com/inward/record.url?scp=85066778744&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-20887-5_39
DO - 10.1007/978-3-030-20887-5_39
M3 - Conference Paper
AN - SCOPUS:85066778744
SN - 9783030208868
T3 - Lecture Notes in Computer Science
SP - 625
EP - 642
BT - Computer Vision – ACCV 2018
A2 - Jawahar, C.V.
A2 - Li, Hongdong
A2 - Mori, Greg
A2 - Schindler, Konrad
PB - Springer
CY - Cham Switzerland
ER -
