TY - JOUR
T1 - SelfVIO: Self-supervised deep monocular visual–inertial odometry and depth estimation
AU - Almalioglu, Yasin
AU - Turan, Mehmet
AU - Saputra, Muhamad Risqi U.
AU - de Gusmão, Pedro P.B.
AU - Markham, Andrew
AU - Trigoni, Niki
N1 - Funding Information:
This work is supported in part by NIST grant 70NANB17H185 and UKRI EP/S030832/1 ACE-OPS. M.T. thanks TUBITAK for the 2232 International Outstanding Researcher Fellowship and ULAKBIM for the High Performance and Grid Computing Center (TRUBA resources). Y.A. would like to thank the Ministry of National Education in Turkey for their funding and support. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher Copyright:
© 2022 The Author(s)
PY - 2022/6
Y1 - 2022/6
N2 - In the last decade, numerous supervised deep learning approaches have been proposed for visual–inertial odometry (VIO) and depth map estimation, all of which require large amounts of labelled data. To overcome this data limitation, self-supervised learning has emerged as a promising alternative that exploits constraints such as geometric and photometric consistency in the scene. In this study, we present SelfVIO, a novel self-supervised deep learning-based VIO and depth map recovery approach that uses adversarial training and self-adaptive visual–inertial sensor fusion. SelfVIO jointly estimates 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabelled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach performs VIO without requiring IMU intrinsic parameters or extrinsic calibration between the IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework and compare its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC, and Cityscapes datasets. Detailed comparisons show that SelfVIO outperforms state-of-the-art VIO approaches in both pose estimation and depth recovery, making it a promising candidate among existing methods.
AB - In the last decade, numerous supervised deep learning approaches have been proposed for visual–inertial odometry (VIO) and depth map estimation, all of which require large amounts of labelled data. To overcome this data limitation, self-supervised learning has emerged as a promising alternative that exploits constraints such as geometric and photometric consistency in the scene. In this study, we present SelfVIO, a novel self-supervised deep learning-based VIO and depth map recovery approach that uses adversarial training and self-adaptive visual–inertial sensor fusion. SelfVIO jointly estimates 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabelled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach performs VIO without requiring IMU intrinsic parameters or extrinsic calibration between the IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework and compare its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC, and Cityscapes datasets. Detailed comparisons show that SelfVIO outperforms state-of-the-art VIO approaches in both pose estimation and depth recovery, making it a promising candidate among existing methods.
KW - Deep sensor fusion
KW - Generative adversarial networks
KW - Geometry reconstruction
KW - Machine perception
KW - Self-supervised learning
KW - Visual–inertial odometry
UR - http://www.scopus.com/inward/record.url?scp=85126599543&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2022.03.005
DO - 10.1016/j.neunet.2022.03.005
M3 - Article
C2 - 35313245
AN - SCOPUS:85126599543
VL - 150
SP - 119
EP - 136
JO - Neural Networks
JF - Neural Networks
SN - 0893-6080
ER -