Deep laparoscopic stereo matching with transformers

Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong Wang, Zongyuan Ge

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

12 Citations (Scopus)

Abstract

The self-attention mechanism, successfully employed with the transformer structure, has shown promise in many computer vision tasks, including image recognition and object detection. Despite the surge, the use of the transformer for the problem of stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of the transformer for the problem of stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of the CNN and the transformer in a unified design. To be specific, we investigate several ways to introduce transformers to volumetric stereo matching pipelines by analyzing the loss landscape of the designs and their in-domain/cross-domain accuracy. Our analysis suggests that employing transformers for feature representation learning, while using CNNs for cost aggregation, will lead to faster convergence, higher accuracy and better generalization than other options. Our extensive experiments on the Sceneflow, SCARED2019 and dVPN datasets demonstrate the superior performance of our HybridStereoNet.

Original language: English
Title of host publication: 25th International Conference Singapore, September 18–22, 2022 Proceedings, Part VII
Editors: Linwei Wang, Qi Dou, P. Thomas Fletcher, Stefanie Speidel, Shuo Li
Place of Publication: Cham Switzerland
Publisher: Springer
Pages: 464-474
Number of pages: 11
ISBN (Electronic): 9783031164491
ISBN (Print): 9783031164484
Publication status: Published - 2022
Event: Medical Image Computing and Computer-Assisted Intervention 2022 - Singapore, Singapore
Duration: 18 Sept 2022 → 22 Sept 2022
Conference number: 25th
https://link.springer.com/book/10.1007/978-3-031-16434-7 (Proceedings - Part 2)
https://conferences.miccai.org/2022/en/ (Website)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 13437
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Medical Image Computing and Computer-Assisted Intervention 2022
Abbreviated title: MICCAI 2022
Country/Territory: Singapore
City: Singapore
Period: 18/09/22 → 22/09/22

Keywords

  • Laparoscopic video
  • Stereo matching
  • Transformer
