Abstract
The self-attention mechanism, successfully employed with the transformer structure, has shown promise in many computer vision tasks, including image recognition and object detection. Despite this surge of interest, the use of the transformer for stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of the transformer for stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of the CNN and the transformer in a unified design. Specifically, we investigate several ways to introduce transformers into volumetric stereo matching pipelines by analyzing the loss landscape of the designs and their in-domain/cross-domain accuracy. Our analysis suggests that employing transformers for feature representation learning while using CNNs for cost aggregation leads to faster convergence, higher accuracy, and better generalization than the other options. Our extensive experiments on the Sceneflow, SCARED2019, and dVPN datasets demonstrate the superior performance of HybridStereoNet.
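The stages of a volumetric stereo matching pipeline named in the abstract (feature extraction, cost volume construction, cost aggregation, disparity regression) can be illustrated with a toy single-scanline sketch. This is not the HybridStereoNet implementation: the function names, the squared-distance matching cost, and the soft-argmin regression are assumptions chosen for illustration, and the `extract` and `aggregate` callables are trivial stand-ins for the slots where the abstract places a transformer and a CNN, respectively.

```python
import math


def build_cost_volume(left_feats, right_feats, max_disp):
    """cost[d][x]: squared distance between left pixel x and right pixel x - d."""
    width = len(left_feats)
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d < 0:
                # The candidate match falls outside the right image.
                row.append(float("inf"))
            else:
                row.append(sum((a - b) ** 2
                               for a, b in zip(left_feats[x], right_feats[x - d])))
        volume.append(row)
    return volume


def soft_argmin(volume):
    """Expected disparity per pixel under softmax(-cost) over the disparity axis."""
    max_disp, width = len(volume), len(volume[0])
    disparities = []
    for x in range(width):
        costs = [volume[d][x] for d in range(max_disp)]
        lowest = min(costs)  # d = 0 is always valid, so this is finite
        weights = [math.exp(-(c - lowest)) for c in costs]
        total = sum(weights)
        disparities.append(sum(d * w for d, w in enumerate(weights)) / total)
    return disparities


def stereo_pipeline(left, right, max_disp, extract, aggregate):
    """Hypothetical pipeline: features -> cost volume -> aggregation -> disparity."""
    volume = build_cost_volume(extract(left), extract(right), max_disp)
    return soft_argmin(aggregate(volume))


# Stand-ins (assumptions): raw intensity as a 1-D feature, identity aggregation.
extract = lambda scanline: [[float(v)] for v in scanline]
aggregate = lambda volume: volume

# Left scanline equals the right scanline shifted by two pixels.
right = [0, 10, 20, 30, 40, 50, 60, 70]
left = [-20, -10, 0, 10, 20, 30, 40, 50]
disparity = stereo_pipeline(left, right, max_disp=4,
                            extract=extract, aggregate=aggregate)
```

In this sketch, swapping `extract` for a learned feature network and `aggregate` for a learned filter over the volume changes the components without changing the pipeline structure, which is the kind of design choice the abstract compares.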
Original language | English |
---|---|
Title of host publication | 25th International Conference Singapore, September 18–22, 2022 Proceedings, Part VII |
Editors | Linwei Wang, Qi Dou, P. Thomas Fletcher, Stefanie Speidel, Shuo Li |
Place of Publication | Cham, Switzerland |
Publisher | Springer |
Pages | 464-474 |
Number of pages | 11 |
ISBN (Electronic) | 9783031164491 |
ISBN (Print) | 9783031164484 |
DOIs | |
Publication status | Published - 2022 |
Event | Medical Image Computing and Computer-Assisted Intervention 2022, Singapore, Singapore; Duration: 18 Sept 2022 → 22 Sept 2022; Conference number: 25th; https://link.springer.com/book/10.1007/978-3-031-16434-7 (Proceedings - Part 2); https://conferences.miccai.org/2022/en/ (Website) |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 13437 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Medical Image Computing and Computer-Assisted Intervention 2022 |
---|---|
Abbreviated title | MICCAI 2022 |
Country/Territory | Singapore |
City | Singapore |
Period | 18/09/22 → 22/09/22 |
Internet address | |
Keywords
- Laparoscopic video
- Stereo matching
- Transformer