Transformer Scale Gate for Semantic Segmentation

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

4 Citations (Scopus)

Abstract

Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation. Most existing transformer-based segmentation models combine features across scales without any selection, so features at sub-optimal scales may degrade segmentation outcomes. Leveraging the inherent properties of Vision Transformers, we propose a simple yet effective module, Transformer Scale Gate (TSG), to optimally combine multi-scale features. TSG exploits cues in the self- and cross-attention of Vision Transformers for scale selection. TSG is a highly flexible plug-and-play module and can easily be incorporated into any encoder-decoder-based hierarchical Vision Transformer. Extensive experiments on the Pascal Context, ADE20K and Cityscapes datasets demonstrate that the proposed feature selection strategy achieves consistent gains.
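The paper defines the full TSG architecture; as a rough illustration of the underlying idea only, a scale gate of this kind can be sketched as a softmax-weighted combination of per-scale features, with gate logits derived from attention cues. All names, shapes, and the use of NumPy here are assumptions for the sketch, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scale_gate(features, attn_cues):
    """Combine per-scale features using gate weights from attention cues.

    features:  list of S arrays, each (N, C) -- per-pixel features at S scales
    attn_cues: (N, S) array -- a scalar cue per pixel per scale (in the paper,
               derived from self-/cross-attention; random here for illustration)
    Returns (N, C): the gated combination across scales.
    """
    stacked = np.stack(features, axis=1)             # (N, S, C)
    gates = softmax(attn_cues, axis=-1)              # (N, S), sums to 1 per pixel
    return (gates[..., None] * stacked).sum(axis=1)  # (N, C)

# toy example: 4 pixels, 3 scales, 8 channels
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 8)) for _ in range(3)]
cues = rng.standard_normal((4, 3))
out = scale_gate(feats, cues)
print(out.shape)  # (4, 8)
```

The gate makes the combination a per-pixel convex mixture of scales, so a pixel can lean on the scale whose attention cue is strongest rather than averaging all scales uniformly.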

Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Editors: Eric Mortensen
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 3051-3060
Number of pages: 10
ISBN (Electronic): 9798350301298
ISBN (Print): 9798350301304
DOIs
Publication status: Published - 2023
Event: IEEE Conference on Computer Vision and Pattern Recognition 2023 - Vancouver, Canada
Duration: 18 Jun 2023 - 22 Jun 2023
https://cvpr2023.thecvf.com/ (Website)
https://openaccess.thecvf.com/CVPR2023?day=all (Proceedings)
https://ieeexplore.ieee.org/xpl/conhome/10203037/proceeding (Proceedings)
https://cvpr2023.thecvf.com/Conferences/2023 (Website)

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition 2023
Abbreviated title: CVPR 2023
Country/Territory: Canada
City: Vancouver
Period: 18/06/23 - 22/06/23
Internet address

Keywords

  • grouping and shape analysis
  • Segmentation
