Abstract
Recent Open-Vocabulary Semantic Segmentation (OVS) methods typically follow a mask proposal pipeline that decouples semantic segmentation into class-agnostic mask generation and mask-class matching. They train mask generation modules on segmentation datasets and learn mask-class matching from pretrained vision-language models and large-scale image classification datasets in order to recognize open-vocabulary classes. Training this pipeline poses two major challenges: 1) mismatches between mask proposals and classes, and 2) domain and label gaps between classification and segmentation datasets. In this paper, we propose a novel CA-OVS framework to address these challenges. For the first challenge, we present a Wasserstein-distance-based clustering method that better matches masks to classes. For the second challenge, we transfer the information in the mask proposals from the segmentation dataset to the classification dataset by minimizing their Wasserstein distance. Extensive experiments on several OVS datasets show that our method outperforms many state-of-the-art approaches.
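To make the idea of matching masks to classes with a Wasserstein distance more concrete, the following is a minimal sketch and not the CA-OVS implementation. It assumes that mask proposals and classes are each represented by an embedding vector, uses a cosine cost, and approximates the Wasserstein distance with entropy-regularised Sinkhorn iterations. The function names, the toy embeddings, and the parameters `reg` and `n_iters` are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cost 1 - cosine similarity between the rows of x and y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, a, b, reg=0.1, n_iters=200):
    """Entropy-regularised transport plan between histograms a and b.

    This is a generic Sinkhorn approximation of optimal transport,
    assumed here for illustration only.
    """
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)   # rescale to match column marginals
        u = a / (kernel @ v)     # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]


# Toy data (hypothetical): 3 class embeddings and 6 noisy mask embeddings.
rng = np.random.default_rng(0)
class_emb = np.eye(3, 8)
mask_labels = np.array([0, 0, 1, 2, 2, 1])
mask_emb = class_emb[mask_labels] + 0.05 * rng.normal(size=(6, 8))

cost = cosine_cost(mask_emb, class_emb)
a = np.full(6, 1.0 / 6.0)   # uniform mass over mask proposals
b = np.full(3, 1.0 / 3.0)   # uniform mass over classes

plan = sinkhorn_plan(cost, a, b)
assignment = plan.argmax(axis=1)        # class receiving most mass per mask
distance = float((plan * cost).sum())   # approximate Wasserstein cost
```

In this sketch each row of `plan` says how one mask's mass is spread over the classes, so `assignment` groups masks by class, while `distance` is the kind of scalar that could serve as a training objective when aligning two sets of features.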
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th ACM International Conference on Multimedia in Asia |
Editors | Jun Zhou, Anup Basu, Min Xu |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery (ACM) |
Number of pages | 8 |
ISBN (Electronic) | 9798400712739 |
Publication status | Published - 2024 |
Event | ACM International Conference on Multimedia in Asia 2024, Auckland, New Zealand; Duration: 3 Dec 2024 → 6 Dec 2024; Conference number: 6th; Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3696409 ; Website: https://mmasia2024.org/ |
Conference
Conference | ACM International Conference on Multimedia in Asia 2024 |
---|---|
Abbreviated title | MMAsia 2024 |
Country/Territory | New Zealand |
City | Auckland |
Period | 3 Dec 2024 → 6 Dec 2024 |
Keywords
- Open-Vocabulary Semantic Segmentation