CA-OVS: Cluster and Adapt Mask Proposals for Open-Vocabulary Semantic Segmentation

Son Duy Dao, Hengcan Shi, Dinh Q. Phung, Jianfei Cai

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

Abstract

Recent Open-Vocabulary Semantic Segmentation (OVS) works typically follow the mask proposal pipeline that decouples semantic segmentation into class-agnostic mask generation and mask-class matching. They train mask generation modules on segmentation datasets, while learning mask-class matching from pretrained vision-language models and large-scale image classification datasets to recognize open-vocabulary classes. There are two major challenges in this pipeline during training: 1) mismatching between mask proposals and classes, as well as 2) domain and label gaps between classification and segmentation datasets. In this paper, we propose a novel CA-OVS framework to solve these challenges. For the first challenge, a Wasserstein-distance-based clustering method is presented to better match masks and classes. For the second challenge, we propose to transfer the information of the mask proposals from the segmentation dataset to the classification dataset by minimizing their Wasserstein distance. Extensive experiments on several OVS datasets show that our method outperforms many state-of-the-art approaches.
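The abstract does not spell out how the Wasserstein distance is computed, so the following is only a minimal sketch of the general idea of matching mask proposals to classes through optimal transport. It is not the authors' implementation. It assumes uniform weights on both sides, a cosine cost, and an entropy-regularised (Sinkhorn) approximation of the transport plan; the names `sinkhorn_plan`, `cosine_cost`, `mask_emb` and `class_emb` are hypothetical, and the embeddings are random placeholders rather than real features.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, reg=0.5, n_iters=500):
    """Entropy-regularised transport plan between two uniform distributions.

    This is a standard Sinkhorn iteration, used here as an assumed stand-in
    for whatever Wasserstein solver the paper actually uses.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass over mask proposals (assumption)
    b = np.full(m, 1.0 / m)  # uniform mass over classes (assumption)
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


# Placeholder data: 6 mask-proposal embeddings and 3 class embeddings.
rng = np.random.default_rng(0)
mask_emb = rng.normal(size=(6, 16))
class_emb = rng.normal(size=(3, 16))

cost = cosine_cost(mask_emb, class_emb)
plan = sinkhorn_plan(cost)

# Regularised approximation of the Wasserstein distance between the two sets.
distance = float((plan * cost).sum())

# Cluster each mask proposal under the class that receives most of its mass.
assignment = plan.argmax(axis=1)
```

In a training setting, a quantity like `distance` could serve as a loss to pull two sets of embeddings together, which is the spirit of the second step described in the abstract; the details there are likewise an assumption.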

Original language: English
Title of host publication: Proceedings of the 6th ACM International Conference on Multimedia in Asia
Editors: Jun Zhou, Anup Basu, Min Xu
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 8
ISBN (Electronic): 9798400712739
Publication status: Published - 2024
Event: ACM International Conference on Multimedia in Asia 2024 - Auckland, New Zealand
Duration: 3 Dec 2024 to 6 Dec 2024
Conference number: 6th
https://dl.acm.org/doi/proceedings/10.1145/3696409 (Proceedings)
https://mmasia2024.org/ (Website)

Conference

Conference: ACM International Conference on Multimedia in Asia 2024
Abbreviated title: MMAsia 2024
Country/Territory: New Zealand
City: Auckland
Period: 3/12/24 to 6/12/24

Keywords

  • Open-Vocabulary Semantic Segmentation
