Mask propagation for efficient Video Semantic Segmentation

Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

1 Citation (Scopus)

Abstract

Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence. Prior work in this field has demonstrated promising results by extending image semantic segmentation models to exploit temporal relationships across video frames; however, these approaches often incur significant computational costs. In this paper, we propose an efficient mask propagation framework for VSS, called MPVSS. Our approach first employs a strong query-based image segmentor on sparse key frames to generate accurate binary masks and class predictions. We then design a flow estimation module that uses the learned queries to generate a set of segment-aware flow maps, each associated with a mask prediction from the key frame. Finally, each mask is warped with its associated flow map to serve as the mask prediction for the non-key frames. By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs. Extensive experiments on VSPW and Cityscapes demonstrate that our mask propagation framework achieves state-of-the-art (SOTA) accuracy and efficiency trade-offs. For instance, our best model with a Swin-L backbone outperforms the SOTA MRCFA using MiT-B5 by 4.0% mIoU while requiring only 26% of the FLOPs on the VSPW dataset. Moreover, our framework reduces FLOPs by up to 4× compared to the per-frame Mask2Former baseline, with at most 2% mIoU degradation on the Cityscapes validation set. Code is available at https://github.com/ziplab/MPVSS.
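The propagation step described in the abstract (one flow map per key-frame mask, followed by warping) can be illustrated with a small sketch. This is not the MPVSS implementation: the function names `warp_mask` and `propagate` are hypothetical, the flow maps are written by hand rather than predicted from learned queries, and nearest-neighbour sampling is assumed for simplicity where a real system would likely use differentiable bilinear sampling.

```python
from typing import List, Tuple

Grid = List[List[float]]                # H x W mask scores
Flow = List[List[Tuple[float, float]]]  # H x W (dx, dy) offsets


def warp_mask(mask: Grid, flow: Flow) -> Grid:
    """Backward-warp a key-frame mask onto a non-key frame.

    Assumption: flow[y][x] = (dx, dy) points from pixel (x, y) in the
    non-key frame to its source location in the key frame.
    """
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)  # nearest-neighbour sample
            if 0 <= sx < w and 0 <= sy < h:        # out-of-frame sources stay 0
                out[y][x] = mask[sy][sx]
    return out


def propagate(masks: List[Grid], flows: List[Flow], class_ids: List[int]) -> List[List[int]]:
    """Warp each mask with its own flow map, then label every pixel with
    the class of the highest-scoring warped mask (-1 if no mask covers it)."""
    warped = [warp_mask(m, f) for m, f in zip(masks, flows)]
    h, w = len(warped[0]), len(warped[0][0])
    labels = [[-1] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(warped)), key=lambda i: warped[i][y][x])
            if warped[best][y][x] > 0:
                labels[y][x] = class_ids[best]
    return labels


# Toy 1 x 4 key frame: an object (class 7) on the left, background (class 3) everywhere.
obj_mask = [[0.9, 0.9, 0.0, 0.0]]
bg_mask = [[0.5, 0.5, 0.5, 0.5]]
obj_flow = [[(-1.0, 0.0)] * 4]  # object moved one pixel to the right
bg_flow = [[(0.0, 0.0)] * 4]    # background is static

labels = propagate([obj_mask, bg_mask], [obj_flow, bg_flow], [7, 3])
print(labels)  # [[3, 7, 7, 3]]
```

Because each mask carries its own flow map, the object shifts right while the background stays put; a single shared flow field would have to represent both motions at once.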

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Place of Publication: San Diego CA USA
Publisher: Neural Information Processing Systems (NIPS)
Number of pages: 14
Publication status: Published - 2023
Event: Advances in Neural Information Processing Systems 2023 - Ernest N. Morial Convention Center, New Orleans, United States of America
Duration: 10 Dec 2023 to 16 Dec 2023
Conference number: 37th
https://openreview.net/group?id=NeurIPS.cc/2023/Conference#tab-accept-oral
https://neurips.cc/ (Website)
https://papers.nips.cc/paper_files/paper/2023 (Proceedings)

Publication series

Name: Advances in Neural Information Processing Systems
Publisher: Neural Information Processing Systems (NIPS)
Volume: 36
ISSN (Print): 1049-5258

Conference

Conference: Advances in Neural Information Processing Systems 2023
Abbreviated title: NeurIPS 2023
Country/Territory: United States of America
City: New Orleans
Period: 10/12/23 to 16/12/23