Generative Region-Language Pretraining for Open-Ended Object Detection

Chuang Lin, Yi Jiang, Lizhen Qu, Zehuan Yuan, Jianfei Cai

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

Abstract

In recent research, significant attention has been devoted to the open-vocabulary object detection task, aiming to generalize beyond the limited number of classes labeled during training and detect objects described by arbitrary category names at inference. Compared with conventional object detection, open vocabulary object detection largely extends the object detection categories. However, it relies on calculating the similarity between image regions and a set of arbitrary category names with a pretrained vision-and-language model. This implies that, despite its open-set nature, the task still needs the predefined object categories during the inference stage. This raises the question: What if we do not have exact knowledge of object categories during inference? In this paper, we call such a new setting as generative open-ended object detection, which is a more general and practical problem. To address it, we formulate object detection as a generative problem and propose a simple framework named GenerateU, which can detect dense objects and generate their names in a free-form way. Particularly, we employ Deformable DETR as a region proposal generator with a language model translating visual regions to object names. To assess the free-form object detection task, we introduce an evaluation method designed to quantitatively measure the performance of generative out-comes. Extensive experiments demonstrate strong zero-shot detection performance of our GenerateU. For example, on the LVIS dataset, our GenerateU achieves comparable results to the open-vocabulary object detection method GLIP, even though the category names are not seen by GenerateU during inference. Code is available at: https://github.com/FoundationVision/GenerateU.

Original languageEnglish
Title of host publicationProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
EditorsEric Mortensen
Place of PublicationPiscataway NJ USA
PublisherIEEE, Institute of Electrical and Electronics Engineers
Pages13958-13968
Number of pages11
ISBN (Electronic)9798350353006
ISBN (Print)9798350353013
DOIs
Publication statusPublished - 2024
EventIEEE Conference on Computer Vision and Pattern Recognition 2024 - Seattle, United States of America
Duration: 17 Jun 202421 Jun 2024
https://openaccess.thecvf.com/CVPR2024 (Proceedings)
https://cvpr.thecvf.com/Conferences/2024 (Website)
https://ieeexplore.ieee.org/xpl/conhome/10654794/proceeding (Proceedings)

Conference

ConferenceIEEE Conference on Computer Vision and Pattern Recognition 2024
Abbreviated titleCVPR 2024
Country/TerritoryUnited States of America
CitySeattle
Period17/06/2421/06/24
Internet address

Cite this