Abstract
Open-vocabulary (OV) object detection has attracted increasing research attention in recent years. Unlike traditional detection, which recognizes only objects from a fixed set of categories, OV detection aims to detect objects from an open set of categories. Previous works often leverage vision-language (VL) training data (e.g., referring grounding data) to recognize OV objects. However, they use only pairs of nouns and individual objects from the VL data, although these data usually contain much richer information, such as scene graphs, which is also crucial for OV detection. In this paper, we propose a novel Scene-Graph-Based Discovery Network (SGDN) that exploits scene graph cues for OV detection. First, we present a scene-graph-based decoder (SGDecoder) that includes sparse scene-graph-guided attention (SSGA); it captures scene graphs and leverages them to discover OV objects. Second, we propose scene-graph-based prediction (SGPred), in which a scene-graph-based offset regression (SGOR) mechanism enables mutual enhancement between scene graph extraction and object localization. Third, we design a cross-modal learning mechanism in SGPred that uses scene graphs as bridges to improve the consistency between cross-modal embeddings for OV object classification. Experiments on COCO and LVIS demonstrate the effectiveness of our approach. Moreover, we show that our model can perform OV scene graph detection, a task that previous OV scene graph generation methods cannot tackle.
Original language | English |
---|---|
Title of host publication | Proceedings of the 31st ACM International Conference on Multimedia |
Editors | Mukesh K. Saini, Ming-Ching Chang |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 4012-4021 |
Number of pages | 10 |
ISBN (Electronic) | 9798400701085 |
Publication status | Published - 2023 |
Event | ACM International Conference on Multimedia 2023 (31st), Ottawa, Canada; 29 Oct 2023 → 3 Nov 2023; Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3581783; Website: https://www.acmmm2023.org |
Conference
Conference | ACM International Conference on Multimedia 2023 |
---|---|
Abbreviated title | MM 2023 |
Country/Territory | Canada |
City | Ottawa |
Period | 29/10/23 → 3/11/23 |
Keywords
- object detection
- open-vocabulary
- scene graph
- vision-language
Projects
- Towards Robotic Empathy: A human centred approach to future AI machines (finished)
  Hayat, M.
  Australian Research Council (ARC)
  26/10/20 → 24/10/24
  Project: Research