Synthesizing the unseen for zero-shot object detection

Nasir Hayat, Munawar Hayat, Shafin Rahman, Salman Khan, Syed Waqas Zamir, Fahad Shahbaz Khan

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

11 Citations (Scopus)

Abstract

Existing zero-shot detection approaches project visual features to the semantic domain for seen objects, hoping to map unseen objects to their corresponding semantics during inference. However, since the unseen objects are never observed during training, the detection model is skewed towards seen content, thereby labeling unseen objects as background or as a seen class. In this work, we propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain. Consequently, the major challenge becomes how to accurately synthesize unseen objects using only their class semantics. Towards this ambitious goal, we propose a novel generative model that uses class semantics not only to generate the features but also to discriminatively separate them. Further, using a unified model, we ensure the synthesized features have high diversity that represents the intra-class differences and variable localization precision in the detected bounding boxes. We test our approach on three object detection benchmarks, PASCAL VOC, MSCOCO, and ILSVRC detection, under both conventional and generalized settings, showing impressive gains over the state-of-the-art methods. Our code is available at https://github.com/nasir6/zero_shot_detection.
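
As a rough illustration of the idea in the abstract (synthesizing visual features for an unseen class from its class semantics), the sketch below conditions a toy generator on a class embedding plus random noise. It is not the authors' model: the untrained linear generator, the dimensions, and every name used (`make_generator`, `synthesize_features`, `unseen_embedding`) are assumptions made for illustration only. A real system would learn the generator on seen classes, for example adversarially, before applying it to unseen ones.

```python
import random

def make_generator(sem_dim, noise_dim, feat_dim, seed=0):
    """Build a random weight matrix for a toy linear generator.

    Hypothetical and untrained: it only stands in for a learned model.
    """
    rng = random.Random(seed)
    in_dim = sem_dim + noise_dim
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(feat_dim)]

def synthesize_features(weights, class_embedding, noise_dim, n_samples, seed=0):
    """Generate n_samples feature vectors conditioned on one class embedding.

    Each sample concatenates the class embedding with fresh Gaussian noise,
    so repeated calls for the same class yield varied (diverse) features.
    """
    rng = random.Random(seed)
    features = []
    for _ in range(n_samples):
        noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        x = list(class_embedding) + noise
        features.append([sum(w * v for w, v in zip(row, x)) for row in weights])
    return features

# Assumed 4-dimensional semantic embedding for one unseen class.
unseen_embedding = [0.2, -0.5, 0.7, 0.1]
G = make_generator(sem_dim=4, noise_dim=3, feat_dim=8)
synthetic = synthesize_features(G, unseen_embedding, noise_dim=3, n_samples=5)
print(len(synthetic), len(synthetic[0]))  # 5 8
```

In a pipeline of the kind the abstract describes, synthesized features like these would be paired with their unseen-class labels so that a classifier can be trained on seen and unseen classes together in the visual domain.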

Original language: English
Title of host publication: Computer Vision – ACCV 2020
Subtitle of host publication: 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30 – December 4, 2020, Revised Selected Papers, Part III
Editors: Hiroshi Ishikawa, Cheng-Lin Liu, Tomas Pajdla, Jianbo Shi
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 155-170
Number of pages: 16
ISBN (Electronic): 9783030695354
ISBN (Print): 9783030695347
DOIs
Publication status: Published - 2021
Externally published: Yes
Event: Asian Conference on Computer Vision 2020 - Online, Kyoto, Japan
Duration: 30 Nov 2020 – 4 Dec 2020
Conference number: 15th
https://link.springer.com/book/10.1007/978-3-030-69535-4 (Proceedings)
https://accv2020.github.io (Website)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12624
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Asian Conference on Computer Vision 2020
Abbreviated title: ACCV 2020
Country/Territory: Japan
City: Kyoto
Period: 30/11/20 – 4/12/20
Internet address

Keywords

  • Generative adversarial learning
  • Visual-semantic relationships
  • Zero-shot object detection
