Mining inter-video proposal relations for video object detection

Mingfei Han, Yali Wang, Xiaojun Chang, Yu Qiao

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

36 Citations (Scopus)


Recent studies have shown that aggregating context information from proposals in different frames can clearly enhance the performance of video object detection. However, these approaches mainly exploit proposal relations within a single video, while ignoring proposal relations among different videos, which can provide important discriminative cues for recognizing confusing objects. To address this limitation, we propose a novel Inter-Video Proposal Relation module. Based on a concise multi-level triplet selection scheme, this module learns effective object representations by modeling relations among hard proposals from different videos. Moreover, we design a Hierarchical Video Relation Network (HVR-Net) that integrates intra-video and inter-video proposal relations in a hierarchical fashion. This design progressively exploits both intra-video and inter-video contexts to boost video object detection. We evaluate our method on the large-scale video object detection benchmark ImageNet VID, where HVR-Net achieves state-of-the-art results. Code and models are available at
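The abstract combines two ideas: relation modules that enhance each proposal's features with context from other proposals, and a triplet selection step that picks hard, easily confused videos to relate against. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' released HVR-Net code. It assumes a relation module can be approximated by plain scaled dot-product attention with a residual connection. It also uses the mean of a video's proposal features as a hypothetical video-level descriptor. Finally, it selects a hard positive (same class, least similar) and a hard negative (different class, most similar) video by cosine similarity. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Row-wise L2 normalization, so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def relation_aggregate(targets, supports):
    """Hypothetical relation module: scaled dot-product attention + residual.

    targets:  (N, D) proposal features to enhance
    supports: (M, D) proposal features that provide context
    returns:  (N, D) context-enhanced target features
    """
    d = targets.shape[-1]
    logits = targets @ supports.T / np.sqrt(d)       # (N, M) pairwise affinities
    logits -= logits.max(axis=1, keepdims=True)      # stabilize the softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over supports
    return targets + weights @ supports              # add aggregated context


def select_video_triplet(anchor, video_feats, labels):
    """Return (hard_positive, hard_negative) video indices for `anchor`.

    Hard positive: same class, lowest cosine similarity to the anchor.
    Hard negative: different class, highest cosine similarity to the anchor.
    Assumes at least one other same-class video and one other-class video.
    """
    feats = l2_normalize(video_feats)
    sims = feats @ feats[anchor]                     # (V,) cosine similarities
    same = labels == labels[anchor]
    same[anchor] = False                             # the anchor is not its own positive
    pos_idx = np.flatnonzero(same)
    neg_idx = np.flatnonzero(labels != labels[anchor])
    hard_pos = pos_idx[np.argmin(sims[pos_idx])]
    hard_neg = neg_idx[np.argmax(sims[neg_idx])]
    return int(hard_pos), int(hard_neg)


def hierarchical_relation(videos, anchor, labels):
    """Intra-video relations first, then inter-video relations over a triplet.

    videos: list of (P, D) proposal-feature arrays, one per video
    labels: (V,) integer class label per video
    """
    # Stage 1: each video's proposals attend to proposals of the same video.
    intra = [relation_aggregate(v, v) for v in videos]
    # Hypothetical video-level descriptor: mean of the enhanced proposal features.
    video_feats = np.stack([v.mean(axis=0) for v in intra])
    pos, neg = select_video_triplet(anchor, video_feats, labels)
    # Stage 2: anchor proposals attend to proposals from the whole triplet.
    support = np.concatenate([intra[anchor], intra[pos], intra[neg]], axis=0)
    return relation_aggregate(intra[anchor], support), (pos, neg)


if __name__ == "__main__":
    labels = np.array([0, 0, 0, 1, 1])
    toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.8, 0.2], [0.0, 1.0]])
    print(select_video_triplet(0, toy, labels))      # (2, 3)

    rng = np.random.default_rng(0)
    videos = [rng.normal(size=(4, 8)) for _ in range(5)]
    enhanced, triplet = hierarchical_relation(videos, 0, labels)
    print(enhanced.shape, triplet)
```

The intra-video stage runs before the inter-video stage to follow the hierarchical ordering the abstract describes. The "multi-level" selection in the paper also implies choosing hard examples below the video level, which this sketch does not attempt.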

Original language: English
Title of host publication: Computer Vision – ECCV 2020
Subtitle of host publication: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI
Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Place of publication: Cham, Switzerland
Number of pages: 16
ISBN (Electronic): 9783030585891
ISBN (Print): 9783030585884
Publication status: Published - 2020
Event: European Conference on Computer Vision 2020 - Glasgow, United Kingdom
Duration: 23 Aug 2020 to 28 Aug 2020
Conference number: 16th

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: European Conference on Computer Vision 2020
Abbreviated title: ECCV 2020
Country/Territory: United Kingdom


Keywords

  • Hierarchical Video Relation Network
  • Inter-Video Proposal Relation
  • Multi-level triplet selection
  • Video object detection
