Mining inter-video proposal relations for video object detection

Mingfei Han, Yali Wang, Xiaojun Chang, Yu Qiao

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review


Recent studies have shown that, context aggregating information from proposals in different frames can clearly enhance the performance of video object detection. However, these approaches mainly exploit the intra-proposal relation within single video, while ignoring the intra-proposal relation among different videos, which can provide important discriminative cues for recognizing confusing objects. To address the limitation, we propose a novel Inter-Video Proposal Relation module. Based on a concise multi-level triplet selection scheme, this module can learn effective object representations via modeling relations of hard proposals among different videos. Moreover, we design a Hierarchical Video Relation Network (HVR-Net), by integrating intra-video and inter-video proposal relations in a hierarchical fashion. This design can progressively exploit both intra and inter contexts to boost video object detection. We examine our method on the large-scale video object detection benchmark, i.e., ImageNet VID, where HVR-Net achieves the SOTA results. Codes and models are available at

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2020
Subtitle of host publication16th European Conference Glasgow, UK, August 23–28, 2020 Proceedings, Part XXI
EditorsAndrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Place of PublicationCham Switzerland
Number of pages16
ISBN (Electronic)9783030585891
ISBN (Print)9783030585884
Publication statusPublished - 2020
EventEuropean Conference on Computer Vision 2020 - Glasgow, United Kingdom
Duration: 23 Aug 202028 Aug 2020
Conference number: 16th (Proceedings) (Website)

Publication series

NameLecture Notes in Computer Science
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


ConferenceEuropean Conference on Computer Vision 2020
Abbreviated titleECCV 2020
CountryUnited Kingdom
Internet address


  • Hierachical Video Relation Network
  • Inter-Video Proposal Relation
  • Multi-level triplet selection
  • Video object detection

Cite this