Diffusion Model for Robust Multi-sensor Fusion in 3D Object Detection and BEV Segmentation

Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

1 Citation (Scopus)

Abstract

Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains. However, their potential in multi-sensor fusion remains largely unexplored. In this work, we introduce “DifFUSER”, a novel approach that leverages diffusion models for multi-modal fusion in 3D object detection and BEV map segmentation. Benefiting from the inherent denoising property of diffusion, DifFUSER can refine or even synthesize sensor features when a sensor malfunctions, thereby improving the quality of the fused output. In terms of architecture, our DifFUSER blocks are chained together in a hierarchical BiFPN fashion, termed cMini-BiFPN, offering an alternative architecture for latent diffusion. We further introduce a Gated Self-conditioned Modulated (GSM) latent diffusion module together with a Progressive Sensor Dropout Training (PSDT) paradigm, designed to add stronger conditioning to the diffusion process and robustness to sensor failures. Our extensive evaluations on the nuScenes dataset show that DifFUSER not only achieves state-of-the-art performance with a 70.04% mIoU in BEV map segmentation but also competes effectively with leading transformer-based fusion techniques in 3D object detection.

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXVIII
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 232-249
Number of pages: 18
ISBN (Electronic): 9783031731136
ISBN (Print): 9783031731129
Publication status: Published - 2025
Event: European Conference on Computer Vision 2024 - Milan, Italy
Duration: 29 Sept 2024 to 4 Oct 2024
Conference number: 18th
https://eccv2024.ecva.net/Conferences/2024/Dates
https://media.eventhosts.cc/Conferences/ECCV2024/ConferenceProgram.pdf (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 15126
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision 2024
Abbreviated title: ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 to 4/10/24

Keywords

  • 3D Object Detection
  • BEV Map Segmentation
  • Diffusion
