Towards Discrete Object Representations in Vision Transformers with Tensor Products

Wei Yuen Teh, Chern Hong Lim, Mei Kuan Lim, Ian Kim Teck Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

In this work, we explore the use of Tensor Product Representations (TPRs) in a Vision Transformer model to form image representations that can later be used for symbolic manipulation in a neurosymbolic model. We propose the Tensor Product Vision Transformer (TP-ViT), an enhancement of a Vision Transformer that incorporates TPRs, an object representation methodology that utilizes filler and role vectors to represent objects. TP-ViT is the first application of TPRs on visual input, and we report qualitative and quantitative results which show that the use of TPRs allows for the formation of more targeted and diverse object representations when compared to a standard Vision Transformer.
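As a rough illustration of the filler and role idea mentioned in the abstract, the sketch below shows the standard TPR formulation, in which each filler vector is bound to a role vector by an outer product and the bound pairs are summed. It is not the TP-ViT architecture, whose details are not given in this record; the function names (`bind`, `unbind`), the vector sizes, and the values are all hypothetical and chosen only for illustration.

```python
# Minimal sketch of Tensor Product Representation (TPR) binding and unbinding.
# All names and values are illustrative assumptions, not taken from TP-ViT.

def outer(filler, role):
    """Outer product of a filler and a role vector, as a nested list."""
    return [[f * r for r in role] for f in filler]

def bind(fillers, roles):
    """Build a TPR as the sum of filler (outer product) role over all pairs."""
    d_f, d_r = len(fillers[0]), len(roles[0])
    tpr = [[0.0] * d_r for _ in range(d_f)]
    for filler, role in zip(fillers, roles):
        bound = outer(filler, role)
        tpr = [[a + b for a, b in zip(row_t, row_b)]
               for row_t, row_b in zip(tpr, bound)]
    return tpr

def unbind(tpr, role):
    """Recover the filler bound to `role`; exact only for orthonormal roles."""
    return [sum(t * r for t, r in zip(row, role)) for row in tpr]

# Two hypothetical "objects" (fillers), each assigned an orthonormal role.
fillers = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
roles = [[1.0, 0.0], [0.0, 1.0]]

tpr = bind(fillers, roles)
recovered = unbind(tpr, roles[1])  # filler that was bound to the second role
```

With orthonormal roles, unbinding returns each filler exactly; with roles that are not orthonormal, the recovered vector would mix contributions from the other fillers.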

Original language: English
Title of host publication: CSAI 2023, 2023 International Conference on Computer Science and Artificial Intelligence
Editors: Eric Jiang, Yanan Sun, Yan Liu, Ran Cheng, Shudong Huang
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Pages: 190-194
Number of pages: 5
ISBN (Electronic): 9798400708688
Publication status: Published - 2023
Event: International Conference on Computer Science and Artificial Intelligence 2023 - Beijing, China
Duration: 8 Dec 2023 to 10 Dec 2023
Conference number: 7th
https://dl.acm.org/doi/proceedings/10.1145/3638584 (Proceedings)
https://www.csai.org/ (Website)

Conference

Conference: International Conference on Computer Science and Artificial Intelligence 2023
Abbreviated title: CSAI 2023
Country/Territory: China
City: Beijing
Period: 8/12/23 to 10/12/23

Keywords

  • computer vision
  • neurosymbolic AI
  • object representations
  • tensor product representations
  • vision transformer
