Abstract
In this work, we explore the use of Tensor Product Representations (TPRs) in a Vision Transformer model to form image representations that can later be used for symbolic manipulation in a neurosymbolic model. We propose the Tensor Product Vision Transformer (TP-ViT), an enhancement of the Vision Transformer that incorporates TPRs, an object representation methodology that uses filler and role vectors to represent objects. TP-ViT is the first application of TPRs to visual input, and we report qualitative and quantitative results showing that TPRs allow more targeted and diverse object representations to form than in a standard Vision Transformer.
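For readers unfamiliar with TPRs, the filler and role idea mentioned in the abstract can be illustrated with a minimal sketch. It shows only the generic TPR operations: binding each filler to a role with an outer product, summing the results, and unbinding with an orthonormal role. It is not the TP-ViT architecture, and the function names, vector values, and dimensions are hypothetical choices made for illustration.

```python
# Illustrative sketch of generic TPR binding and unbinding.
# Assumption: this is NOT the TP-ViT model; names and values are hypothetical.

def outer(filler, role):
    """Outer product of filler and role, as a len(filler) x len(role) nested list."""
    return [[f * r for r in role] for f in filler]

def add(a, b):
    """Element-wise sum of two equally shaped nested lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def bind(fillers, roles):
    """Build the TPR as the sum over i of outer(filler_i, role_i)."""
    tpr = [[0.0] * len(roles[0]) for _ in fillers[0]]
    for f, r in zip(fillers, roles):
        tpr = add(tpr, outer(f, r))
    return tpr

def unbind(tpr, role):
    """Recover the filler bound to `role`; exact only when roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in tpr]

# Hypothetical fillers (object content) and orthonormal roles (object slots).
fillers = [[1.0, 2.0, 3.0], [-1.0, 0.5, 4.0]]
roles = [[1.0, 0.0], [0.0, 1.0]]

tpr = bind(fillers, roles)
recovered = [unbind(tpr, r) for r in roles]
```

Because the roles here are orthonormal, unbinding returns each original filler exactly. With learned, non-orthogonal roles the recovery would be only approximate.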
Original language | English |
---|---|
Title of host publication | CSAI 2023, 2023 International Conference on Computer Science and Artificial Intelligence |
Editors | Eric Jiang, Yanan Sun, Yan Liu, Ran Cheng, Shudong Huang |
Place of Publication | New York NY USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 190-194 |
Number of pages | 5 |
ISBN (Electronic) | 9798400708688 |
Publication status | Published - 2023 |
Event | International Conference on Computer Science and Artificial Intelligence 2023, Beijing, China; Duration: 8 Dec 2023 → 10 Dec 2023; Conference number: 7th; Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3638584; Website: https://www.csai.org/ |
Conference
Conference | International Conference on Computer Science and Artificial Intelligence 2023 |
---|---|
Abbreviated title | CSAI 2023 |
Country/Territory | China |
City | Beijing |
Period | 8 Dec 2023 → 10 Dec 2023 |
Keywords
- computer vision
- neurosymbolic AI
- object representations
- tensor product representations
- vision transformer