Abstract
We propose an Auto-Parsing Network (APN) that discovers and exploits the hidden tree structures of the input data to improve the effectiveness of Transformer-based vision-language systems. Specifically, we impose a Probabilistic Graphical Model (PGM), parameterized by the attention operations, on each self-attention layer to incorporate a sparsity assumption. We use this PGM to softly segment an input sequence into a few clusters, where each cluster can be treated as the parent of the entities inside it. When these PGM-constrained self-attention layers are stacked, the clusters in a lower layer compose a new sequence, and the PGM in a higher layer further segments this sequence. Iteratively, a sparse tree is implicitly parsed, and this tree's hierarchical knowledge is incorporated into the transformed embeddings, which can then be used to solve the target vision-language tasks. We show that APN can strengthen Transformer-based networks on two major vision-language tasks: Captioning and Visual Question Answering. We also develop a parsing algorithm based on the PGM probabilities, with which the hidden structure of the input can be discovered during inference.
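The idea of softly segmenting a sequence inside a self-attention layer can be illustrated with a minimal NumPy sketch. This is not the paper's formulation: the abstract does not specify the PGM, so the sketch assumes a simple chain model in which `link[k]` is a hypothetical probability that adjacent tokens `k` and `k+1` belong to the same cluster, and the probability that two tokens share a cluster is the product of the link probabilities between them. The function names `same_cluster_prob` and `constrained_attention` are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def same_cluster_prob(link):
    """Pairwise soft-segmentation matrix under an assumed chain model.

    link[k] is the (hypothetical) probability that tokens k and k+1
    share a cluster. Returns C with C[i, j] equal to the product of
    link values between positions i and j, and C[i, i] = 1.
    """
    log_link = np.log(np.clip(link, 1e-9, 1.0))
    # prefix[i] = sum of log_link[:i], so a span sum is a difference.
    prefix = np.concatenate([[0.0], np.cumsum(log_link)])
    return np.exp(-np.abs(prefix[:, None] - prefix[None, :]))

def constrained_attention(x, link):
    """Self-attention weights gated by the soft segmentation.

    x: (n, d) token embeddings; link: (n - 1,) adjacent link probabilities.
    Queries and keys are both x here to keep the sketch short.
    """
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d), axis=-1)
    gated = attn * same_cluster_prob(link)
    # Renormalize so each row is still a distribution over tokens.
    return gated / gated.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    # A weak link in the middle softly splits tokens into {0, 1} and {2, 3}.
    link = np.array([0.9, 0.1, 0.9])
    print(np.round(constrained_attention(x, link), 3))
```

With a weak middle link, attention between the two halves of the sequence is strongly suppressed, which is the sparsity effect the abstract describes. Stacking such layers with link probabilities that grow with depth would merge small clusters into larger ones, giving one possible reading of the implicit tree.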
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021 |
Editors | Dima Damen, Tal Hassner, Chris Pal, Yoichi Sato |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 2177-2187 |
Number of pages | 11 |
ISBN (Electronic) | 9781665428125 |
ISBN (Print) | 9781665428132 |
Publication status | Published - 2021 |
Event | IEEE International Conference on Computer Vision 2021, Online, United States of America. Duration: 11 Oct 2021 → 17 Oct 2021. Website: https://iccv2021.thecvf.com/home. Proceedings: https://ieeexplore.ieee.org/xpl/conhome/9709627/proceeding |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
ISSN (Print) | 1550-5499 |
ISSN (Electronic) | 2380-7504 |
Conference
Conference | IEEE International Conference on Computer Vision 2021 |
---|---|
Abbreviated title | ICCV 2021 |
Country/Territory | United States of America |
City | Online |
Period | 11/10/21 → 17/10/21 |