Learning to Scale Multilingual Representations for Vision-Language Tasks

Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

20 Citations (Scopus)

Abstract

Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for just a few. We use a masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3–4% with less than 1/5th the training parameters compared to other word embedding methods.
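As a rough illustration of the cross-lingual consistency idea mentioned in the abstract, the sketch below averages image-sentence similarity scores over a query and its machine translations before ranking images. This is not the authors' implementation: the plain averaging rule, the function names `consistent_score` and `rank_images`, and all score values are assumptions made up for illustration.

```python
from statistics import mean


def consistent_score(scores_by_language):
    """Average the similarity scores of a query and its translations.

    `scores_by_language` maps a language code to the (hypothetical)
    image-sentence similarity obtained with the query in that language.
    """
    if not scores_by_language:
        raise ValueError("need at least one score")
    return mean(scores_by_language.values())


def rank_images(per_image_scores):
    """Return image ids sorted by aggregated score, best first."""
    aggregated = {
        image_id: consistent_score(scores)
        for image_id, scores in per_image_scores.items()
    }
    return sorted(aggregated, key=aggregated.get, reverse=True)


# Made-up scores for one English query and its machine translations
# into German and Japanese, against two candidate images.
scores = {
    "image_a": {"en": 0.50, "de": 0.75, "ja": 0.25},
    "image_b": {"en": 0.75, "de": 0.25, "ja": 0.25},
}
ranking = rank_images(scores)
print(ranking)
```

In this toy data the English query alone would prefer `image_b`, while the aggregate over all three languages prefers `image_a`, which is the kind of disagreement a consistency step is meant to smooth out.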

Original language: English
Title of host publication: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV
Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 197-213
Number of pages: 17
ISBN (Electronic): 9783030585488
ISBN (Print): 9783030585471
DOIs
Publication status: Published - 2020
Externally published: Yes
Event: European Conference on Computer Vision 2020 - Glasgow, United Kingdom
Duration: 23 Aug 2020 to 28 Aug 2020
Conference number: 16th
https://link.springer.com/book/10.1007/978-3-030-58452-8 (Proceedings)
https://eccv2020.eu (Website)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12349
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision 2020
Abbreviated title: ECCV 2020
Country/Territory: United Kingdom
City: Glasgow
Period: 23/08/20 to 28/08/20
Internet address

Keywords

  • Image-sentence retrieval
  • Multilingual word embeddings
  • Scalable vision-language models
