Collaborative multimodal photo annotation over digital paper

Paulo Barthelmess, Edward Kaiser, Xiao Huang, David McGee, Philip Cohen

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

9 Citations (Scopus)


The availability of metadata annotations over media content such as photos is known to enhance retrieval and organization, particularly for large data sets. The greatest challenge for obtaining annotations remains getting users to perform the large amount of tedious manual work that is required. In this paper we introduce an approach for semi-automated labeling based on extraction of metadata from naturally occurring conversations of groups of people discussing pictures among themselves. As the burden for structuring and extracting metadata is shifted from users to the system, new recognition challenges arise. We explore how multimodal language can help in 1) detecting a concise set of meaningful labels to be associated with each photo, 2) achieving robust recognition of these key semantic terms, and 3) facilitating label propagation via multimodal shortcuts. Analysis of the data of a preliminary pilot collection suggests that handwritten labels may be highly indicative of the semantics of each photo, as indicated by the correlation of handwritten terms with high frequency spoken ones. We point to initial directions exploring a multimodal fusion technique to recover robust spelling and pronunciation of these high-value terms from redundant speech and handwriting.

Original language: English
Title of host publication: ICMI'06
Subtitle of host publication: 8th International Conference on Multimodal Interfaces, Conference Proceedings
Publisher: Association for Computing Machinery (ACM)
Number of pages: 8
ISBN (Print): 159593541X, 9781595935410
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: International Conference on Multimodal Interfaces 2006 - Banff, Canada
Duration: 2 Nov 2006 to 4 Nov 2006
Conference number: 8th (Proceedings)


Conference: International Conference on Multimodal Interfaces 2006
Abbreviated title: ICMI 2006


  • Automatic label extraction
  • Collaborative interaction
  • Intelligent interfaces
  • Multimodal processing
  • Photo annotation
