Abstract
The availability of metadata annotations over media content such as photos is known to enhance retrieval and organization, particularly for large data sets. The greatest challenge for obtaining annotations remains getting users to perform the large amount of tedious manual work that is required. In this paper we introduce an approach for semi-automated labeling based on extraction of metadata from naturally occurring conversations of groups of people discussing pictures among themselves. As the burden for structuring and extracting metadata is shifted from users to the system, new recognition challenges arise. We explore how multimodal language can help in 1) detecting a concise set of meaningful labels to be associated with each photo, 2) achieving robust recognition of these key semantic terms, and 3) facilitating label propagation via multimodal shortcuts. Analysis of the data of a preliminary pilot collection suggests that handwritten labels may be highly indicative of the semantics of each photo, as indicated by the correlation of handwritten terms with high frequency spoken ones. We point to initial directions exploring a multimodal fusion technique to recover robust spelling and pronunciation of these high-value terms from redundant speech and handwriting.
Original language | English |
---|---|
Title of host publication | ICMI'06 |
Subtitle of host publication | 8th International Conference on Multimodal Interfaces, Conference Proceedings |
Publisher | Association for Computing Machinery (ACM) |
Pages | 4-11 |
Number of pages | 8 |
ISBN (Print) | 159593541X, 9781595935410 |
Publication status | Published - 1 Dec 2006 |
Externally published | Yes |
Event | International Conference on Multimodal Interfaces 2006 - Banff, Canada; Duration: 2 Nov 2006 → 4 Nov 2006; Conference number: 8th; Proceedings: https://dl.acm.org/doi/proceedings/10.1145/1180995 |
Conference
Conference | International Conference on Multimodal Interfaces 2006 |
---|---|
Abbreviated title | ICMI 2006 |
Country/Territory | Canada |
City | Banff |
Period | 2/11/06 → 4/11/06 |
Keywords
- Automatic label extraction
- Collaborative interaction
- Intelligent interfaces
- Multimodal processing
- Photo annotation