TY - JOUR
T1 - Lexicon+TX: rapid construction of a multilingual lexicon with under-resourced languages
AU - Lim, Lian Tze
AU - Soon, Lay Ki
AU - Lim, Tek Yong
AU - Tang, Enya Kong
AU - Ranaivo-Malançon, Bali
N1 - Publisher Copyright: © 2013, Springer Science+Business Media Dordrecht. Copyright 2016 Elsevier B.V., All rights reserved.
PY - 2014/9/1
Y1 - 2014/9/1
AB - Most efforts at automatically creating multilingual lexicons require input lexical resources with rich content (e.g. semantic networks, domain codes, semantic categories) or large corpora. Such material is often unavailable and difficult to construct for under-resourced languages. In some cases, particularly for some ethnic languages, even unannotated corpora are still in the process of collection. We show how multilingual lexicons with under-resourced languages can be constructed using simple bilingual translation lists, which are more readily available. The prototype multilingual lexicon developed comprises six member languages: English, Malay, Chinese, French, Thai and Iban, the last of which is an under-resourced language in Borneo. Quick evaluations showed that 91.2% of 500 random multilingual entries in the generated lexicon require minimal or no human correction.
KW - Iban
KW - Malay
KW - Multilingual lexicon
KW - Under-resourced languages
UR - http://www.scopus.com/inward/record.url?scp=84957953727&partnerID=8YFLogxK
U2 - 10.1007/s10579-013-9253-0
DO - 10.1007/s10579-013-9253-0
M3 - Article
AN - SCOPUS:84957953727
VL - 48
SP - 479
EP - 492
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
SN - 1574-020X
IS - 3
ER -