Abstract
Cognates are present in multiple variants of the same text across different languages. Computational phylogenetics uses algorithms to analyze these variants and infer phylogenetic trees, which are hypothesized representations of how the languages are related and depend on the algorithm used. In our work, we detect cognates among four Indian languages, namely Hindi, Marathi, Punjabi, and Sanskrit, to help build cognate sets for phylogenetic inference. Cognate detection aids phylogenetic inference because it helps isolate diachronic sound changes and thus identify words of a common origin. Phylogenetic trees are generally inferred automatically from a cognate set that has been manually annotated with the help of a lexicographer. Our work creates cognate sets for each language pair and infers phylogenetic trees within a Bayesian framework using the maximum likelihood method. We also implement our work as an online interface that infers phylogenetic trees from automatically detected cognate sets. The interface builds phylogenetic trees from the textual data provided as input. It lets a lexicographer enter data manually, edit the data based on their expert opinion, and create phylogenetic trees using various algorithms, including our method for automatically creating cognate sets. Finally, we discuss the nuances of detecting cognates in these Indian languages and the categorization of cognate words into "Tatsama" and "Tadbhava" words.
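Tree-inference tools typically consume cognate sets as a binary presence/absence matrix with one character per cognate set. The sketch below illustrates that encoding, assuming each cognate set is given as the set of languages that have a member word in it. The function names `cognate_matrix` and `to_nexus`, the NEXUS output format (as read by tools such as MrBayes), and the example cognate sets are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Dict, List, Set


def cognate_matrix(cognate_sets: Dict[str, Set[str]], languages: List[str]) -> Dict[str, str]:
    """Encode cognate sets as binary characters: 1 if the language has a word in the set, else 0."""
    set_ids = sorted(cognate_sets)  # fix a deterministic column order
    return {
        lang: "".join("1" if lang in cognate_sets[sid] else "0" for sid in set_ids)
        for lang in languages
    }


def to_nexus(matrix: Dict[str, str]) -> str:
    """Render the binary matrix as a minimal NEXUS DATA block (hypothetical output step)."""
    if not matrix:
        raise ValueError("matrix must contain at least one language")
    nchar = len(next(iter(matrix.values())))
    rows = "\n".join(f"    {lang} {bits}" for lang, bits in matrix.items())
    return (
        "#NEXUS\n"
        "BEGIN DATA;\n"
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};\n"
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;\n'
        "  MATRIX\n"
        f"{rows}\n"
        "  ;\n"
        "END;\n"
    )


if __name__ == "__main__":
    languages = ["Hindi", "Marathi", "Punjabi", "Sanskrit"]
    # Hypothetical cognate sets for illustration only; not data from the paper.
    sets = {
        "hand_1": {"Hindi", "Marathi", "Punjabi", "Sanskrit"},
        "water_1": {"Hindi", "Marathi", "Punjabi"},
        "water_2": {"Sanskrit"},
    }
    print(to_nexus(cognate_matrix(sets, languages)))
```

With these illustrative sets, Hindi, Marathi, and Punjabi each encode as `110` and Sanskrit as `101`; a tree-inference tool then groups languages that share more 1s in the same columns.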
Original language | English |
---|---|
Title of host publication | CODS-COMAD 2019 |
Subtitle of host publication | Proceedings of the 6th ACM IKDD CoDS and 24th COMAD, January 3 - 5, 2019, Kolkata, India |
Editors | Raghu Krishnapuram, Parag Singla |
Place of Publication | New York NY USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 297-300 |
Number of pages | 4 |
ISBN (Electronic) | 9781450362078 |
Publication status | Published - 2019 |
Event | ACM India Joint International Conference on Data Science and Management of Data 2019, Kolkata, India; Duration: 3 Jan 2019 → 5 Jan 2019; Conference number: 6th & 24th; https://cods-comad.in/2019/index.html |
Conference
Conference | ACM India Joint International Conference on Data Science and Management of Data 2019 |
---|---|
Abbreviated title | CoDS-COMAD 2019 |
Country/Territory | India |
City | Kolkata |
Period | 3/01/19 → 5/01/19 |
Keywords
- Cognate Detection
- Cognate Identification
- Computational Phylogenetics
- Historical Linguistics
- Indian Languages
- Natural Language Processing
- Phylogenetic Tree Generation
- Phylogenetics