Word Level Language Identification in Online Multilingual Communication

Dong Nguyen, A. Seza Dogruöz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.
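
As a rough illustration of the two steps the abstract describes (tagging each word on its own, then using context), the following is a minimal sketch and not the authors' implementation. It assumes two languages labelled `nl` and `tr` with tiny toy word lists, scores words with smoothed character-bigram frequencies as a stand-in for the language models, omits the dictionary lookup, and uses a simple neighbour-averaging heuristic in place of whatever context model the paper actually uses. All names (`CharNgramModel`, `tag_words`, `tag_with_context`) and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(word, n=2):
    """Character n-grams of a lowercased word padded with boundary markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class CharNgramModel:
    """Add-one smoothed character n-gram frequencies for one language."""
    def __init__(self, words, vocab_size, n=2):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size  # shared across languages for comparability

    def score(self, word):
        """Average log relative frequency of the word's n-grams."""
        grams = char_ngrams(word, self.n)
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)

def build_models(training, n=2):
    """training: dict mapping language label -> list of words (toy data here)."""
    all_grams = {g for ws in training.values() for w in ws for g in char_ngrams(w, n)}
    vocab_size = len(all_grams) + 1  # one extra slot for unseen n-grams
    return {lang: CharNgramModel(ws, vocab_size, n) for lang, ws in training.items()}

def tag_words(tokens, models):
    """Step 1: label each token independently with the best-scoring language."""
    return [max(models, key=lambda lang: models[lang].score(t)) for t in tokens]

def tag_with_context(tokens, models, weight=0.5):
    """Step 2 (assumed heuristic): add the mean score of adjacent tokens."""
    scores = [{lang: m.score(t) for lang, m in models.items()} for t in tokens]
    labels = []
    for i, own in enumerate(scores):
        neighbours = [scores[j] for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        combined = {}
        for lang in models:
            ctx = sum(nb[lang] for nb in neighbours) / len(neighbours) if neighbours else 0.0
            combined[lang] = own[lang] + weight * ctx
        labels.append(max(combined, key=combined.get))
    return labels

# Toy word lists, chosen only for illustration.
TRAINING = {
    "nl": ["ik", "het", "een", "niet", "maar", "goed", "huis", "vandaag"],
    "tr": ["ben", "bir", "değil", "ama", "güzel", "evet", "bugün", "çok"],
}
MODELS = build_models(TRAINING)
```

Calling `tag_words(["huis", "güzel"], MODELS)` labels each token in isolation, while `tag_with_context` lets neighbouring words pull an ambiguous token toward the language around it. With realistic training data, the `weight` value would need tuning on held-out text.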
Original language: English
Title of host publication: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL
Publisher: Association for Computational Linguistics
Pages: 857-862
Number of pages: 6
ISBN (Print): 978-1-937284-97-8
Publication status: Published - Oct 2013
