This paper presents pioneering work on building a Named Entity Recognition (NER) system for Mongolian, a language with agglutinative morphology and subject-object-verb word order. We search for the best-performing feature set among a wide range of features, and we propose a method that refines the machine learning approach using gazetteers with approximate string matching, aiming at robust handling of out-of-vocabulary words. We also apply various existing machine learning methods and use a genetic algorithm to find an optimal ensemble of classifiers, where the individual classifiers use different feature representations. The resulting system constitutes the first usable software package for Mongolian NER, and our experimental evaluation will serve as a much-needed basis of comparison for further research.
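The gazetteer lookup with approximate string matching mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or threshold the authors used, so everything here is an assumption made for illustration: the function name `best_gazetteer_match`, the `threshold` value of 0.8, the tiny example gazetteer, and the use of Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure.

```python
from difflib import SequenceMatcher


def best_gazetteer_match(token, gazetteer, threshold=0.8):
    """Return (entry, score) for the most similar gazetteer entry,
    or None if no entry reaches the threshold.

    Hypothetical helper: the similarity measure (difflib ratio) and the
    default threshold are illustrative assumptions, not the paper's method.
    """
    best_entry, best_score = None, 0.0
    for entry in gazetteer:
        # ratio() is 2*M/T, where M is the number of matched characters
        # and T is the combined length of both strings.
        score = SequenceMatcher(None, token.lower(), entry.lower()).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_score >= threshold:
        return best_entry, best_score
    return None


# Illustrative gazetteer of place names (not taken from the paper).
locations = ["Улаанбаатар", "Дархан", "Эрдэнэт"]

# An inflected form with a case suffix still finds its base entry.
print(best_gazetteer_match("Улаанбаатарт", locations))
# An unrelated common noun does not match any entry.
print(best_gazetteer_match("ном", locations))
```

In an agglutinative language, a name often appears with suffixes attached, so an exact dictionary lookup would miss it. A fuzzy match of this kind could feed a binary or scored feature into a classifier, although how the original system integrates gazetteer evidence is not described in this record.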
|Title of host publication||Text, Speech, and Dialogue|
|Subtitle of host publication||18th International Conference, TSD 2015, Pilsen, Czech Republic, September 14-17, 2015, Proceedings|
|Number of pages||9|
|Publication status||Published - 2015|
|Name||Lecture Notes in Computer Science |