Discourse-Related Language Contrasts in English-Croatian Human and Machine Translation

Margita Šoštarić, Christian Hardmeier, Sara Stymne

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present an analysis of a number of coreference phenomena in English-Croatian human and machine translations. The aim is to shed light on the differences in the way these structurally different languages make use of discourse information and provide insights for discourse-aware machine translation system development. The phenomena are automatically identified in parallel data using annotation produced by parsers and word alignment tools, enabling us to pinpoint patterns of interest in both languages. We make the analysis more fine-grained by including three corpora pertaining to three different registers. In a second step, we create a test set with the challenging linguistic constructions and use it to evaluate the performance of three MT systems. We show that both SMT and NMT systems struggle with handling these discourse phenomena, even though NMT tends to perform somewhat better than SMT. By providing an overview of patterns frequently occurring in actual language use, as well as by pointing out the weaknesses of current MT systems that commonly mistranslate them, we hope to contribute to the effort of resolving the issue of discourse phenomena in MT applications.
Original language: English
Title of host publication: Proceedings of the Third Conference on Machine Translation: Research Papers
Place of Publication: Brussels, Belgium
Publisher: Association for Computational Linguistics
Pages: 36-48
Number of pages: 13
ISBN (Electronic): 978-1-948087-81-0
Publication status: Published - 1 Nov 2018
Event: EMNLP 2018 Third Conference on Machine Translation (WMT18) - Brussels, Belgium
Duration: 31 Oct 2018 to 1 Nov 2018
http://www.statmt.org/wmt18/

Workshop

Workshop: EMNLP 2018 Third Conference on Machine Translation (WMT18)
Abbreviated title: WMT18
Country/Territory: Belgium
City: Brussels
Period: 31/10/18 to 1/11/18
