Identifying Computer-Translated Paragraphs using Coherence Features

Hoang-Quoc Nguyen-Son, Ngoc-Dung T. Tieu, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We have developed a method for extracting coherence features from a paragraph by matching similar words across its sentences. We evaluated it on a parallel German corpus containing 2000 human-created and 2000 machine-translated paragraphs. Our method achieved the best performance (accuracy = 72.3%, equal error rate = 29.8%) compared with previous methods for various kinds of computer-generated text, including translation and paper generation (best accuracy = 67.9%, equal error rate = 32.0%). Experiments on Dutch, another resource-rich language, and on Japanese, a low-resource language, yielded similar performance. These results demonstrate the effectiveness of coherence features for distinguishing computer-translated from human-created paragraphs across diverse languages.
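The idea of matching similar words across the sentences of a paragraph can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or how matches are aggregated, so the character-overlap similarity from `difflib`, the naive sentence splitter, and the mean/variance summary below are all assumptions chosen only to keep the example self-contained. The function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean, pvariance


def split_sentences(paragraph):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def tokenize(sentence):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", sentence.lower())


def word_similarity(a, b):
    # Stand-in similarity (assumption): character overlap in [0, 1].
    # A real system would more likely use a semantic similarity measure.
    return SequenceMatcher(None, a, b).ratio()


def coherence_features(paragraph):
    # For every pair of sentences, match each word of the first sentence
    # to its most similar word in the second and record that similarity.
    scores = []
    for s1, s2 in combinations(split_sentences(paragraph), 2):
        w1, w2 = tokenize(s1), tokenize(s2)
        if not w1 or not w2:
            continue
        for w in w1:
            scores.append(max(word_similarity(w, v) for v in w2))
    if not scores:
        return {"mean": 0.0, "variance": 0.0, "matches": 0}
    # Summarize the matches as a fixed-length feature vector (assumption).
    return {
        "mean": mean(scores),
        "variance": pvariance(scores),
        "matches": len(scores),
    }
```

Features of this kind could then be passed to any standard classifier to separate human-created from machine-translated paragraphs. The choice of classifier is left open here because the abstract does not state one.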
Original language: English
Title of host publication: Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32)
Place of Publication: Hung Hom, Kowloon, Hong Kong
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 9
Publication status: E-pub ahead of print - 3 Dec 2018
Event: 32nd Pacific Asia Conference on Language, Information and Computation - Kowloon, Hong Kong
Duration: 1 Dec 2018 → 3 Dec 2018
http://www.cbs.polyu.edu.hk/2018paclic/

Conference

Conference: 32nd Pacific Asia Conference on Language, Information and Computation
Abbreviated title: PACLIC 32
Country/Territory: Hong Kong
City: Kowloon
Period: 1/12/18 → 3/12/18
