Abstract
We have developed a method for extracting coherence features from a paragraph by matching similar words across its sentences. We conducted an experiment on a parallel German corpus containing 2000 human-created and 2000 machine-translated paragraphs. The results showed that our method achieved the best performance (accuracy = 72.3%, equal error rate = 29.8%) compared with previous methods for detecting various kinds of computer-generated text, including translation and paper generation (best accuracy = 67.9%, equal error rate = 32.0%). Experiments on Dutch, another resource-rich language, and on a low-resource language (Japanese) attained similar performance. These results demonstrate the effectiveness of coherence features in distinguishing computer-translated from human-created paragraphs across diverse languages.
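The idea of matching similar words across sentences can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how word similarity is measured or which statistics are extracted, so the character-overlap similarity from Python's `difflib` and the three summary statistics below are assumptions chosen only to keep the example self-contained. The function names (`word_similarity`, `best_match_scores`, `coherence_features`) are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def word_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] based on character overlap.

    Assumption: a real system would more likely use a semantic
    similarity measure; this only keeps the sketch dependency-free.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match_scores(sent_a, sent_b):
    """For each word in sent_a, the similarity of its best match in sent_b."""
    if not sent_b:
        return []
    return [max(word_similarity(w, v) for v in sent_b) for w in sent_a]


def coherence_features(paragraph):
    """Summarize word matches over all sentence pairs in a paragraph.

    paragraph: list of sentences, each a list of word tokens.
    Returns (mean, min, max) of the best-match scores, or zeros
    when the paragraph has no sentence pair to compare.
    """
    scores = []
    for i in range(len(paragraph)):
        for j in range(i + 1, len(paragraph)):
            scores.extend(best_match_scores(paragraph[i], paragraph[j]))
    if not scores:
        return (0.0, 0.0, 0.0)
    return (mean(scores), min(scores), max(scores))


if __name__ == "__main__":
    example = [
        ["der", "Hund", "bellt"],
        ["der", "Hund", "schläft"],
    ]
    print(coherence_features(example))
```

A feature tuple like this could then be passed to any standard classifier to separate human-created from machine-translated paragraphs; the choice of classifier is likewise not stated in the abstract.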
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32) |
| Place of Publication | Hung Hom, Kowloon, Hong Kong |
| Publisher | Association for Computational Linguistics (ACL) |
| Number of pages | 9 |
| Publication status | E-pub ahead of print - 3 Dec 2018 |
| Event | 32nd Pacific Asia Conference on Language, Information and Computation, Kowloon, Hong Kong. Duration: 1 Dec 2018 → 3 Dec 2018. http://www.cbs.polyu.edu.hk/2018paclic/ |
Conference
| Conference | 32nd Pacific Asia Conference on Language, Information and Computation |
|---|---|
| Abbreviated title | PACLIC 32 |
| Country/Territory | Hong Kong |
| City | Kowloon |
| Period | 1/12/18 → 3/12/18 |