From multilingual web-archives to parallel treebanks in five minutes

Markus Killer, Rico Sennrich, Martin Volk

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The Tree-to-Tree (t2t) Alignment Pipe is a collection of Python scripts that generates automatically aligned parallel treebanks from multilingual web resources or existing parallel corpora. The pipe contains wrappers for a number of freely available NLP programs. Once these third-party programs have been installed and the system- and corpus-specific details have been updated, the pipe is designed to generate a parallel treebank with a single program call from a Unix command line. We discuss alignment quality on a fully automatically processed parallel corpus.
Original language: English
Title of host publication: Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011
Editors: H Hedeland, T Schmidt, K Wörner
Place of Publication: Hamburg, Germany
Publisher: Universität Hamburg
Pages: 57-62
Number of pages: 6
Publication status: Published - 1 Sep 2011
Event: Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011 - Hamburg, Germany
Duration: 28 Sep 2011 – 30 Sep 2011

Publication series

Name: Arbeiten zur Mehrsprachigkeit - Folge B
Publisher: Universität Hamburg

Conference

Conference: Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011
Country: Germany
City: Hamburg
Period: 28/09/11 – 30/09/11

Keywords

  • parallel treebank
  • automatic tree-to-tree alignment
  • TreeAligner
  • Text-und-Berg
