GraWiTas: a Grammar-based Wikipedia Talk Page Parser

Benjamin Cabrera, Laura Steinert, Bjorn Ross

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Wikipedia offers researchers unique insights into the collaboration and communication patterns of a large self-regulating community of editors. The main medium of direct communication between editors of an article is the article's talk page. However, a talk page file is unstructured and therefore difficult to analyse automatically. A few parsers exist that enable its transformation into a structured data format. However, they are rarely open source, support only a limited subset of the talk page syntax -- resulting in the loss of content -- and usually support only one export format. Together with this article we offer a very fast, lightweight, open source parser with support for various output formats. In a preliminary evaluation it achieved a high accuracy. The parser uses a grammar-based approach -- offering a transparent implementation and easy extensibility.
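
To illustrate what turning an unstructured talk page into structured data can look like, here is a minimal Python sketch. It is not the GraWiTas grammar or its API: the simplified line grammar, the `Comment` record, the `parse_talk_page` function, and the sample text are all assumptions made for illustration. It handles only single-line comments that end in a `[[User:...]]` signature and a UTC timestamp.

```python
import re
from dataclasses import dataclass

# Hypothetical, heavily simplified line grammar (NOT the GraWiTas grammar):
#   comment   := indent text signature
#   indent    := ":"*                      (reply depth)
#   signature := "[[User:" name ... "]]" ... "HH:MM, D Month YYYY (UTC)"
COMMENT = re.compile(
    r"^(?P<indent>:*)\s*(?P<text>.*?)\s*"
    r"\[\[User:(?P<user>[^\]|]+)(?:\|[^\]]*)?\]\].*?"
    r"(?P<time>\d{2}:\d{2}, \d{1,2} \w+ \d{4}) \(UTC\)\s*$"
)


@dataclass
class Comment:
    user: str   # author taken from the signature link
    time: str   # timestamp string as written on the page
    depth: int  # number of leading colons, i.e. reply nesting level
    text: str   # comment body without the signature


def parse_talk_page(wikitext):
    """Return one Comment per line that matches the simplified grammar."""
    comments = []
    for line in wikitext.splitlines():
        m = COMMENT.match(line)
        if m:  # lines without a recognisable signature are skipped
            comments.append(
                Comment(m["user"], m["time"], len(m["indent"]), m["text"])
            )
    return comments


# Invented sample input, not taken from a real talk page.
sample = (
    "Should we merge this section? [[User:Alice|Alice]] "
    "([[User talk:Alice|talk]]) 10:15, 3 April 2017 (UTC)\n"
    ":I agree. [[User:Bob|Bob]] 11:02, 3 April 2017 (UTC)\n"
)

for comment in parse_talk_page(sample):
    print(comment)
```

Real talk pages also contain multi-line comments, unsigned posts, and outdents, which this sketch silently drops. That loss of content is the limitation the abstract attributes to existing parsers.
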
Original language: English
Title of host publication: Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics
Place of Publication: Valencia, Spain
Publisher: Association for Computational Linguistics
Pages: 21-24
Number of pages: 4
Publication status: Published - 1 Apr 2017
Event: 15th Conference on European Chapter of the Association for Computational Linguistics - Valencia, Spain
Duration: 3 Apr 2017 to 7 Apr 2017
http://eacl2017.org/

Conference

Conference: 15th Conference on European Chapter of the Association for Computational Linguistics
Abbreviated title: EACL 2017
Country/Territory: Spain
City: Valencia
Period: 3/04/17 to 7/04/17
