Abstract / Description of output
Wikipedia offers researchers unique insights into the collaboration and communication patterns of a large self-regulating community of editors. The main medium of direct communication between editors of an article is the article's talk page. However, a talk page file is unstructured and therefore difficult to analyse automatically. A few parsers exist that enable its transformation into a structured data format. However, they are rarely open source, support only a limited subset of the talk page syntax (resulting in the loss of content), and usually support only one export format. Together with this article, we offer a very fast, lightweight, open source parser with support for various output formats. In a preliminary evaluation, it achieved high accuracy. The parser uses a grammar-based approach, which offers a transparent implementation and easy extensibility.
Original language | English |
---|---|
Title of host publication | Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics |
Place of Publication | Valencia, Spain |
Publisher | Association for Computational Linguistics |
Pages | 21-24 |
Number of pages | 4 |
Publication status | Published - 1 Apr 2017 |
Event | 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain. Duration: 3 Apr 2017 → 7 Apr 2017. http://eacl2017.org/ |
Conference
Conference | 15th Conference of the European Chapter of the Association for Computational Linguistics |
---|---|
Abbreviated title | EACL 2017 |
Country/Territory | Spain |
City | Valencia |
Period | 3/04/17 → 7/04/17 |
Internet address | http://eacl2017.org/ |