Exploring the Use of Linguistic Features in Domain and Genre Classification

Maria Wolters, Mathias Kirsten

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The central questions are: How useful is information about part-of-speech frequency for text categorisation? Is it feasible to limit word features to content words for text classification? Both questions are examined for 5 domain and 4 genre classification tasks using LIMAS, the German equivalent of the Brown corpus. Because LIMAS is too heterogeneous, neither question can be answered reliably for any of the tasks. However, the results suggest that both questions have to be examined separately for each task at hand, because in some cases the additional information can indeed improve performance.
Original language: English
Title of host publication: Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Pages: 142-149
Number of pages: 8
Publication status: Published - 1999
Event: Ninth Conference on European Chapter of the Association for Computational Linguistics - University of Bergen, Bergen, Norway
Duration: 8 Jun 1999 to 12 Jun 1999

Publication series

Name: EACL '99
Publisher: Association for Computational Linguistics

Conference

Conference: Ninth Conference on European Chapter of the Association for Computational Linguistics
Country: Norway
City: Bergen
Period: 8/06/99 to 12/06/99
