TY - GEN
T1 - Exploring the Use of Linguistic Features in Domain and Genre Classification
AU - Wolters, Maria
AU - Kirsten, Mathias
PY - 1999
Y1 - 1999
N2 - The central questions are: How useful is information about part-of-speech frequency for text categorisation? Is it feasible to limit word features to content words for text classification? This is examined for 5 domain and 4 genre classification tasks using LIMAS, the German equivalent of the Brown corpus. Because LIMAS is too heterogeneous, neither question can be answered reliably for any of the tasks. However, the results suggest that both questions have to be examined separately for each task at hand, because in some cases, the additional information can indeed improve performance.
AB - The central questions are: How useful is information about part-of-speech frequency for text categorisation? Is it feasible to limit word features to content words for text classification? This is examined for 5 domain and 4 genre classification tasks using LIMAS, the German equivalent of the Brown corpus. Because LIMAS is too heterogeneous, neither question can be answered reliably for any of the tasks. However, the results suggest that both questions have to be examined separately for each task at hand, because in some cases, the additional information can indeed improve performance.
U2 - 10.3115/977035.977055
DO - 10.3115/977035.977055
M3 - Conference contribution
T3 - EACL '99
SP - 142
EP - 149
BT - Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics
CY - Stroudsburg, PA, USA
T2 - Ninth Conference on European Chapter of the Association for Computational Linguistics
Y2 - 8 June 1999 through 12 June 1999
ER -