In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful consideration.
|Title of host publication||Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006)|
|Number of pages||6|
|Publication status||Published - 2006|