The ITI TXM corpora: Tissue expressions and protein-protein interactions

Beatrice Alex, Claire Grover, Barry Haddow, Mijail Kabadjov, Ewan Klein, Michael Matthews, Stuart Roebuck, Richard Tobin, Xinglong Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We report on two large corpora of semantically annotated full-text biomedical research papers created in order to develop information extraction (IE) tools for the TXM project. Both corpora have been annotated with a range of entities (CellLine, Complex, Developmental-Stage, Disease, DrugCompound, ExperimentalMethod, Fragment, Fusion, GOMOP, Gene, Modification, mRNAcDNA, Mutant, Protein, Tissue), normalisations of selected entities to the NCBI Taxonomy, RefSeq, EntrezGene, ChEBI and MeSH, and enriched relations (protein-protein interactions, tissue expressions and fragment- or mutant-protein relations). While one corpus targets protein-protein interactions (PPIs), the focus of the other is on tissue expressions (TEs). This paper describes the selected markables and the annotation process of the ITI TXM corpora, and provides a detailed breakdown of the inter-annotator agreement (IAA).
Original language: English
Title of host publication: LREC 2008 Workshop
Subtitle of host publication: Building and evaluating resources for biomedical text mining
Number of pages: 8
Publication status: Published - May 2008
