Exploring the Boundaries: Gene and Protein Identification in Biomedical Text

Jenny Finkel, Shipra Dingare, Christopher Manning, Malvina Nissim, Beatrice Alex, Claire Grover

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools.

Methods: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts.

Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation.

Conclusion: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
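
The Results above give precision and recall separately. As a reading aid, the sketch below combines them into the balanced F-measure (F1, the harmonic mean of the two). It uses only the rounded figures quoted in this abstract, so the values it prints are approximations and need not match the officially reported F-scores exactly. The function and variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall) pairs as quoted in the abstract.
reported = {
    "open": (0.83, 0.84),
    "closed": (0.78, 0.85),
}

# F1 per evaluation track, computed from the rounded inputs.
f1_by_track = {track: f1_score(p, r) for track, (p, r) in reported.items()}

for track, f1 in f1_by_track.items():
    print(f"{track}: F1 = {f1:.3f}")
```

With these rounded inputs, F1 comes out at roughly 0.83 for the "open" evaluation and 0.81 for the "closed" evaluation.
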
Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: Bioinformatics
Volume: 6
Publication status: Published - 2005

Keywords / Materials (for Non-textual outputs)

  • seer
