Multilingual XML-Based Named Entity Recognition for E-Retail Domains

Claire Grover, Scott McDonald, Donnla Nic Gearailt, Vangelis Karkaletsis, Dimitra Farmakiotou, Georgios Samaritakis, Georgios Petasis, Maria Teresa Pazienza, Michele Vindigni, Frantz Vichot, Francis Wolinski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe the multilingual Named Entity Recognition and Classification (NERC) subpart of an e-retail product comparison system which is currently under development as part of the EU-funded project CROSSMARC. The system must be rapidly extensible, both to new languages and new domains. To achieve this aim we use XML as our common exchange format and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages which contain heavily structured data where text is intermingled with HTML and other code. Our preliminary evaluation results demonstrate the viability of our approach.
Original language: English
Title of host publication: Proceedings of the Third International Conference on Language Resources and Evaluation, LREC 2002, May 29-31, 2002, Las Palmas, Canary Islands, Spain
Pages: 1060-1067
Number of pages: 8
Publication status: Published - 2002
