A Platform for Cross-Lingual, Domain and User Adaptive Web Information Extraction

Vangelis Karkaletsis, Constantine D. Spyropoulos, Claire Grover, Maria Teresa Pazienza, José Coch, Dimitris Souflis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes an advanced platform for web information extraction (IE) that enables customisation to different domains, languages and users’ interests. This platform was the result of the R&D project CROSSMARC, which involved both academic and industrial organisations. The platform is composed of a core system for Web IE and a customisation infrastructure. The system implements a distributed, multi-agent, open and multilingual architecture that integrates components for (a) collecting domain-specific web pages using crawling and spidering technologies, (b) extracting information from the collected web pages using natural language processing and machine learning techniques, and (c) presenting the extracted information according to users’ interests employing user modelling techniques. The platform’s customisation infrastructure provides an ontology management system and various customisation methods and tools for the creation of the application-specific resources. The platform enables cross-lingual IE, supporting four languages in its current implementation, and has been tested in three different applications.
Original language: English
Title of host publication: Proceedings of the 16th European Conference on Artificial Intelligence, ECAI'2004, including Prestigious Applications of Intelligent Systems, PAIS 2004, Valencia, Spain, August 22-27, 2004
Pages: 725-729
Number of pages: 5
Publication status: Published - 2004
