Abstract
This paper describes an advanced platform for web information extraction (IE) that can be customised to different domains, languages and users' interests. The platform was developed in the R&D project CROSSMARC, which involved both academic and industrial organisations. It is composed of a core system for web IE and a customisation infrastructure. The core system implements a distributed, multi-agent, open and multilingual architecture that integrates components for (a) collecting domain-specific web pages using crawling and spidering technologies, (b) extracting information from the collected web pages using natural language processing and machine learning techniques, and (c) presenting the extracted information according to users' interests using user modelling techniques. The customisation infrastructure provides an ontology management system and various methods and tools for creating the application-specific resources. The platform enables cross-lingual IE, supports four languages in its current implementation, and has been tested in three different applications.
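To make the hand-off between components (a), (b) and (c) concrete, here is a minimal single-process sketch. It is illustrative only: every name (`Page`, `UserModel`, `collect`, `extract`, `present`, `PRICE_PATTERNS`) is hypothetical. A keyword filter stands in for crawling/spidering, per-language regular expressions stand in for the NLP and machine-learning extractors, and a price threshold stands in for the user model. The distributed multi-agent aspect is not modelled. The one point it is meant to show is that extraction produces language-independent records, so users can receive results from pages written in any supported language.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    lang: str  # language of the page text, e.g. "en" or "fr"
    text: str


@dataclass
class UserModel:
    # Hypothetical user model: only the maximum price the user cares about.
    max_price: float


def collect(pages, domain_terms):
    """Stage (a): keep pages mentioning at least one domain term.
    A crude stand-in for focused crawling/spidering."""
    return [p for p in pages if any(t in p.text.lower() for t in domain_terms)]


# Stage (b): toy per-language extraction patterns (stand-ins for real extractors).
PRICE_PATTERNS = {
    "en": re.compile(r"price\s*:\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "fr": re.compile(r"prix\s*:\s*(\d+(?:[.,]\d+)?)", re.IGNORECASE),
}


def extract(page):
    """Return a language-independent record, or None if nothing was found."""
    pattern = PRICE_PATTERNS.get(page.lang)
    if pattern is None:
        return None  # language not supported by this toy extractor
    match = pattern.search(page.text)
    if match is None:
        return None
    # Normalise a decimal comma (e.g. French "749,50") to a float.
    price = float(match.group(1).replace(",", "."))
    return {"url": page.url, "lang": page.lang, "price": price}


def present(records, user):
    """Stage (c): keep records matching the user's interests, cheapest first."""
    chosen = [r for r in records if r["price"] <= user.max_price]
    return sorted(chosen, key=lambda r: r["price"])


def run_pipeline(pages, domain_terms, user):
    relevant = collect(pages, domain_terms)
    records = [r for r in (extract(p) for p in relevant) if r is not None]
    return present(records, user)
```

For example, given an English laptop page priced at 899, a French page ("Prix : 749,50") and an off-domain recipe page, `run_pipeline(pages, ["laptop", "portable"], UserModel(max_price=800))` keeps only the French record, with its price normalised to `749.5`.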
Original language | English |
---|---|
Title of host publication | Proceedings of the 16th European Conference on Artificial Intelligence, ECAI'2004, including Prestigious Applications of Intelligent Systems, PAIS 2004, Valencia, Spain, August 22-27, 2004 |
Pages | 725-729 |
Number of pages | 5 |
Publication status | Published - 2004 |