Domain-Based Sense Disambiguation in Multilingual Structured Data

Gabor Bella, Alessio Zamboni, Fausto Giunchiglia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Natural language text is pervasive in structured data sets (relational database tables, spreadsheets, XML documents, RDF graphs, etc.), requiring data processing operations to possess some level of natural language understanding capability. This, in turn, involves dealing with aspects of diversity present in structured data, such as multilingualism or the coexistence of data from multiple domains. Word sense disambiguation (WSD) is an essential component of natural language understanding processes. State-of-the-art WSD techniques, however, were developed to operate on single languages and on corpora that are considerably different from structured data sets, such as articles, newswire, web pages, forum posts, or tweets. In this paper we present a WSD method that is designed for the short text typically present in structured data and is applicable to multiple languages and domains. Our proof-of-concept implementation reaches an all-words F-score between 60% and 80% on both English and Italian data. We consider these very promising first results given the known difficulty of WSD and the particularity of the targeted corpora with respect to more conventional text.

Original language: English
Title of host publication: Proceedings of International Workshop on Diversity-Aware Artificial Intelligence (Diversity @ ECAI 2016)
Pages: 53-61
Number of pages: 7
Publication status: Published - 25 Aug 2016
Event: 1st International Workshop on Diversity-Aware Artificial Intelligence - The Hague, Netherlands
Duration: 29 Aug 2016 to 29 Aug 2016
http://www.ecai2016.org/program/workshops/
https://www.essence-network.com/essence-events/international-workshop-on-diversity-aware-artificial-intelligence-diversity-2016-at-ecai-2016/
http://www.ecai2016.org/index.html

Conference

Conference: 1st International Workshop on Diversity-Aware Artificial Intelligence
Abbreviated title: DIVERSITY 2016
Country/Territory: Netherlands
City: The Hague
Period: 29/08/16 to 29/08/16
