Searching semantic web documents based on RDF sentences

Honghan Wu*, Yuzhong Qu, Huiying Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Keyword-based semantic Web document search is one of the most efficient approaches to finding semantic Web data. Most existing approaches are based on traditional IR technologies, in which documents are modeled as bags of words. The authors identify the difficulties these technologies face in processing RDF documents, namely preserving data structures, processing linked data, and generating snippets. An approach is proposed that models a semantic Web document from its abstract syntax, the RDF graph. In this approach, a document is modeled as a set of RDF sentences, and the RDF sentence-based structures are preserved during document analysis and indexing. Authoritative descriptions of named resources are also introduced, which make linked data across document boundaries searchable. Furthermore, to help users quickly determine whether a result is relevant, the traditional inverted index structure is extended to enable more understandable snippet extraction from matched documents. Experiments on real-world data show that this approach can significantly improve the precision and recall of semantic Web document search. Precision at the top one result improves by up to 19%, and a steady improvement (near 10%) is observed. Over 50 random queries, recall increases by up to 60% on average. Remarkable improvements in system usability are also obtained.
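
The abstract does not spell out how RDF sentences are formed or how the index is extended, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the definition commonly used in the RDF sentence literature (triples that share a blank node belong to the same sentence) and a hypothetical posting format that records the matching sentence alongside the document, so that a hit can return a whole sentence as its snippet. The names `rdf_sentences`, `build_index`, and `search`, and the sample data, are illustrative only.

```python
import re
from collections import defaultdict


def is_blank(node):
    # Assumption: blank nodes are written with the "_:" prefix.
    return node.startswith("_:")


def rdf_sentences(triples):
    """Group triples so that triples sharing a blank node land in one group."""
    parent = list(range(len(triples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}  # blank node -> index of the first triple mentioning it
    for i, (s, _p, o) in enumerate(triples):
        for node in (s, o):
            if is_blank(node):
                if node in first_seen:
                    parent[find(i)] = find(first_seen[node])
                else:
                    first_seen[node] = i

    groups = defaultdict(list)
    for i, triple in enumerate(triples):
        groups[find(i)].append(triple)
    return list(groups.values())


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps doc_id -> list of triples. Postings are (doc_id, sentence_id)."""
    index = defaultdict(set)
    store = {}
    for doc_id, triples in docs.items():
        for sent_id, sentence in enumerate(rdf_sentences(triples)):
            store[(doc_id, sent_id)] = sentence
            for triple in sentence:
                for term in tokenize(" ".join(triple)):
                    index[term].add((doc_id, sent_id))
    return index, store


def search(index, store, keyword):
    """Return matching sentences keyed by (doc_id, sentence_id)."""
    hits = index.get(keyword.lower(), set())
    return {key: store[key] for key in sorted(hits)}


# Hypothetical sample document.
docs = {
    "doc1": [
        ("ex:paper", "dc:title", '"RDF sentence search"'),
        ("ex:paper", "dc:creator", "_:a"),
        ("_:a", "foaf:name", '"Alice"'),
        ("ex:paper", "dc:date", '"2010"'),
    ]
}
index, store = build_index(docs)
snippets = search(index, store, "alice")
```

In this sketch a query for "alice" returns the sentence containing both the `dc:creator` triple and the `foaf:name` triple, which ties the name back to the paper. A bag-of-words index over the same document would match the term without keeping that structure.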

Original language: English
Pages (from-to): 255-263
Number of pages: 9
Journal: Jisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume: 47
Issue number: 2
Publication status: Published - 1 Feb 2010

Keywords / Materials (for Non-textual outputs)

  • RDF document search
  • RDF sentence
  • Search engine
  • Semantic Web
  • Snippet generation
