Bootstrapping Yahoo! Finance by Wikipedia for Competitor Mining

Tong Ruan, Lijuan Xue, Haofen Wang, Jeff Z. Pan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Competitive intelligence, a key factor in enterprise risk management and decision support, depends on knowledge bases that contain a large amount of competitive information. A variety of finance websites have collected competitive information manually and can serve as such knowledge bases; Yahoo! Finance is one of the largest and most successful among them. However, these websites suffer from incompleteness, a lack of competitive domain information, and untimely updates. Wikipedia, which was built with collective wisdom and contains plenty of useful information in various forms, can address these problems effectively and thus help build a more comprehensive knowledge base. In this paper, we propose a novel semi-supervised approach to identify competitor information and competitive domains from Wikipedia based on a multi-strategy learning algorithm. More precisely, we use seeds of competition between companies and competition between products to distantly supervise the learning process, which finds text patterns in free text. Because competitive information can also be inferred from events, we design a learning-based method to identify event description sentences. The whole process is performed iteratively. The experimental results show the effectiveness of our approach. Moreover, the results extracted from Wikipedia supplement Yahoo! Finance with 14,000 competitor pairs and 8,000 competitive domains between rival companies.
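The seed-driven, iterative pattern learning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only a simplified pattern-bootstrapping loop, leaves out the multi-strategy learning and the event-sentence classifier, and all function names, company names, and sentences below are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations


def middle_text(sentence, a, b):
    """Return the text between mentions of a and b (order-insensitive), or None."""
    i, j = sentence.find(a), sentence.find(b)
    if i < 0 or j < 0:
        return None
    if i > j:  # competition is treated as symmetric, so normalise the order
        i, j, a, b = j, i, b, a
    middle = sentence[i + len(a):j].strip()
    return middle or None


def learn_patterns(sentences, pairs, min_support=1):
    """Distant supervision: any sentence mentioning a known pair yields a pattern."""
    counts = Counter()
    for s in sentences:
        for a, b in pairs:
            m = middle_text(s, a, b)
            if m:
                counts[m] += 1
    return {p for p, c in counts.items() if c >= min_support}


def apply_patterns(sentences, patterns, companies):
    """Find new company pairs whose connecting text matches a learned pattern."""
    found = set()
    for s in sentences:
        for a, b in combinations(sorted(companies), 2):
            if middle_text(s, a, b) in patterns:
                found.add((a, b))
    return found


def bootstrap(sentences, seeds, companies, rounds=5):
    """Alternate pattern learning and pair extraction until nothing new appears."""
    pairs = {tuple(sorted(p)) for p in seeds}
    for _ in range(rounds):
        patterns = learn_patterns(sentences, pairs)
        new_pairs = pairs | apply_patterns(sentences, patterns, companies)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs


# Hypothetical toy corpus, standing in for Wikipedia free text.
sentences = [
    "Acme competes with Bolt in the widget market.",
    "Corex competes with Dyna in cloud storage.",
    "Dyna is a major rival of Corex.",
    "Bolt is a major rival of Elba.",
    "Elba acquired Finch in 2010.",
]
companies = ["Acme", "Bolt", "Corex", "Dyna", "Elba", "Finch"]
result = bootstrap(sentences, [("Acme", "Bolt")], companies)
```

In this toy run the single seed pair yields the pattern "competes with" in the first round, which recovers the Corex/Dyna pair; that pair in turn teaches the pattern "is a major rival of" in the second round, which recovers Bolt/Elba. The acquisition sentence never matches a learned pattern, so Elba/Finch is not reported.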
Original language: English
Title of host publication: Semantic Technology
Subtitle of host publication: 5th Joint International Conference, JIST 2015, Yichang, China, November 11-13, 2015, Revised Selected Papers
Editors: Guilin Qi, Kouji Kozaki, Jeff Z. Pan, Siwei Yu
Place of Publication: Cham
Publisher: Springer International Publishing
Number of pages: 19
ISBN (Electronic): 978-3-319-31676-5
ISBN (Print): 978-3-319-31675-8
Publication status: Published - 20 Mar 2016
Event: 5th Joint International Semantic Technology Conference - Yichang, China
Duration: 11 Nov 2015 to 13 Nov 2015

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 5th Joint International Semantic Technology Conference
Abbreviated title: JIST 2015


  • Competitor mining
  • Multi-strategy learning
  • Distant supervision
  • Relation reasoning

