Edinburgh Research Explorer

Calculating Error Bars on Inferences from Web Data

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: SAI Intelligent Systems Conference (IntelliSys)
Publisher: IEEE
Number of pages: 11
State: Accepted/In press - 8 Feb 2018
Event: Intelligent Systems Conference (IntelliSys) 2018 - London, United Kingdom
Duration: 6 Sep 2018 to 7 Sep 2018
https://saiconference.com/IntelliSys

Conference

Conference: Intelligent Systems Conference (IntelliSys) 2018
Abbreviated title: IntelliSys 2018
Country: United Kingdom
City: London
Period: 6/09/18 to 7/09/18

Abstract

In this work, we explore uncertainty in automated question answering over real-valued data from knowledge bases on the Internet. We argue that the coefficient of variation (cov) is an intuitive and general form in which to express this uncertainty, with the added advantage that it can be calculated exactly and efficiently. The large amounts of data on the Internet present a good opportunity to answer queries that go beyond simply looking up facts and returning them. However, such data is often vague and noisy. For discrete results, e.g., stating that a particular city is the capital of a particular country, probabilities are a natural way to assign uncertainty to answers. For continuous variables, or quantities that are typically treated as continuous (such as populations of countries), probabilities are uninformative, being infinitesimal. For instance, the probability that the population of India is exactly equal to the last census count is effectively zero. Our aim is to capture uncertainty in these estimates in an intuitive, uniform, and computationally efficient way. We present initial efforts at automating the inference process over real-valued web data while accounting for some of the typical sources of uncertainty: noisy data and errors from inference operations. Having considered several problem domains and query types, we settle on approximating all continuous random variables with Gaussian distributions and communicating uncertainties to users as coefficients of variation. Our experiments show that the estimates of uncertainty derived by our method are well-calibrated and correlate with the actual deviations from the true answer. An immediate benefit of our approach is that our inference framework can attach credible intervals¹ to the real-valued answers that it infers. This conveys to a user the plausible magnitude of the error in the answer, a more meaningful measure of uncertainty than the ranking scores provided by other question answering systems.

¹ We will use symmetric 68.27 percent credible intervals for the remainder of this paper, corresponding to 1 standard deviation from the mean in a standardized Gaussian, but note that this contains sufficient information to estimate arbitrary posterior probabilities under our assumption of normality.
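The quantities named in the abstract can be illustrated with a minimal sketch. It assumes an answer has already been summarized as a Gaussian with a known mean and standard deviation; the function names and the example numbers are hypothetical and are not taken from the paper or its inference framework.

```python
from statistics import NormalDist


def coefficient_of_variation(mean, std):
    """Return std / |mean|, the relative spread of the estimate."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return std / abs(mean)


def credible_interval(mean, std, level=0.6827):
    """Symmetric credible interval under the Gaussian assumption.

    level=0.6827 corresponds to roughly one standard deviation
    on either side of the mean.
    """
    # z-score such that the central mass of a standard Gaussian is `level`
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (mean - z * std, mean + z * std)


def probability_between(mean, std, low, high):
    """Posterior probability that the true value lies in [low, high]."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical inferred answer: mean 1,000,000 with standard deviation 50,000.
example_mean = 1_000_000.0
example_std = 50_000.0

cov = coefficient_of_variation(example_mean, example_std)
low, high = credible_interval(example_mean, example_std)
mass = probability_between(example_mean, example_std, low, high)

print(f"cov = {cov:.3f}")
print(f"68.27% credible interval = ({low:.0f}, {high:.0f})")
print(f"probability mass inside the interval = {mass:.4f}")
```

Because the mean and standard deviation fully determine a Gaussian, the same two numbers are enough to compute the probability of any other interval, which is the point the footnote makes about arbitrary posterior probabilities.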

    Research areas

  • Query Answering, Error Bars, Uncertainty, Bayesian Inference, Coefficient of Variation

ID: 55342580