Abstract
In this work, we explore uncertainty in automated question answering over real-valued data from knowledge bases on the Internet. We argue that the coefficient of variation (cov) is an intuitive and general form in which to express this uncertainty, with the added advantage that it can be calculated exactly and efficiently. The large amounts of data on the Internet present a good opportunity to answer queries that go beyond simply looking up facts and returning them. However, such data is often vague and noisy. For discrete results, e.g., stating that a particular city is the capital of a particular country, probabilities are a natural way to assign uncertainty to answers. For continuous variables, or quantities that are typically treated as continuous (such as populations of countries), probabilities are uninformative, being infinitesimal. For instance, the probability that the population of India is exactly equal to the last census count is effectively zero. Our aim is to capture uncertainty in these estimates in an intuitive, uniform, and computationally efficient way. We present initial efforts at automating the inference process over real-valued web data while accounting for some of the typical sources of uncertainty: noisy data and errors from inference operations. Having considered several problem domains and query types, we approximate all continuous random variables with Gaussian distributions and communicate uncertainties to users as coefficients of variation. Our experiments show that the estimates of uncertainty derived by our method are well-calibrated and correlate with the actual deviations from the true answer. An immediate benefit of our approach is that our inference framework can attach credible intervals^{1} to the real-valued answers that it infers. This conveys to a user the plausible magnitude of the error in the answer, a more meaningful measure of uncertainty than the ranking scores provided by other question answering systems.
^{1}We will use symmetric 68.27 percent credible intervals for the remainder of this paper, corresponding to 1 standard deviation either side of the mean in a standardized Gaussian, but note that this contains sufficient information to estimate arbitrary posterior probabilities under our assumption of normality.
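As an illustration of the quantities named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the usual definition of the coefficient of variation as the standard deviation divided by the absolute mean, and it assumes a Gaussian posterior as stated above. The function names and the numeric values are hypothetical and chosen only for the example.

```python
import math


def coefficient_of_variation(mean: float, std: float) -> float:
    """Standard deviation relative to the absolute mean (assumed definition)."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std / abs(mean)


def credible_interval(mean: float, cov: float, z: float = 1.0) -> tuple[float, float]:
    """Symmetric interval of z standard deviations around the mean.

    With z = 1 and a Gaussian posterior this covers about 68.27 percent.
    """
    std = cov * abs(mean)
    return (mean - z * std, mean + z * std)


def prob_within(mean: float, cov: float, lower: float, upper: float) -> float:
    """Posterior probability that the value lies in [lower, upper],
    assuming a Gaussian with the given mean and coefficient of variation."""
    std = cov * abs(mean)

    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

    return cdf(upper) - cdf(lower)


# Hypothetical inferred answer: mean 1,000,000 with standard deviation 20,000.
mean, std = 1_000_000.0, 20_000.0
cov = coefficient_of_variation(mean, std)
low, high = credible_interval(mean, cov)
p = prob_within(mean, cov, low, high)
print(f"cov = {cov:.3f}, interval = [{low:.0f}, {high:.0f}], coverage = {p:.4f}")
```

Because the mean and the cov together determine the Gaussian, the same two numbers suffice to compute the probability of any other interval, which is the point made in the footnote.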
Original language  English 

Title of host publication  SAI Intelligent Systems Conference (IntelliSys) 
Place of Publication  London, United Kingdom 
Publisher  Springer, Cham 
Pages  618-640 
Number of pages  23 
ISBN (Electronic)  9783030010577 
ISBN (Print)  9783030010560 
Publication status  Published  8 Nov 2018 
Event  Intelligent Systems Conference (IntelliSys) 2018  London, United Kingdom Duration: 6 Sep 2018 → 7 Sep 2018 https://saiconference.com/IntelliSys 
Publication series
Name  Advances in Intelligent Systems and Computing (AISC) 

Publisher  Springer, Cham 
Volume  869 
ISSN (Print)  2194-5357 
ISSN (Electronic)  2194-5365 
Conference
Conference  Intelligent Systems Conference (IntelliSys) 2018 

Abbreviated title  IntelliSys 2018 
Country  United Kingdom 
City  London 
Period  6/09/18 → 7/09/18 
Keywords
 Query Answering
 Error Bars
 Uncertainty
 Bayesian Inference
 Coefficient of Variation
Title  Calculating Error Bars on Inferences from Web Data 
Projects
 1 Finished

FRANK: Research Collaboration on Query Answering Systems
Non-EU industry, commerce and public corporations
1/02/18 → 31/01/21
Project: Research