TY - JOUR
T1 - Benchmarking Crisis in Social Media Analytics: A Solution for the Data Sharing Problem
AU - Assenmacher, Dennis
AU - Weber, Derek
AU - Preuss, Mike
AU - Calero Valdez, André
AU - Bradshaw, Alison
AU - Ross, Björn
AU - Cresci, Stefano
AU - Trautmann, Heike
AU - Neumann, Frank
AU - Grimme, Christian
N1 - Funding Information:
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dennis Assenmacher, Christian Grimme, Mike Preuss, and Heike Trautmann acknowledge support by the German Federal Ministry of Education and Research (FKZ 16KIS0495K), the Ministry of Culture and Science of the German State of North Rhine-Westphalia (FKZ 005-1709-0001, EFRE-0801431, and FKZ 005-1709-0006), and the European Research Center for Information Systems (ERCIS). Dennis Assenmacher and Christian Grimme are additionally supported by the DAAD PPP Germany–Australia 2020 project ID 57511656. Stefano Cresci acknowledges funding by the EU H2020 Program under the scheme INFRAIA-01-2018-2019: Research and Innovation action grant agreement #871042 SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics.
Publisher Copyright:
© The Author(s) 2021.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - Computational social science uses computational and statistical methods to evaluate social interaction. The public availability of data sets is thus a necessary precondition for reliable and replicable research: such data allow researchers to benchmark the computational methods they develop, test the generalizability of their findings, and build confidence in their results. Where social media data are concerned, however, data sharing is often restricted for legal or privacy reasons, which makes the comparison of methods and the replication of research results infeasible. Social media analytics research consequently faces an integrity crisis: how can trust in computational or statistical analyses be established when they cannot be validated by third parties? In this work, we explore this well-known yet little-discussed problem for social media analytics and investigate how it can be solved by looking at related computational research areas. Moreover, we propose and implement a prototype that addresses the problem: a new evaluation framework that enables the comparison of algorithms without the need to exchange data directly, while maintaining flexibility in algorithm design.
KW - Social Media Analytics
KW - Benchmarking
KW - Social Computing
KW - Reproducibility
U2 - 10.1177/08944393211012268
DO - 10.1177/08944393211012268
M3 - Article
SN - 0894-4393
VL - 40
SP - 1496
EP - 1522
JO - Social Science Computer Review
JF - Social Science Computer Review
IS - 6
ER -