TY - GEN
T1 - Uncertainty and inclusivity in gender bias annotation
T2 - 4th Workshop on Gender Bias in Natural Language Processing at NAACL
AU - Havens, Lucy
AU - Terras, Melissa
AU - Bach, Benjamin
AU - Alex, Beatrice
N1 - Funding Information:
Thank you to our collaborators, Rachel Hosker and her team at the Centre for Research Collections; our annotators, Suzanne Black, Ashlyn Cudney, Anna Kuslits, and Iona Walker; and Richard Tobin, who wrote the pre-annotation scripts for this paper's annotation process. We also extend our gratitude to the organizations that provided grants to support the research reported in this paper: the University of Edinburgh's Edinburgh Futures Institute, Centre for Data, Culture & Society, Institute for Language, Cognition and Computation, and School of Informatics; and the UK's Engineering and Physical Sciences Research Council. Additional thanks go to the organizers of the Fourth Workshop on Gender Bias in Natural Language Processing, for the opportunity to submit this paper, and to the reviewers who gave feedback on this paper.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022/7/15
Y1 - 2022/7/15
N2 - Mitigating harms from gender biased language in Natural Language Processing (NLP) systems remains a challenge, and the situated nature of language means bias is inescapable in NLP data. Though efforts to mitigate gender bias in NLP are numerous, they often vaguely define gender and bias, only consider two genders, and do not incorporate uncertainty into models. To address these limitations, in this paper we present a taxonomy of gender biased language and apply it to create annotated datasets. We created the taxonomy and annotated data with the aim of making gender bias in language transparent. If biases are communicated clearly, varieties of biased language can be better identified and measured. Our taxonomy contains eleven types of gender biases, inclusive of people whose gender expressions do not fit into the binary conceptions of woman and man and people whose gender differs from that assigned to them at birth, while also allowing annotators to document unknown gender information. The taxonomy and annotated data will, in future work, underpin analysis and more equitable language model development.
AB - Mitigating harms from gender biased language in Natural Language Processing (NLP) systems remains a challenge, and the situated nature of language means bias is inescapable in NLP data. Though efforts to mitigate gender bias in NLP are numerous, they often vaguely define gender and bias, only consider two genders, and do not incorporate uncertainty into models. To address these limitations, in this paper we present a taxonomy of gender biased language and apply it to create annotated datasets. We created the taxonomy and annotated data with the aim of making gender bias in language transparent. If biases are communicated clearly, varieties of biased language can be better identified and measured. Our taxonomy contains eleven types of gender biases, inclusive of people whose gender expressions do not fit into the binary conceptions of woman and man and people whose gender differs from that assigned to them at birth, while also allowing annotators to document unknown gender information. The taxonomy and annotated data will, in future work, underpin analysis and more equitable language model development.
U2 - 10.18653/v1/2022.gebnlp-1.4
DO - 10.18653/v1/2022.gebnlp-1.4
M3 - Conference contribution
SN - 9781955917681
SP - 30
EP - 57
BT - Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
A2 - Hardmeier, Christian
A2 - Basta, Christine
A2 - Costa-jussà, Marta R.
A2 - Stanovsky, Gabriel
A2 - Gonen, Hila
PB - Association for Computational Linguistics
Y2 - 15 July 2022 through 15 July 2022
ER -