TY - JOUR
T1 - In conversation with Artificial Intelligence
T2 - Aligning language models with human values
AU - Kasirzadeh, Atoosa
AU - Gabriel, Iason
N1 - Acknowledgements: We would like to thank Courtney Biles, Martin Chadwick, Julia Haas, Po-Sen Huang, Lisa Anne Hendricks, Geoffrey Irving, Sean Legassick, Donald Martin Jr, Jaylen Pittman, Laura Rimell, Christopher Summerfield, Laura Weidinger and Johannes Welbl for contributions and feedback on this paper. Particular thanks are owed to Ben Hutchinson and Owain Evans, who provided us with detailed comments and advice. Significant portions of this paper were written while Atoosa Kasirzadeh was at DeepMind.
PY - 2023/6
Y1 - 2023/6
N2 - Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.
AB - Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.
KW - artificial intelligence
KW - conversational agents
KW - ethics of language models
KW - language technologies
KW - large language models
KW - value alignment
U2 - 10.1007/s13347-023-00606-x
DO - 10.1007/s13347-023-00606-x
M3 - Article
AN - SCOPUS:85153512112
SN - 2210-5433
VL - 36
JO - Philosophy and Technology
JF - Philosophy and Technology
M1 - 27
ER -