The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python

Antonio Valerio Miceli-Barone, Fazl Barez, Ioannis Konstas, Shay B. Cohen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract / Description of output

Large Language Models (LLMs) have successfully been applied to code generation tasks, raising the question of how well these models understand programming. Typical programming languages have invariances and equivariances in their semantics that human programmers intuitively understand and exploit, such as the (near) invariance to the renaming of identifiers. We show that LLMs not only fail to properly generate correct Python code when default function names are swapped, but some of them even become more confident in their incorrect predictions as the model size increases, an instance of the recently discovered phenomenon of Inverse Scaling, which runs contrary to the commonly observed trend of increasing prediction quality with increasing model size. Our findings indicate that, despite their astonishing typical-case performance, LLMs still lack a deep, abstract understanding of the content they manipulate, making them unsuitable for tasks that statistically deviate from their training data, and that mere scaling is not enough to achieve such capability.
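The following minimal sketch shows the kind of identifier swap described above. The prompt format, the swapped pair `len`/`print`, the function `count_items` and the helper `run` are hypothetical choices for illustration, not the paper's benchmark. After the two built-in names are swapped, the only correct completion calls the swapped name. The completion that follows the usual pattern from typical Python code runs without error but does something else.

```python
import contextlib
import io

# Hypothetical prompt: two built-in function names are swapped before the
# function that must be completed. (len, print) is an illustrative choice.
HEADER = '''
len, print = print, len

def count_items(items):
    """Return the number of elements in items."""
'''

# Completion that respects the swap: `print` now names the length function.
SWAP_AWARE = HEADER + "    return print(items)\n"

# Completion that follows the usual pattern and ignores the swap.
USUAL = HEADER + "    return len(items)\n"


def run(program: str, items):
    """Execute `program` in a fresh namespace and call its count_items.

    Returns (result, captured_stdout) so the effect of each completion
    can be inspected without writing to the console.
    """
    namespace: dict = {}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, namespace)  # the swap only affects this namespace
        result = namespace["count_items"](items)
    return result, buffer.getvalue()


if __name__ == "__main__":
    print(run(SWAP_AWARE, [1, 2, 3]))  # (3, '')
    print(run(USUAL, [1, 2, 3]))       # (None, '[1, 2, 3]\n')
```

Consistently renaming a local identifier, such as the parameter `items`, would leave both results unchanged. It is the swap of built-in names that changes the program's meaning, so the usual completion now prints the list and returns `None`.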
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: ACL 2023
Place of publication: Stroudsburg
Publisher: Association for Computational Linguistics
Pages: 272-292
Number of pages: 19
ISBN (electronic): 9781959429623
Publication status: Published - 9 Jul 2023
Event: 61st Annual Meeting of the Association for Computational Linguistics - Toronto, Canada
Duration: 9 Jul 2023 to 14 Jul 2023
Conference number: 61
https://2023.aclweb.org/

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics
Abbreviated title: ACL 2023
Country/Territory: Canada
City: Toronto
Period: 9/07/23 to 14/07/23
Internet address: https://2023.aclweb.org/
