(L)-vocalization has received increasing attention in sociophonetic research but is a challenging variable to measure consistently. Acoustic measures are not typically used because velarized (L), the realization most likely to vocalize, is itself extremely difficult to distinguish from a back rounded vowel on acoustic features alone. For this reason, and because articulatory measures are difficult to use with spontaneous, field-based data, sociolinguists have typically relied solely on auditory coding. However, auditory coding raises a particular methodological concern: the level of consistency across coders, both within and across studies. The current paper presents results from a multi-listener perception survey of (L)-vocalization coding. Phonetically and sociolinguistically trained listeners evaluated a range of productions from two ethnically diverse U.S. English communities: Columbus, Ohio, and San Francisco, California. The survey investigates inter-coder consistency with respect to both phonetic environment and speech variety, with results showing that reliability depends on both factors. Inter-coder disagreement is also highest for tokens rated at intermediate levels of vocalization. Given our ethnically diverse speaker sample, we further ask how coders' perception of a speaker's ethnicity interacts with their vocalization coding decisions. Our findings bear on the methodological decisions made in research that relies on auditory coding, drawing particular attention to the challenge of designing a method sensitive to patterns of variability and social meaning that are potentially both universal and community-specific.