Abstract
Sociolinguistics is often concerned with how variants of a linguistic item (e.g., nothing vs. nothin’) are used by different groups or in different situations. We introduce the task of inducing lexical variables from code-mixed text: that is, identifying equivalence pairs such as (football, fitba) along with their linguistic code (football→British, fitba→Scottish). We adapt a framework for identifying gender-biased word pairs to this new task, and present results on three different pairs of English dialects, using tweets as the code-mixed text. Our system achieves precision of over 70% for two of these three datasets, and produces useful results even without extensive parameter tuning. Our success in adapting this framework from gender to language variety suggests that it could be used to discover other types of analogous pairs as well.
Original language | English |
---|---|
Title of host publication | 2018 The 4th Workshop on Noisy User-generated Text (W-NUT) |
Subtitle of host publication | Nov 1, 2018, Brussels, Belgium (at EMNLP 2018) |
Place of Publication | Brussels, Belgium |
Publisher | Association for Computational Linguistics |
Pages | 1-6 |
Number of pages | 6 |
Publication status | Published - Nov 2018 |
Event | 4th Workshop on Noisy User-generated Text (W-NUT), at EMNLP 2018, Brussels, Belgium. Duration: 1 Nov 2018 → 1 Nov 2018. http://noisy-text.github.io/2018/ |
Workshop
Workshop | 4th Workshop on Noisy User-generated Text (W-NUT) |
---|---|
Abbreviated title | W-NUT 2018 |
Country/Territory | Belgium |
City | Brussels |
Period | 1 Nov 2018 → 1 Nov 2018 |