Measuring Linguistic Diversity During COVID-19 (2020)
Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of linguistic diversity using restrictions on international travel resulting from the COVID-19 pandemic. Previous work has mapped the distribution of languages using geo-referenced social media and web data. The goal, however, has been to describe these corpora themselves rather than to make inferences about underlying populations. This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations. These methods tell us where significant changes have taken place and whether this leads to increased or decreased diversity. This is an important step in aligning digital corpora like social media with the real-world populations that have produced them.
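As a rough illustration of the kind of calculation the abstract describes, the sketch below computes the Herfindahl-Hirschman Index (the sum of squared language shares, so a higher value means more concentration and less diversity) and a simple difference-in-differences on it. This is not the paper's implementation: the function names, the choice of a baseline year as the comparison group, and all counts are hypothetical and chosen only for illustration.

```python
def hhi(language_counts):
    """Herfindahl-Hirschman Index: sum of squared shares per language.

    Ranges from 1/k (k equally used languages) up to 1.0 (one language only).
    """
    total = sum(language_counts.values())
    if total == 0:
        raise ValueError("no observations to compute shares from")
    return sum((n / total) ** 2 for n in language_counts.values())


def diff_in_diff(pre_baseline, post_baseline, pre_treated, post_treated):
    """Change in HHI in the treated period minus the change in the baseline period.

    Assumption (not from the paper): the baseline is the same pair of
    calendar windows in an earlier, unaffected year.
    """
    treated_change = hhi(post_treated) - hhi(pre_treated)
    baseline_change = hhi(post_baseline) - hhi(pre_baseline)
    return treated_change - baseline_change


# Hypothetical per-language post counts for one location.
early_2019 = {"en": 80, "zh": 10, "de": 10}
late_2019 = {"en": 80, "zh": 10, "de": 10}
early_2020 = {"en": 80, "zh": 10, "de": 10}
late_2020 = {"en": 90, "zh": 5, "de": 5}  # fewer non-local languages observed

effect = diff_in_diff(early_2019, late_2019, early_2020, late_2020)
print(f"difference-in-differences on HHI: {effect:.3f}")
```

A positive value here means concentration rose more than in the baseline period, which corresponds to decreased diversity; a negative value would indicate increased diversity.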
Citation
Dunn J, Coupe T, Adams B (2020). Measuring Linguistic Diversity During COVID-19. The Fourth Workshop on Natural Language Processing and Computational Social Science. 19/11/2020-20/11/2020. Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science.
This citation is automatically generated and may be unreliable. Use as a guide only.
ANZSRC Fields of Research
47 - Language, communication and culture::4704 - Linguistics::470404 - Corpus linguistics
47 - Language, communication and culture::4704 - Linguistics::470403 - Computational linguistics
47 - Language, communication and culture::4704 - Linguistics::470411 - Sociolinguistics
Rights
All rights reserved unless otherwise stated