University of Canterbury Home
    UC Research Repository
    UC Library

    Mapping Languages and Demographics with Georeferenced Corpora (2019)

    Accepted version (922.9Kb)
    Type of Content
    Conference Contributions - Published
    UC Permalink
    http://hdl.handle.net/10092/17132
    
    Collections
    • Arts: Journal Articles [314]
    Authors
    Dunn, Jonathan
    Adams, Ben
    Abstract

    This paper evaluates large georeferenced corpora, taken from both web-crawled and social media sources, against ground-truth population and language-census datasets. The goal is to determine (i) which dataset best represents population demographics; (ii) in what parts of the world the datasets are most representative of actual populations; and (iii) how to weight the datasets to provide more accurate representations of underlying populations. The paper finds that the two datasets represent very different populations and that they correlate with actual populations with values of r = 0.60 (social media) and r = 0.49 (web-crawled). Further, Twitter data makes better predictions about the inventory of languages used in each country.
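The comparison described in the abstract can be illustrated with a minimal sketch. It assumes each country is represented by its share of a corpus and its share of world population. The function names `pearson_r` and `population_weights` and the toy figures are hypothetical, and the ratio-based weighting is only one simple possibility, not necessarily the scheme used in the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def population_weights(corpus_share, population_share):
    """Per-country weight = population share / corpus share (an assumed scheme).

    A weight above 1 means the country is under-represented in the corpus;
    a weight below 1 means it is over-represented.
    """
    return {
        country: population_share[country] / share
        for country, share in corpus_share.items()
        if share > 0
    }


# Toy, made-up shares for three hypothetical countries (not data from the paper).
corpus = {"A": 0.50, "B": 0.30, "C": 0.20}
population = {"A": 0.20, "B": 0.30, "C": 0.50}

countries = sorted(corpus)
r = pearson_r([corpus[c] for c in countries], [population[c] for c in countries])
weights = population_weights(corpus, population)
```

Here `r` measures how closely the corpus shares track the population shares, in the same sense as the r values reported in the abstract, while `weights` shows how much each country's data would need to be scaled up or down under the assumed ratio scheme.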

    Citation
    Dunn J, Adams B (2019). Mapping Languages and Demographics with Georeferenced Corpora. Proceedings of Geocomputation 2019.
    This citation is automatically generated and may be unreliable. Use as a guide only.
    Keywords
    user-generated content; crowdsourcing; language; demographics; population
    ANZSRC Fields of Research
    47 - Language, communication and culture::4704 - Linguistics::470406 - Historical, comparative and typological linguistics
    20 - Language, Communication and Culture::2004 - Linguistics::200402 - Computational Linguistics
    16 - Studies in Human Society::1603 - Demography::160399 - Demography not elsewhere classified
    16 - Studies in Human Society::1604 - Human Geography::160403 - Social and Cultural Geography

    Related items

    Showing items related by title, author, creator and subject.

    • The Effect of Area Level Deprivation on Obesity in New Zealand: Analysis of The New Zealand Health Surveys 

      Kirk RC; Halim, A; Basu, A (2017)
    • Diversifying and Moving Through the Hidden City: A commentary on Heather Hayward's 'Hidden City' 

      Dombroski KF; Diprose G (2016)
    • Phonetic fieldwork in southern New Guinea 

      Lindsey, Kate L.; Schokkin, Dineke (University of Hawai'i Press, 2021)
      This special publication of Language Documentation & Conservation represents a collection of the first available phonetic descriptions of several languages of Southern New Guinea. This area encompasses the southernmost ...
    • SUBMISSIONS
    • Research Outputs
    • UC Theses
    • CONTACTS
    • Send Feedback
    • +64 3 369 3853
    • ucresearchrepository@canterbury.ac.nz
    • ABOUT
    • UC Research Repository Guide
    • Copyright and Disclaimer