Arts: Conference Contributions

Recent Submissions

  • ItemOpen Access
    Variation and Instability in Dialect-Based Embedding Spaces
    (Association for Computational Linguistics, 2023) Dunn, Jonathan
    This paper measures variation in embedding spaces which have been trained on different regional varieties of English while controlling for instability in the embeddings. While previous work has shown that it is possible to distinguish between similar varieties of a language, this paper experiments with two follow-up questions: First, does the variety represented in the training data systematically influence the resulting embedding space after training? This paper shows that differences in embeddings across varieties are significantly higher than baseline instability. Second, is such dialect-based variation spread equally throughout the lexicon? This paper shows that specific parts of the lexicon are particularly subject to variation. Taken together, these experiments confirm that embedding spaces are significantly influenced by the dialect represented in the training data. This finding implies that there is semantic variation across dialects, in addition to previously studied lexical and syntactic variation.
  • ItemOpen Access
    Te whakaoho o te mōhiotanga huna
    (2022) Hay J; Keegan P; Mattingley W; Todd S; Panther F; King, Jeanette
  • ItemOpen Access
    Stability of Syntactic Dialect Classification Over Space and Time
    (2022) Wong S; Dunn, Jonathan
    This paper analyses the degree to which dialect classifiers based on syntactic representations remain stable over space and time. While previous work has shown that the combination of grammar induction and geospatial text classification produces robust dialect models, we do not know what influence both changing grammars and changing populations have on dialect models. This paper constructs a test set for 12 dialects of English that spans three years at monthly intervals with a fixed spatial distribution across 1,120 cities. Syntactic representations are formulated within the usage-based Construction Grammar paradigm (CxG). The decay rate of classification performance for each dialect over time allows us to identify regions undergoing syntactic change. And the distribution of classification accuracy within dialect regions allows us to identify the degree to which the grammar of a dialect is internally heterogeneous. The main contribution of this paper is to show that a rigorous evaluation of dialect classification models can be used to find both variation over space and change over time.
  • ItemOpen Access
    Towards a theory of motivation: describing commitment to the Māori language
    (2009) King, Jeanette; Gully, Nichol Catherine
  • ItemOpen Access
    Greenbeard Theory, Meet Simulation Theory
    (2020) Campbell, Douglas
  • ItemOpen Access
    Robots in Nozickland
    (2020) Campbell, Douglas
  • ItemOpen Access
    Unsupervised morphological segmentation in a language with reduplication
    (2022) Todd S; Huang A; Needle J; King J; Hay, Jennifer
    We present an extension of the Morfessor Baseline model of unsupervised morphological segmentation (Creutz and Lagus, 2007) that incorporates abstract templates for reduplication, a typologically common but computationally underaddressed process. Through a detailed investigation that applies the model to Māori, the Indigenous language of Aotearoa New Zealand, we show that incorporating templates improves Morfessor’s ability to identify instances of reduplication, and does so most when there are multiple minimally-overlapping templates. We present an error analysis that reveals important factors to consider when applying the extended model and suggests useful future directions.
  • ItemOpen Access
    Uninvited Campaign Rally: Effects of Hong Kong’s Anti-Extradition Movement on Taiwan’s 2020 Election
    (American Political Science Association, 2021) Huang C; Tan, Alex
    Party, candidate, and issue are undoubtedly the most frequently cited elements in electoral studies. All three, especially party system and issue debates, often reflect and cut along the main social and political cleavages in a society. However, the classic Michigan model and social cleavage theory may overlook the influence of events beyond the country’s border. It is curious that recent literature has begun to recognize subtle foreign intervention through the internet and social media, yet few studies pay enough attention to the possible effects of intensively reported external events on domestic politics and their interactions. This study fills this void by studying a new Asian democracy and examining how events hundreds of miles away can send shock waves that shift, if not reverse, the domestic public mood. We examine the effects of Hong Kong’s anti-extradition movement in 2019 on Taiwan voters’ views of cross-strait relations, especially their stands on Taiwan independence vs. unification with China. We utilize the unique face-to-face survey panel data collected by the Taiwan Institute for Governance and Communication Research (TIGCR) at the National Chengchi University from 2018 to 2020 (TIGCR-PPS 2018, 2019 & 2020) to measure the stability and change of independence-unification views in Taiwan during the 2019 campaign period. We find that the shift in the general public’s attitude on this long-existing political cleavage over cross-strait relations indeed accounts for Taiwan’s 2020 presidential election results.
  • ItemOpen Access
    Learned Construction Grammars Converge Across Registers Given Increased Exposure
    (Association for Computational Linguistics, 2021) Tayyar Madabushi H; Dunn, Jonathan
    This paper measures the impact of increased exposure on whether learned construction grammars converge onto shared representations when trained on data from different registers. Register influences the frequency of constructions, with some structures common in formal but not informal usage. We expect that a grammar induction algorithm exposed to different registers will acquire different constructions. To what degree does increased exposure lead to the convergence of register-specific grammars? The experiments in this paper simulate language learning in 12 languages (half Germanic and half Romance) with corpora representing three registers (Twitter, Wikipedia, Web). These simulations are repeated with increasing amounts of exposure, from 100k to 2 million words, to measure the impact of exposure on the convergence of grammars. The results show that increased exposure does lead to converging grammars across all languages. In addition, a shared core of register-universal constructions remains constant across increasing amounts of exposure.
  • ItemOpen Access
    Virtues, vices and place attachment
    (2021) Mason, Carolyn
    There is a virtue associated with forming and maintaining relationships to places. This virtue has not been recognised by philosophers, but it plays a role in indigenous cultures across the world. Hence, place attachment is one of many areas in which indigenous knowledge can contribute to the development of Western philosophy. After explaining what it means for a disposition to act in accordance with this virtue to be a neo-Aristotelian virtue, examples from Māori culture are used to explain why the way that people form relationships to places can be a virtue in this neo-Aristotelian sense. Recognising this virtue reveals ways of interacting with the world that contribute to human and environmental flourishing, as well as revealing a new way in which indigenous people are harmed when dispossessed of their ancestral land.
  • ItemOpen Access
    Production vs Perception: The Role of Individuality in Usage-Based Grammar Induction
    (Association for Computational Linguistics, 2021) Nini A; Dunn, Jonathan
    This paper asks whether a distinction between production-based and perception-based grammar induction influences either (i) the growth curve of grammars and lexicons or (ii) the similarity between representations learned from independent subsets of a corpus. A production-based model is trained on the usage of a single individual, thus simulating the grammatical knowledge of a single speaker. A perception-based model is trained on an aggregation of many individuals, thus simulating grammatical generalizations learned from exposure to many different speakers. To ensure robustness, the experiments are replicated across two registers of written English, with four additional registers reserved as a control. A set of three computational experiments shows that production-based grammars are significantly different from perception-based grammars across all conditions, with a steeper growth curve that can be explained by substantial inter-individual grammatical differences.
  • ItemOpen Access
    Copyright and Post-disaster Archiving
    (2019) Thomson C; Millar, Paul
    In this workshop session, Paul Millar delivers a presentation, jointly prepared with Dr Chris Thomson, which discusses the experience of the CEISMIC project in dealing with copyright issues. He outlines the relevant law as it stands and is applied in New Zealand, and discusses some of the unique situations the project encountered as the team tried to ensure openness, inclusivity and rigour within a copyright-compliant framework during a period of trauma and transition.
  • ItemOpen Access
    Representations of Language Varieties Are Reliable Given Corpus Similarity Measures
    (Association for Computational Linguistics, 2021) Dunn, Jonathan
    This paper measures similarity both within and between 84 language varieties across nine languages. These corpora are drawn from digital sources (the web and tweets), allowing us to evaluate whether such geo-referenced corpora are reliable for modelling linguistic variation. The basic idea is that, if each source adequately represents a single underlying language variety, then the similarity between these sources should be stable across all languages and countries. The paper shows that there is a consistent agreement between these sources using frequency-based corpus similarity measures. This provides further evidence that digital geo-referenced corpora consistently represent local language varieties.
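
For readers unfamiliar with the idea behind the last abstract above, the following is a minimal sketch of one common frequency-based corpus similarity measure: the Spearman rank correlation between word frequencies in two corpora. It is an illustration only and is not taken from the paper; the function names, the whitespace-tokenized input, and the `top_n` feature cut-off are all assumptions made for this example.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks for values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def corpus_similarity(tokens_a, tokens_b, top_n=1000):
    """Spearman correlation of word frequencies over a shared feature set.

    Hypothetical helper: features are the top_n most frequent words in the
    two corpora combined (an assumption of this sketch).
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    vocab = [word for word, _ in (freq_a + freq_b).most_common(top_n)]
    counts_a = [freq_a[word] for word in vocab]
    counts_b = [freq_b[word] for word in vocab]
    # Spearman's rho is the Pearson correlation of the rank vectors.
    return pearson(average_ranks(counts_a), average_ranks(counts_b))


if __name__ == "__main__":
    web = "the cat sat on the mat and the dog sat too".split()
    tweets = "the dog sat on the mat and the cat ran".split()
    print(corpus_similarity(web, tweets))
```

Values close to 1 indicate that the two samples rank their shared vocabulary in a similar order, which is the kind of agreement one would expect if both sources represent the same underlying variety; real studies would use far larger samples and a carefully chosen feature set.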