NZILBB: Journal Articles

Recent Submissions
  • Item (Open Access)
    Speech air flow with and without face masks
    (Springer Science and Business Media LLC, 2022) Kabaliuk N; Longworth L; Pishyar-Dehkordi P; Jermy M; Derrick, Donald
    Face masks slow exhaled air flow and sequester exhaled particles. There are many types of face masks on the market today, each having widely varying fit, filtering, and air-redirection characteristics. While particle filtration and flow resistance from masks have been well studied, their effects on speech air flow have not. We built a schlieren system and recorded speech air flow with 14 different face masks, comparing it to mask-less speech. All of the face masks reduced air flow from speech, but some allowed air flow features to reach further than 40 cm from a speaker’s lips and nose within a few seconds, and all the face masks allowed some air to escape above the nose. Evidence from the available literature shows that distancing and ventilation in higher-risk indoor environments provide more benefit than wearing a face mask. Our own research shows that all the masks we tested provide some additional benefit by restricting air flow from a speaker. However, well-fitted masks specifically designed to prevent the spread of disease reduce air flow the most. Future research will study the effects of face masks on speech communication in order to facilitate cost/benefit analyses of mask usage in various environments.
  • Item (Open Access)
    Gait change in tongue movement
    (Springer Science and Business Media LLC, 2021) Gick B; Derrick, Donald
    During locomotion, humans switch gaits from walking to running, and horses from walking to trotting to cantering to galloping, as they increase their movement rate. It is unknown whether gait change leading to a wider movement-rate range is limited to locomotive-type behaviours, or is instead a general property of any rate-varying motor system. The tongue during speech provides a motor system that can address this gap. In controlled speech experiments, using phrases containing complex tongue-movement sequences, we demonstrate distinct gaits in tongue movement at different speech rates. As speakers widen their tongue-front displacement range, they gain access to wider speech-rate ranges. At the widest displacement ranges, speakers also produce categorically different patterns for their slowest and fastest speech. Speakers with the narrowest tongue-front displacement ranges show one stable speech-gait pattern, and speakers with the widest ranges show two. Critical fluctuation analysis of tongue motion over the time-course of speech revealed that these speakers used greater effort at the beginning of phrases; such end-state-comfort effects indicate speech planning. Based on these findings, we expect that categorical motion solutions may emerge in any motor system, providing that system with access to wider movement-rate ranges.
  • Item (Open Access)
    Characteristics of air puffs produced in English 'pa': Experiments and simulations
    (Acoustical Society of America (ASA), 2009) Derrick, Donald; Anderson P; Gick B; Green S
    Three-dimensional large eddy simulations, microphone "pop" measurements, and high-speed videos of the airflow and lip opening associated with the syllable "pa" are presented. In the simulations, the mouth is represented by a narrow static ellipse with a back pressure dropping to 1/10th of its initial value within 60 ms of the release. The simulations show a jet penetration rate that falls within the range of the pressure front of microphone pop. The simulations and high-speed video experiments were within 20% agreement after 40 ms, with the video experiments showing a slower penetration rate than the simulations during the first 40 ms. Kinematic measurements indicate that rapid changes in lip geometry during the first 40 ms underlie this discrepancy. These findings will be useful for microphone manufacturers, sound engineers, and researchers in speech aerodynamics modeling and articulatory speech synthesis.
  • Item (Open Access)
    Biomechanical modeling of English /r/ variants
    (Acoustical Society of America (ASA), 2012) Stavness I; Gick B; Derrick, Donald; Fels S
    This study reports an investigation of the well-known context-dependent variation in English /r/ using a biomechanical tongue-jaw-hyoid model. The simulation results show that preferred /r/ variants require less volume displacement, relative strain, and relative muscle stress than variants that are not preferred. This study also uncovers a previously unknown mechanism in tongue biomechanics for /r/ production: Torque in the sagittal plane about the mental spine. This torque enables raising of the tongue anterior for retroflexed [ɻ] by activation of hyoglossus and relaxation of anterior genioglossus. The results provide a deeper understanding of the articulatory factors that govern contextual phonetic variation.
  • Item (Open Access)
    The temporal window of audio-tactile integration in speech perception
    (Acoustical Society of America (ASA), 2010) Gick B; Ikegami Y; Derrick, Donald
    Asynchronous cross-modal information is integrated asymmetrically in audio-visual perception. To test whether this asymmetry generalizes across modalities, auditory (aspirated "pa" and unaspirated "ba" stops) and tactile (slight, inaudible, cutaneous air puffs) signals were presented synchronously and asynchronously. Results were similar to previous audio-visual studies: the temporal window of integration for the enhancement effect (but not the interference effect) was asymmetrical, allowing up to 200 ms of asynchrony when the puff followed the audio signal, but only up to 50 ms when the puff preceded the audio signal. These findings suggest that perceivers accommodate differences in physical transmission speed of different multimodal signals.
  • Item (Open Access)
    Locating de-lateralization in the pathway of sound changes affecting coda /l/
    (Ubiquity Press, Ltd., 2020) Strycharczuk P; Derrick, Donald; Shaw J
    ‘Vocalization’ is a label commonly used to describe an ongoing change in progress affecting coda /l/ in multiple accents of English. The label is directly linked to the loss of consonantal constriction observed in this process, but it also implicitly signals a specific type of change affecting manner of articulation from consonant to vowel, which involves loss of tongue lateralization, the defining property of lateral sounds. In this study, we consider two potential diachronic pathways of change: an abrupt loss of lateralization which follows from the loss of apical constriction, versus slower gradual loss of lateralization that tracks the articulatory changes to the dorsal component of /l/. We present articulatory data from seven speakers of New Zealand English, acquired using a combination of midsagittal and lateral EMA, as well as midsagittal ultrasound. Different stages of sound change are reconstructed through synchronic variation between light, dark, and vocalized /l/, induced by systematic manipulation of the segmental and morphosyntactic environment, and complemented by comparison of different individual articulatory strategies. Our data show a systematic reduction in lateralization that is conditioned by increasing degrees of /l/-darkening and /l/-vocalization. This observation supports the idea of a gradual diachronic shift and the following pathway of change: /l/-darkening, driven by the dorsal gesture, precipitates some loss of lateralization, which is followed by loss of the apical gesture. This pathway indicates that loss of lateralization is an integral component in the changes in manner of articulation of /l/ from consonantal to vocalic.
  • Item (Open Access)
    The impacts of a community-based health education and nutritional support program on birth outcomes among migrant workers in Maesot, Thailand: A retrospective review
    (2020) Blue WJ; Derrick, Donald; Blue CL
    Since 2014, The Charis Project, and later the Shade Tree Foundation, have been conducting community-based interventions focusing on nutritional support and education for pregnant and nursing mothers to alleviate immediate human security threats and encourage family stability. The purpose of this paper is to analyze the effectiveness of the first iteration of interventions, from 2014 to 2017, in meeting the objectives of increased human security and family stability.
  • Item (Open Access)
    Phonological contrast and phonetic variation: The case of velars in Iwaidja
    (Project Muse, 2020) Shaw JA; Carignan C; Agostini TG; Mailhammer R; Harvey M; Derrick, Donald
    A field-based ultrasound and acoustic study of Iwaidja, an endangered Australian Aboriginal language, investigates the phonetic identity of nonnasal velar consonants in intervocalic position, where past work has proposed a [+continuant] vs. [−continuant] phonemic contrast. We analyze the putative contrast within a continuous phonetic space, defined by both acoustic and articulatory parameters, and find gradient variation: from more consonantal realizations, such as [ɰ], to more vocalic realizations, such as [a]. The distribution of realizations across lexical items and speakers does not support the proposed phonemic contrast. This case illustrates how lenition that is both phonetically gradient and variable across speakers and words can give the illusion of a contextually restricted phonemic contrast.
  • Item (Open Access)
    Mask-less oral and nasal audio recording and air flow estimation for speech analysis
    (Institution of Engineering and Technology (IET), 2019) Derrick, Donald; Duerr J; Kerr RG
    This paper demonstrates Rivener, a mask-less oral and nasal audio recorder and air flow estimation system. The system records audio and low-frequency pseudo-sound from the nares and mouth. It does not interfere with speech intelligibility, and only minimally interferes with visual observation of the speaker. From these recordings, nasalance (a ratio of oral and nasal sound energy), oral air flow, and nasal air flow patterns may be estimated, all while allowing effective clinical observation (the nasalance ratio is sketched below). The first demonstration is a case-study comparison of hearing-impaired (HI) and non-impaired (NI) speech. Rivener records standard features of HI speech, such as: (i) atypically high or low speech amplitude; (ii) fundamental frequency (pitch) patterns that turn individual words into intonational phrases; (iii) speech segment substitution; (iv) hypernasalance; and (v) atypical air flow, including low air flow during plosive release. The second demonstration is a comparison of Rivener's and the Rothenberg NAS-1's ability to record nasalance among 26 New Zealand English speakers. The NAS-1 can differentiate low, medium, and high nasalance passages, whereas Rivener consistently differentiates only medium and high nasalance passages.
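    A minimal sketch, in Python, of the nasalance ratio described above (nasal energy divided by combined oral and nasal energy). The 300–750 Hz analysis band and the function names are illustrative assumptions, not Rivener's actual implementation.
    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt

    def band_energy(x, fs, lo=300.0, hi=750.0):
        """RMS energy of signal x within a pass band (Hz); band is an assumption."""
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return np.sqrt(np.mean(filtfilt(b, a, x) ** 2))

    def nasalance(oral, nasal, fs):
        """Nasal share of total (oral + nasal) band-limited energy."""
        e_oral, e_nasal = band_energy(oral, fs), band_energy(nasal, fs)
        return e_nasal / (e_nasal + e_oral)
    ```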
  • Item (Open Access)
    Native language influence on brass instrument performance: An application of generalized additive mixed models (GAMMs) to midsagittal ultrasound images of the tongue
    (2019) Derrick, Donald; Heyne M; Al-Tamimi J
    This paper presents the findings of an ultrasound study of 10 New Zealand English-speaking and 10 Tongan-speaking trombone players, conducted to determine whether native language speech production influences trombone performance. Trombone players’ midsagittal tongue shapes were recorded while reading wordlists and during sustained note productions, and tongue surface contours were traced. After normalizing to account for differences in vocal tract shape and ultrasound transducer orientation, we used generalized additive mixed models (GAMMs) to estimate the average tongue surface shapes used by the players from the two language groups when producing notes at different pitches and intensities, and during the production of the monophthongs of their native languages. The average midsagittal tongue contours predicted by our models show a statistically robust difference at the back of the tongue distinguishing the two groups, with the New Zealand English players displaying an overall more retracted tongue position; however, tongue shape during playing does not directly map onto vowel tongue shapes as prescribed by the pedagogical literature. While the New Zealand English-speaking participants employed a playing tongue shape approximating schwa and the vowel used in the word ‘lot,’ the Tongan participants used a tongue shape loosely patterning with the back vowels /o/ and /u/. We argue that these findings represent evidence for native language influence on brass instrument performance; however, this influence seems to be secondary to more basic constraints of brass playing related to airflow requirements and acoustical considerations, with the vocal tract configurations observed across both groups satisfying these conditions in different ways. Our findings furthermore provide evidence for the functional independence of various sections of the tongue and indicate that speech production, itself an acquired motor skill, can influence another skilled behavior via motor memory of vocal tract gestures forming the basis of local optimization processes to arrive at a suitable tongue shape for sustained note production. (A simplified sketch of the modelling idea follows below.)
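    A hedged sketch of the modelling idea: a smooth of tongue height over angular position plus a group-difference smooth (New Zealand English vs. Tongan). The study fit full GAMMs with per-speaker structure; pygam has no random effects, so this Python approximation, on synthetic stand-in data, only illustrates the fixed-effect smooths.
    ```python
    import numpy as np
    from pygam import LinearGAM, s, f

    # Synthetic stand-in data: column 0 = angle along the tongue contour,
    # column 1 = group code (0 = NZE, 1 = Tongan); y = tongue height.
    rng = np.random.default_rng(0)
    theta = rng.uniform(0.0, np.pi, 400)
    group = rng.integers(0, 2, 400).astype(float)
    y = np.sin(theta) - 0.1 * group * theta + rng.normal(0.0, 0.05, 400)
    X = np.column_stack([theta, group])

    # Reference smooth, group-difference smooth, and group intercept.
    gam = LinearGAM(s(0) + s(0, by=1) + f(1)).fit(X, y)
    grid = np.linspace(0.0, np.pi, 100)
    nze_curve = gam.predict(np.column_stack([grid, np.zeros(100)]))
    tongan_curve = gam.predict(np.column_stack([grid, np.ones(100)]))
    ```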
  • Item (Open Access)
    Tri-modal speech: Audio-visual-tactile integration in speech perception
    (2019) Derrick, Donald; Hansmann D; Theys C
    Speech perception is a multi-sensory experience. Visual information enhances [Sumby and Pollack (1954). J. Acoust. Soc. Am. 25, 212–215] and interferes [McGurk and MacDonald (1976). Nature 264, 746–748] with speech perception. Similarly, tactile information, transmitted by puffs of air arriving at the skin and aligned with speech audio, alters [Gick and Derrick (2009). Nature 462, 502–504] auditory speech perception in noise. It has also been shown that aero-tactile information influences visual speech perception when an auditory signal is absent [Derrick, Bicevskis, and Gick (2019a). Front. Commun. Lang. Sci. 3(61), 1–11]. However, researchers have not yet identified the combined influence of aero-tactile, visual, and auditory information on speech perception. The effects of matching and mismatching visual and tactile speech on two-way forced-choice auditory syllable-in-noise classification tasks were tested. The results showed that both visual and tactile information altered the signal-to-noise threshold for accurate identification of auditory signals. Similar to previous studies, the visual component had a strong influence on auditory syllable-in-noise identification, as evidenced by a 28.04 dB improvement in SNR between matching and mismatching visual stimulus presentations. In comparison, the tactile component had a small influence, resulting in a 1.58 dB SNR match-mismatch range. The effects of both the audio and tactile information were shown to be additive.
  • Item (Open Access)
    Visual-Tactile Speech Perception and the Autism Quotient
    (Frontiers Media SA, 2019) Derrick, Donald; Bicevskis K; Gick B
    Multisensory information is integrated asymmetrically in speech perception: An audio signal can follow video by 240 ms, but can precede video by only 60 ms, without disrupting the sense of synchronicity (Munhall et al., 1996). Similarly, air flow can follow either audio (Gick et al., 2010) or video (Bicevskis et al., 2016) by a much larger margin than it can precede either while remaining perceptually synchronous. These asymmetric windows of integration have been attributed to the physical properties of the signals; light travels faster than sound (Munhall et al., 1996), and sound travels faster than air flow (Gick et al., 2010). Perceptual windows of integration narrow during development (Hillock-Dunn and Wallace, 2012), but remain wider among people with autism (Wallace and Stevenson, 2014). Here we show that, even among neurotypical adult perceivers, visual-tactile windows of integration are wider and flatter the higher the participant’s Autism Quotient (AQ) (Baron-Cohen et al., 2001), a self-report measure of autistic traits. As “pa” is produced with a tiny burst of aspiration (Derrick et al., 2009), we applied light and inaudible air puffs to participants’ necks while they watched silent videos of a person saying “ba” or “pa,” with puffs presented both synchronously and at varying degrees of asynchrony relative to the recorded plosive release burst, which itself is time-aligned to visible lip opening. All syllables seen along with cutaneous air puffs were more likely to be perceived as “pa.” Syllables were perceived as “pa” most often when the air puff occurred 50–100 ms after lip opening, with decaying probability as asynchrony increased. Integration was less dependent on time-alignment the higher the participant’s AQ. Perceivers integrate event-relevant tactile information in visual speech perception with greater reliance upon event-related accuracy the more they self-describe as neurotypical, supporting the Happé and Frith (2006) weak coherence account of autism spectrum disorder (ASD).
  • Item (Open Access)
    Visual-tactile integration in speech perception: Evidence for modality neutral speech primitives
    (Acoustical Society of America (ASA), 2016) Bicevskis K; Derrick, Donald; Gick B
    Audio-visual [McGurk and MacDonald (1976). Nature 264, 746-748] and audio-tactile [Gick and Derrick (2009). Nature 462(7272), 502-504] speech stimuli enhance speech perception over audio stimuli alone. In addition, multimodal speech stimuli form an asymmetric window of integration that is consistent with the relative speeds of the various signals [Munhall, Gribble, Sacco, and Ward (1996). Percept. Psychophys. 58(3), 351-362; Gick, Ikegami, and Derrick (2010). J. Acoust. Soc. Am. 128(5), EL342-EL346]. In this experiment, participants were presented with video of faces producing /pa/ and /ba/ syllables, both alone and with air puffs occurring synchronously and at different timings up to 300 ms before and after the stop release. Perceivers were asked to identify the syllable they perceived, and were more likely to respond that they perceived /pa/ when air puffs were present, with an asymmetrical preference for puffs following the video signal, consistent with the relative speeds of the visual and air puff signals. The results demonstrate that visual-tactile integration in speech perception occurs much as it does with audio-visual and audio-tactile stimuli. This finding contributes to the understanding of multimodal speech perception, lending support to the idea that speech is not perceived as an audio signal supplemented by information from other modes, but rather that the primitives of speech perception are, in principle, modality neutral.
  • Item (Open Access)
    Aero-tactile integration during speech perception: Effect of response and stimulus characteristics on syllable identification
    (Acoustical Society of America (ASA), 2019) Derrick, Donald; Madappallimattam J; Theys C
    Integration of auditory and aero-tactile information during speech perception has been documented during two-way closed-choice syllable classification tasks [Gick and Derrick (2009). Nature 462, 502–504], but not during an open-choice task using continuous speech perception [Derrick, O’Beirne, Gorden, De Rybel, Fiasson, and Hay (2016). J. Acoust. Soc. Am. 140(4), 3225]. This study was designed to compare audio-tactile integration during open-choice perception of individual syllables. In addition, it aimed to compare the effects of place and manner of articulation. Thirty-four untrained participants identified syllables in both auditory-only and audio-tactile conditions in an open-choice paradigm. In addition, forty participants performed a closed-choice perception experiment to allow direct comparison between the two response-type paradigms. Adaptive staircases, as described by Watson [(1983). Percept. Psychophys. 33(2), 113–120], were used to identify the signal-to-noise ratio at the identification accuracy threshold (a simplified staircase sketch follows below). The results showed no significant effect of air flow on syllable identification accuracy during the open-choice task, but found a bias towards voiceless identification of labials and towards voiced identification of velars. Comparison of the open-choice results to those of the closed-choice task shows a significant difference between the two response types, with audio-tactile integration present in the closed-choice task but not in the open-choice task. These results suggest that aero-tactile enhancement of speech perception depends on response-type demands.
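    The cited Watson (1983) procedure is a Bayesian adaptive method; as a simpler stand-in, the sketch below shows a one-up/two-down staircase (converging near 70.7% correct) to illustrate how an SNR threshold is approached trial by trial. The function name, step size, and trial count are illustrative assumptions.
    ```python
    def staircase_threshold(respond, snr=0.0, step=2.0, n_trials=60):
        """respond(snr) -> True if the syllable was identified correctly."""
        correct_run, reversals, last_dir = 0, [], None
        for _ in range(n_trials):
            if respond(snr):
                correct_run += 1
                if correct_run < 2:
                    continue                 # need two correct to step down
                correct_run, snr, direction = 0, snr - step, "down"
            else:
                correct_run, snr, direction = 0, snr + step, "up"
            if last_dir is not None and direction != last_dir:
                reversals.append(snr)        # direction change: a reversal
            last_dir = direction
        tail = reversals[-6:] or [snr]
        return sum(tail) / len(tail)         # mean of the final reversals
    ```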
  • Item (Open Access)
    Three-dimensional printable ultrasound transducer stabilization system
    (2018) Derrick, Donald; Carignan C; Chen W-R; Shujau M; Best CT
    When using ultrasound imaging of the tongue for speech recording/research, submental transducer stabilization is required to prevent the ultrasound transducer from translating or rotating in relation to the tongue. An iterative prototype of a lightweight, three-dimensional-printable, wearable ultrasound transducer stabilization system that allows flexible jaw motion and free head movement is presented. The system is completely non-metallic, eliminating interference with co-recorded signals, thus permitting co-collection and co-registration with articulometry systems. A motion study of the final version demonstrates that transducer rotation is limited to 1.25° and translation to 2.5 mm, well within accepted tolerances (one way to quantify such motion is sketched below).
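    The abstract does not detail the motion-study computation; one standard way to quantify probe motion from tracked marker positions is to recover the best-fit rigid rotation and translation between two frames (the Kabsch algorithm) and compare them with the reported tolerances. The marker data below are synthetic placeholders.
    ```python
    import numpy as np

    def rigid_motion(P, Q):
        """Best-fit rotation (degrees) and translation (mm) mapping
        marker set P onto Q; P, Q are (n, 3) arrays in mm."""
        cp, cq = P.mean(axis=0), Q.mean(axis=0)
        U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
        d = np.sign(np.linalg.det(Vt.T @ U.T))       # reflection guard
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1) / 2, -1, 1)))
        return angle, float(np.linalg.norm(cq - R @ cp))

    rng = np.random.default_rng(1)
    frame0 = rng.normal(size=(4, 3)) * 10.0          # four markers, mm
    frame1 = frame0 + rng.normal(0.0, 0.5, (4, 3))   # slightly perturbed
    angle, shift = rigid_motion(frame0, frame1)
    within_tolerance = angle <= 1.25 and shift <= 2.5
    ```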
  • Item (Open Access)
    Using a radial ultrasound probe's virtual origin to compute midsagittal smoothing splines in polar coordinates
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2015) Heyne, M.; Derrick, Donald
    Tongue surface measurements from midsagittal ultrasound scans are effectively arcs with deviations representing tongue shape, but smoothing-spline analyses of variance (SSANOVAs) assume variance around a horizontal line. Therefore, calculating SSANOVA average curves of tongue traces in Cartesian coordinates [Davidson, J. Acoust. Soc. Am. 120(1), 407–415 (2006)] creates errors that are compounded at the tongue tip and root, where average tongue shape deviates most from a horizontal line. This paper introduces a method for transforming the data into polar coordinates, similar to the technique of Mielke [J. Acoust. Soc. Am. 137(5), 2858–2869 (2015)], but using the virtual origin of a radial ultrasound transducer as the polar origin, allowing data conversion in a manner that is robust against between-subject and between-session variability (the coordinate change is sketched below).
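    A minimal sketch of the coordinate change described above: each tongue-trace point is re-expressed as an (angle, radius) pair about the transducer's virtual origin, so smoothing splines are fit across angles rather than around a horizontal line. The origin and trace values are placeholders; in practice the origin comes from the probe's geometry.
    ```python
    import numpy as np

    def to_polar(points, origin):
        """(n, 2) Cartesian tongue-trace points -> (n, 2) [theta, r]
        about the virtual origin of the radial transducer."""
        d = points - origin
        return np.column_stack([np.arctan2(d[:, 1], d[:, 0]),
                                np.hypot(d[:, 0], d[:, 1])])

    trace = np.array([[-30.0, 40.0], [0.0, 55.0], [30.0, 42.0]])  # mm
    virtual_origin = np.array([0.0, -20.0])   # placeholder position
    polar = to_polar(trace, virtual_origin)
    # Fit the smoothing spline / SSANOVA to r as a function of theta,
    # then convert the fitted curve back to Cartesian for plotting.
    ```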
  • Item (Open Access)
    Super-imposing maxillary and palatal locations for electroarticulometry: A SIMPLE method
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2015) Chen, W.; Chang, Y.; Best, C.T.; Derrick, Donald
    This study proposes a method of superimposing a physical palatal profile, extracted from a speaker’s maxillary impression, onto real-time midsagittal articulatory data. A palatal/dental profile is first obtained by three-dimensional scanning of the speaker’s maxillary impression. A high-resolution midsagittal palatal line, extracted from the profile, is then sub-divided into articulatory zones and superimposed, by the iterative closest point (ICP) algorithm, onto reconstructed palatal traces in electromagnetic articulometric (EMA) data (a minimal ICP sketch follows below). Evaluations were carried out by comparing consonant targets elicited by EMA with the proposed method and by static palatography. The proposed method yields accurate results, as supported by palatography.
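    A hedged sketch of the superimposition step: a bare-bones 2-D iterative closest point loop that rigidly aligns the scanned palatal line to the EMA-reconstructed palatal trace. Production implementations add outlier rejection and convergence checks; the function name and point sets here are illustrative.
    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def icp_2d(source, target, n_iter=30):
        """Rigidly align source (n, 2) points to target (m, 2) points."""
        src = source.copy()
        tree = cKDTree(target)
        for _ in range(n_iter):
            _, idx = tree.query(src)            # closest-point matches
            matched = target[idx]
            cs, ct = src.mean(axis=0), matched.mean(axis=0)
            U, _, Vt = np.linalg.svd((src - cs).T @ (matched - ct))
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:            # avoid mirror solutions
                R = Vt.T @ np.diag([1.0, -1.0]) @ U.T
            src = (src - cs) @ R.T + ct         # apply rigid update
        return src
    ```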
  • Item (Open Access)
    Three speech sounds, one motor action: Evidence for speech-motor disparity from English flap production
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2015) Derrick, Donald; Stavness, I.; Gick, B.
    The assumption that units of speech production bear a one-to-one relationship to speech motor actions pervades otherwise widely varying theories of speech motor behavior. This speech production and simulation study demonstrates that commonly occurring flap sequences may violate this assumption. In the word “Saturday,” a sequence of three sounds may be produced using a single, cyclic motor action. Under this view, the initial upward tongue tip motion, starting with the first vowel and moving to contact the hard palate on the way to a retroflex position, is under active muscular control, while the downward movement of the tongue tip, including the second contact with the hard palate, results from gravity and elasticity during tongue muscle relaxation. This sequence is reproduced using a three-dimensional computer simulation of human vocal tract biomechanics and differs greatly from other observed sequences for the same word, which employ multiple targeted speech motor actions. This outcome suggests that a goal of a speaker is to produce an entire sequence in a biomechanically efficient way at the expense of maintaining parity within the individual parts of the sequence.