NZILBB: Conference Contributions

  • Learning effects in multimodal perception with real and simulated faces
    (Australian Speech Science and Technology Association Inc., 2019) Keough M; Derrick, Donald; Taylor RC; Gick B; Calhoun S; Escudero P; Tabain M; Warren P
    We have all learned to associate real voices with animated faces since childhood. Researchers use this association, employing virtual faces in audiovisual speech perception tasks. However, we do not know whether perceivers treat those virtual faces the same as real faces, or whether integration of speech cues from new virtual faces must instead be learned at the time of contact. We test this possibility using speech information that perceivers have never had a chance to associate with simulated faces: aerotactile somatosensation. With human faces, silent bilabial articulations (“ba” and “pa”) accompanied by synchronous cutaneous airflow shift perceptual bias towards “pa”. If visual-tactile integration is unaffected by the visual stimuli’s ecological origin, results with virtual faces should be similar. Contra previous reports [8], our results show that perceivers do treat computer-generated faces and human faces in a similar fashion: visually aligned cutaneous airflow shifts perceptual bias towards “pa” equally well with virtual and real faces.
  • Aero-tactile integration in fricatives: Converting audio to air flow information for speech perception enhancement
    (ISCA, 2014) Derrick, Donald; O'Beirne, Greg A.; De Rybel T; Hay J; Li H; Meng HM; Ma B; Chng E; Xie L
    We follow up on research demonstrating that aero-tactile information can enhance or interfere with accurate auditory perception among uninformed and untrained perceivers [1, 2, 3]. We computationally extract aperiodic information, which represents turbulent air flow produced at the lips [4, 5], from auditory recordings of speech. This extracted signal is used to drive a piezoelectric air pump that directs air flow at the right temple simultaneously with the presentation of the auditory recordings. Using forced-choice experiments, we replicate previous results with stops, finding enhanced perception of /pa/ in /pa/ vs. /ba/ pairs, and of /ta/ in /ta/ vs. /da/ pairs [1, 6, 2, 3]. We also found enhanced perception of /fa/ in /ba/ vs. /fa/ pairs, and of /sha/ in /da/ vs. /sha/ pairs, demonstrating that air flow produced during fricatives can also enhance speech perception when it contacts the skin. The results show that aero-tactile information can be extracted from the audio signal and used to enhance the perception of a large class of speech sounds found in many of the world's languages.
  • Vowel identity conditions the time course of tone recognition
    (2013) Shaw JA; Tyler MD; Kasiopa B; Ma Y; Proctor M; Han C; Derrick, Donald; Burnham D
    Using eye-tracking in a visual world paradigm, we sought converging evidence for the time course of Mandarin Chinese tone recognition as predicted by the availability of information in f0 and past results from a gating experiment. Our results showed that tones 1 and 2 are recognized earlier than tone 4, followed by tone 3. With the exception of tone 2, which was recognized earlier than expected, our results are consistent with those found in gating. The speed of tone 2 recognition varied significantly across vowels in our study, part of a broader pattern whereby vowels systematically influenced the time course of tone recognition. Rising tones, tone 2 and tone 3, were recognized earliest when co-produced with /a/. The falling tone, tone 4, was recognized earliest when co-produced with /u/. Intrinsic f0 and spectral cues to tone are discussed as possible explanations for the vowel quality effect.
  • Duration of Blackfoot /s/: A comparison of assibilant, affricate, singleton, geminate and syllabic /s/ in Blackfoot
    (2006) Derrick, Donald
    A study comparing the duration of assibilant, affricate, singleton, geminate and syllabic /s/ in the citation speech of one speaker demonstrated significant differences in the durations of geminate /s/ (µ = 300 ms), syllabic /s/ (µ = 240 ms), singleton /s/ (µ = 155 ms), and affricate /s/ (µ = 130 ms). The results show the expected contrast between short and long /s/, and between inter-consonantal long /s/ and affricate /s/, lending support to the analysis of Blackfoot syllabic /s/ in Derrick (2006). Length measurements also showed a significant symmetrical relationship between vowel adjacency and long /s/ duration, demonstrating an inverse relationship between the amplitude and duration of Blackfoot /s/. The cross-linguistic implications for sibilants are significant, and further research into the relationship between duration and intensity, with more participants and more languages and using natural speech, is indicated.
  • Aero-tactile integration in Mandarin
    (Australian Speech Science and Technology Association Inc., 2019) Derrick, Donald; Heyne M; O'Beirne, Greg A.; Hay J; Calhoun S; Escudero P; Tabain M; Warren P
    Previous research has shown that audio-aligned air puffs applied to the skin can enhance the perception of speech audio [12]. In this study, we applied dynamically varying air flow during two-way forced-choice identification of Mandarin words, and compared the results to those of a study on English that showed perceptual enhancement for both stops and fricatives [6]. Two differences emerged. First, psychometric testing placed the signal-to-noise ratio yielding 80% accuracy at -1.1 dB SNR for Mandarin words, compared to -9.0 dB SNR for English nonsense syllables. Second, in Mandarin, aero-tactile stimuli enhanced the classification of voiceless stops only, whereas in English they enhanced the classification of both voiceless stops and fricatives. These differences may partially result from the interaction between air flow and the higher conditional acoustic entropy of Mandarin compared to English [24]: the Mandarin syllables had to be played with more of their acoustic information preserved, weakening the potential effect of air flow.
  • Hearing, seeing, and feeling speech: A pilot EEG study
    (2018) Hansmann D; Theys C; Derrick, Donald; Hillman K
    A large number of EEG studies have shown that auditory-visual signals lead to a neurophysiological processing advantage compared to auditory-only signals. Behavioral speech perception studies have shown that tactile stimuli can also enhance auditory speech perception. This EEG study was designed to identify whether congruent auditory-tactile speech information leads to similar neurophysiological processing advantages as those shown in auditory-visual studies.
  • Audio-Visual-Tactile integration in speech perception
    (2018) Derrick, Donald; Hansmann D; Haws Z; Theys C
    Behavioural audio-visual research has shown enhancement [1] and interference [2] in speech perception, as has behavioural audio-tactile research [3]. However, to date, we have not encountered any experimental behavioural research into tri-modal integration in speech perception. (But see Alcorn [4] for clinical techniques incorporating both vision and touch.) Based on the relative influence of visual and aero-tactile stimuli, we expect cumulative effects of both, with the most influence from auditory information, then visual information [1], and lastly airflow [3]. Here we present a two-way forced-choice study of tri-modal integration in speech perception, showing the effects of both congruent and incongruent stimuli on the accurate identification of auditory speech-in-noise.
  • Recording and reproducing speech airflow outside the mouth
    (2015) Derrick, Donald; De Rybel T; Fiasson R
    Researchers present a system that produces artificial airflow with an intensity envelope similar to that produced during speech, as measured 1 cm from the lips. They have made an airflow estimator system that does not interfere with speech production. The system consists of a ping-pong ball mounted on a carbon-fiber lever that flexes a thin, flat carbon-fiber member to which two strain gauges are affixed on opposite sides. These strain gauges are part of a sinusoidally driven bridge whose output, after amplification, is an amplitude-modulated sine wave that can be recorded into a computer through a standard XLR-based pre-amplifier.
  • Listen with your skin: Aerotak speech perception enhancement system
    (ISCA, 2014) Derrick, Donald; De Rybel T; O'Beirne, Greg A.; Hay J; Li H; Meng HM; Ma B; Chng E; Xie L
    Here we introduce Aerotak, a system for audio analysis and perception enhancement that allows speech perceivers to listen with their skin. The current system extracts the unvoiced portions of an audio signal, which represent turbulent air flow in speech. It stores the audio signal in the left channel of a stereo audio output and the air flow signal in the right channel. This stereo output drives a conversion unit that sends the left channel to a headphone output (to both ears) and the right channel, as a drive signal, to a piezoelectric air pump mounted on the headphones. We have shown, using two-way forced-choice experiments, that the system enhances the perception of voiceless stops and voiceless fricatives in noise such that 1 out of every 4 such words that would otherwise be missed is heard correctly. We are currently conducting experiments on word identification while listening to a short story, and are completing a stand-alone version of Aerotak that works with real-time audio and runs on an embedded system. The short-story research and the real-time system will be complete for Interspeech 2014.
  • Recording and reproducing speech airflow outside the mouth
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2015) Derrick, Donald; De Rybel, T.; Fiasson, R.
  • Non-metallic ultrasound probe holder for co-collection and co-registration with EMA
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2015) Derrick, Donald; Best, C.T.; Fiasson, R.
    Co-collection and co-registration of ultrasound images of the tongue with articulometry data requires stabilizing the ultrasound probe relative to the head using a non-metallic system. Audio, ultrasound, and articulometry data were recorded from 11 North American English speakers reading 10 blocks of 25 sentences, speaking for 2 minutes at a time, over a recording time of 45 minutes. The 95% confidence interval was 1.35° for ultrasound probe roll relative to head motion and 2.12 mm for lateral displacement, so that ultrasound probe displacement is within acceptable rotational and translational parameters as described in the HOCUS paper [9]. Proper use of this probe holder could also allow adequate ultrasound probe stabilization without external marker tracking for post-processing correction, making this probe holder suitable for field research.
  • The influence of tongue position on trombone sound: A likely area of language influence
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2015) Heyne, M.; Derrick, Donald
    This paper builds on initial evidence of First Language influence on brass playing presented in Heyne and Derrick (2013) [13] by indicating how tongue positioning might affect trombone timbre. Ultrasound imaging of the tongue was used to compare vowel production and sustained trombone notes for three participants, one speaker each of New Zealand English, Tongan and Japanese, whose musical production was also analyzed acoustically. Comparison of the sound spectra produced by two semiprofessional players shows that the player using a higher, more retracted tongue position produces a sound spectrum with a larger high-frequency component. We believe that this could explain why brass players can notice differences between players from different language backgrounds.
  • Examining speech production using masked priming
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2015) Davis, C.; Shaw, J.; Proctor, M.; Derrick, Donald; Sherwood, S.; Kim, J.
    The time to initiate naming a printed target word is reduced when the target is preceded by an identical masked prime (match prime) or by one that has the same initial letter (onset prime), compared to an all-letters-different control. Masked priming has been examined using vocal response time, but it also offers an opportunity to examine speech production dynamics before the onset of speech acoustics. We tracked tongue-dorsum, tongue-tip and lip motion from four participants pronouncing 19 targets in match, onset and unrelated control prime conditions. Control primes were selected so that their articulation involved a different tongue gesture than the target. Prime influence was measured by tongue-dorsum height at gestural onset and by the peak velocity of the subsequent gesture. Results showed that, relative to targets in the match condition, control targets had a significantly different tongue-dorsum height, and the peak velocity was greater when the subsequent gesture was achieved.
  • On the inter-dependence of tonal and vocalic production goals in Chinese
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2014) Shaw, J.; Wei-rong, C.; Proctor, M.I.; Derrick, Donald; Dakhoul, E.
    We studied tone-vowel coproduction using Electromagnetic Articulography (EMA). Fleshpoints on the tongue and jaw were tracked while native Chinese speakers (n = 6) produced three vowels, /a/, /i/, /u/, combined with four Chinese tones. We found differences in tongue position across tones for /a/ and for /i/ but not for /u/. The low and rising tones patterned together in conditioning lower tongue blade (TB) position for /a/ and a higher TB position for /i/. This pattern suggests a degree of inter-dependence between tonal and vocalic targets. The effect of tone on TB height was mediated by jaw movement such that, even as TB sensor position varied across tones, the Euclidean distance between TB and Jaw sensors within each vowel remained stable. Thus, for this set of Chinese vowels, there is a relational invariance between active articulators, tongue and jaw. When viewed in terms of this relation, vowel and tonal targets appear to be completely independent.
  • Some initial findings regarding language influence on playing brass instruments
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2014) Heyne, M.; Derrick, Donald
    This paper presents some initial findings regarding the influence of First Language on playing brass instruments. Using ultrasound imaging of the tongue, vowel production and sustained trombone notes were compared for a New Zealand English speaker and a Tongan speaker. It is suggested that, during trombone playing, the tongue shapes used by the Tongan participant pattern with the back vowel /o/, while those used by the New Zealand English player pattern with the centralized KIT vowel (/ɘ/) and schwa (/ə/). It is argued that these findings provide preliminary evidence of First Language influence on brass playing.
  • Coordination of tongue tip and body in place differences among English coronal obstruents
    (University of Canterbury. New Zealand Institute of Language, Brain & Behaviour, 2014) Derrick, Donald; Fiasson, R.; Best, C.T.
    Using electromagnetometry to track the tongue, Best et al. (2010, 2014) have demonstrated that Wubuy, an Australian language with four coronal stop places, shows significant differences in tongue tip vs. tongue body motion range and motion coordination contrasting apicals and laminals. Here we continue this line of inquiry with three coronal obstruents in English: the apical alveolar stop /d/ and alveo-palatal affricate /dʒ/ vs. the laminal dental fricative /ð/. The results support tongue tip/body motion range differences between /d/ and /ð/ across vowel contexts. They also show a tongue tip/body motion coordination distinction between apical /d/ and laminal /ð/, which was significant in /i/ and /u/ contexts but not in /a/ contexts. The results are consistent with the Wubuy findings (Best et al., 2010, 2014) despite the differences in the coronal obstruent contrasts of the two languages, suggesting an apical/laminal distinction in tongue tip/body coordination.
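
The "Aero-tactile integration in Mandarin" abstract reports the signal-to-noise ratio at which listeners reach 80% accuracy, but does not say how that point was estimated. As a minimal sketch only, assuming a logistic psychometric function with a 50% guess rate (chance in a two-way forced-choice task) fitted by maximum likelihood on a coarse grid, the 80% point can be read off by inverting the fitted curve. The function names, the parameter grids and any data passed in are hypothetical, not the authors' procedure.

```python
import numpy as np

GUESS = 0.5  # chance level in a two-way forced-choice task

def psychometric(snr, alpha, beta, guess=GUESS):
    """Assumed logistic psychometric function rising from `guess` to 1.

    alpha: midpoint (dB SNR); beta: slope scale (dB).
    """
    return guess + (1 - guess) / (1 + np.exp(-(snr - alpha) / beta))

def threshold(p, alpha, beta, guess=GUESS):
    """Invert the psychometric function: the SNR at which accuracy equals p."""
    return alpha - beta * np.log((1 - guess) / (p - guess) - 1)

def fit_grid(snrs, n_correct, n_trials,
             alphas=np.linspace(-20, 10, 301), betas=np.linspace(0.5, 10, 96)):
    """Maximum-likelihood fit of (alpha, beta) by brute-force grid search."""
    a = alphas[:, None, None]
    b = betas[None, :, None]
    x = np.asarray(snrs, dtype=float)[None, None, :]
    # Clip so that log(1 - p) stays finite where the curve saturates at 1.
    p = np.clip(psychometric(x, a, b), 1e-9, 1 - 1e-9)
    k = np.asarray(n_correct, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    # Binomial log-likelihood summed over SNR levels, for every grid point.
    loglik = (k * np.log(p) + (n - k) * np.log(1 - p)).sum(axis=-1)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return alphas[i], betas[j]
```

With fitted values, `threshold(0.8, alpha, beta)` gives the estimated SNR for 80% correct. The grid search keeps the sketch dependency-free; for real data a finer grid or a gradient-based optimizer would be the usual choice.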
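
The airflow estimator in "Recording and reproducing speech airflow outside the mouth" outputs an amplitude-modulated sine wave whose envelope follows the airflow. The abstract does not say how the envelope is recovered after recording; one standard option, sketched here under stated assumptions, is coherent (I/Q) demodulation: multiply by in-phase and quadrature copies of the carrier, low-pass filter, and take the magnitude. The carrier frequency, sample rate, moving-average filter and function name are all hypothetical.

```python
import numpy as np

def demodulate_am(signal, fs, carrier_hz, smooth_ms=10.0):
    """Recover the envelope of an amplitude-modulated sine by I/Q demodulation.

    signal: recorded bridge output (1-D array); fs: sample rate in Hz;
    carrier_hz: bridge excitation frequency (assumed known, phase unknown).
    Returns the envelope magnitude; the sign of the bridge imbalance is lost.
    """
    x = np.asarray(signal, dtype=float)
    t = np.arange(len(x)) / fs
    # Mix down with in-phase and quadrature copies of the carrier.
    i = x * np.cos(2 * np.pi * carrier_hz * t)
    q = x * np.sin(2 * np.pi * carrier_hz * t)
    # Moving-average low-pass filter removes the 2 * carrier_hz component.
    n = max(1, int(round(fs * smooth_ms / 1000.0)))
    kernel = np.ones(n) / n
    i_lp = np.convolve(i, kernel, mode="same")
    q_lp = np.convolve(q, kernel, mode="same")
    # The products average to (A/2)cos(phi) and -(A/2)sin(phi); rescale by 2.
    return 2.0 * np.hypot(i_lp, q_lp)
```

Choosing the moving-average length as a whole number of periods of twice the carrier (480 samples at 48 kHz for a hypothetical 4 kHz carrier) cancels the double-frequency term exactly away from the edges; a dedicated low-pass filter would be the more usual choice in practice.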
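
One way to read the Aerotak result that 1 out of every 4 words that would otherwise be missed is heard correctly: if $p_0$ is the proportion of voiceless stops and fricatives identified correctly without air flow, the misses are $1 - p_0$, and recovering a quarter of them gives the accuracy $p_1$ with air flow. This interpretation and the 60% baseline below are illustrative assumptions, not figures from the abstract.

```latex
p_1 = p_0 + \tfrac{1}{4}\,(1 - p_0),
\qquad
p_0 = 0.60 \;\Rightarrow\; p_1 = 0.60 + 0.25 \times 0.40 = 0.70
```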
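
The masked-priming abstract measures tongue-dorsum height at gestural onset and the peak velocity of the subsequent gesture, without defining onset. A hedged sketch, assuming a sensor position trace sampled at a known rate and a velocity-threshold convention that places onset where speed last rises above a fixed fraction (here a hypothetical 20%) of its peak before that peak; the function name, the criterion and the synthetic trace are illustrative assumptions.

```python
import numpy as np

def gesture_kinematics(pos, fs, onset_frac=0.2):
    """Peak speed, peak index and onset index for one gesture.

    pos: array (n_samples, n_dims) of sensor position in mm; fs: sample rate (Hz).
    onset_frac: hypothetical criterion; onset is the first sample of the
    final run before the peak in which speed stays at or above
    onset_frac * peak speed.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.gradient(pos, 1.0 / fs, axis=0)  # mm/s in each dimension
    speed = np.linalg.norm(vel, axis=1)       # tangential speed
    peak = int(np.argmax(speed))
    below = np.nonzero(speed[:peak] < onset_frac * speed[peak])[0]
    onset = int(below[-1]) + 1 if below.size else 0
    return float(speed[peak]), peak, onset

# Synthetic raised-cosine rise of 5 mm over 200 ms, with rests on either side.
fs = 1000
t = np.arange(200) / fs
rise = 5.0 * (1 - np.cos(np.pi * t / 0.2)) / 2
y = np.concatenate([np.zeros(100), rise, np.full(100, 5.0)])
pos = np.column_stack([np.zeros_like(y), y])
peak_speed, peak_idx, onset_idx = gesture_kinematics(pos, fs)
height_at_onset = pos[onset_idx, 1]  # dorsum height at gestural onset
```

Using tangential speed rather than vertical velocity alone is a design choice here; a study might equally use the velocity of a single dimension.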
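
The tone-vowel EMA abstract ("On the inter-dependence of tonal and vocalic production goals in Chinese") rests on one measurement: the Euclidean distance between the tongue-blade (TB) and jaw sensors, which stayed stable within each vowel even as TB position varied across tones. The sketch below shows that computation on made-up coordinates (mm) in which the tongue blade rides on the jaw with a fixed offset; the data, the function names and the use of the standard deviation as a stability index are illustrative assumptions, not the study's analysis.

```python
import numpy as np

def tb_jaw_distance(tb_xyz, jaw_xyz):
    """Euclidean distance between TB and jaw sensors for each token.

    tb_xyz, jaw_xyz: arrays of shape (n_tokens, n_dims) in the same frame (mm).
    """
    tb = np.asarray(tb_xyz, dtype=float)
    jaw = np.asarray(jaw_xyz, dtype=float)
    return np.linalg.norm(tb - jaw, axis=1)

def spread(values):
    """Standard deviation across tokens, a simple index of (in)variance."""
    return float(np.std(values))

# Hypothetical tokens of one vowel under four tones: jaw height differs by
# tone, and the tongue blade keeps a fixed offset from the jaw sensor.
jaw = np.array([[0.0, 0.0, -12.0],
                [0.0, 0.0, -10.5],
                [0.0, 0.0, -11.2],
                [0.0, 0.0, -9.8]])
tb = jaw + np.array([0.0, 10.0, 5.0])

print(spread(tb[:, 2]))                  # TB height varies across tones
print(spread(tb_jaw_distance(tb, jaw)))  # TB-jaw distance stays (near) constant
```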