Refinement and Normalisation of the University of Canterbury Auditory-Visual Matrix Sentence Test
Degree Grantor: University of Canterbury
Degree Name: Master of Audiology
Developed by O'Beirne and Trounson (Trounson, 2012), the UC Auditory-Visual Matrix Sentence Test (UCAMST) is an auditory-visual speech test in New Zealand English in which sentences are assembled from 50 words arranged into 5 columns (name, verb, quantity, adjective, object). The sentence materials were generated by cutting and re-assembling 100 naturally spoken "original" sentences to create a large repertoire of 100,000 unique "synthesised" sentences.
The process of synthesising sentences from video fragments resulted in occasional artifactual image jerks ("judders") at the edited transitions between video fragments, quantified by an unusually large change in the "pixel difference value" of consecutive frames. To preserve the naturalness of the materials, Study 1 aimed to select the transitions with the least noticeable judders.
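The per-transition metric described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function name and the toy frame data are assumptions, and the real measure may weight or threshold the frame differences differently.

```python
import numpy as np

def pixel_difference_values(frames):
    """For each pair of consecutive frames, return the mean absolute
    per-pixel difference. A spike in this series marks a candidate
    "judder" at an edited transition. (Illustrative sketch only.)"""
    diffs = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        # cast to a signed type so the subtraction cannot wrap around
        diffs.append(np.abs(curr.astype(np.int16) - prev.astype(np.int16)).mean())
    return np.array(diffs)

# Toy example: three 2x2 grayscale frames with an abrupt jump
# between the second and third frame (a simulated judder).
frames = [np.zeros((2, 2), dtype=np.uint8),
          np.full((2, 2), 2, dtype=np.uint8),
          np.full((2, 2), 120, dtype=np.uint8)]
print(pixel_difference_values(frames))  # second value is far larger
```

A transition whose pixel difference value stands well above the values of the surrounding, unedited frame pairs would be flagged as a noticeable judder.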
Normal-hearing participants (n = 18) assigned a 10-point noticeability rating to 100 sentences comprising unedited "no judder" sentences (n = 28) and "synthesised" sentences (n = 72) that varied in the severity (i.e. pixel difference value), number, and position of judders. The judders were significantly more noticeable than the no-judder controls, and based on mean rating score, 2,494 sentences with "minimal noticeable judder" were included in the auditory-visual UCAMST. Follow-on work should establish equivalent lists using these sentences. The average pixel difference value was a significant predictor of rating score and may therefore be used as a guide in the future development of auditory-visual speech tests assembled from video fragments.
The aim of Study 2 was to normalise the auditory-alone UCAMST so that each audio fragment is equally intelligible in noise. In Part I, individuals with normal hearing (n = 17) assessed 400 sentences containing every audio fragment, presented at four SNRs (-18.5, -15, -11.5, and -8 dB) in both constant speech-shaped noise (n = 9) and six-talker babble (n = 8). An intelligibility function was fitted to word-specific data, and the midpoint (Lmid, the SNR at 50% intelligibility) of each function was adjusted to equal the mean pre-normalisation midpoint across fragments. In Part II, 30 lists of 20 sentences were generated with a relatively homogeneous frequency of matrix word use. The predicted parameters in constant noise (Lmid = -14.0 dB SNR; slope = 13.9 ± 0.0 %/dB) are comparable with published equivalents. The babble-noise condition was, conversely, less sensitive (Lmid = -14.9 dB SNR; slope = 10.3 ± 0.1 %/dB), possibly due to the smaller sample size (n = 8). Overall, this research constitutes an important first step in establishing the UCAMST as a reliable measure of speech recognition; follow-on work will validate the normalisation procedure carried out in this project.
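The fitting-and-adjustment step above can be sketched with a logistic intelligibility function. This is a hedged illustration, assuming a standard logistic psychometric function; the function names, the exact functional form used in the thesis, and the example midpoints are assumptions made for demonstration.

```python
import math

def logistic_intelligibility(snr, l_mid, slope):
    """Probability of correct word recognition as a logistic function
    of SNR (dB). `l_mid` is the SNR at 50% intelligibility; `slope`
    is the gradient at the midpoint in proportion correct per dB
    (e.g. 0.139 for 13.9 %/dB). For a logistic with steepness k, the
    slope at the midpoint is k/4, hence k = 4 * slope."""
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr - l_mid)))

def level_corrections(midpoints):
    """Per-fragment level adjustment (dB) that shifts each fragment's
    fitted midpoint onto the mean pre-normalisation midpoint across
    fragments. A fragment with a higher (harder) midpoint receives a
    positive gain, and vice versa. (Illustrative sketch only.)"""
    mean_mid = sum(midpoints) / len(midpoints)
    return [mid - mean_mid for mid in midpoints]

# Hypothetical fitted midpoints (dB SNR) for three fragments:
midpoints = [-13.0, -15.0, -14.0]
print(level_corrections(midpoints))   # gains that equalise intelligibility
print(logistic_intelligibility(-14.0, l_mid=-14.0, slope=0.139))  # 0.5 at midpoint
```

After applying these per-fragment gains, every fragment's effective midpoint coincides with the group mean, which is the sense in which the fragments become "equally intelligible in noise".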