Predicting treatment outcomes in dysarthria through speech feature analysis.
Thesis Discipline: Speech and Language Sciences
Degree Grantor: University of Canterbury
Degree Name: Doctor of Philosophy
Increasing loudness and reducing speech rate are common behavioural speech modifications used in the treatment of dysarthria. Both strategies provide a promising basis for intervention, but studies have reported considerable inter-participant variability in the intelligibility benefit gained. While neurologic aetiology and the Mayo classification system are generally used to group participants with dysarthria in treatment studies, there is little evidence that this approach provides meaningful insight into which individuals benefit from particular speech modification strategies. This thesis posited that differences in baseline speech symptoms between speakers could underlie the marked disparities in treatment outcomes. Hence the overall aim, addressed in the final phase of this thesis, was to identify whether measurements of individuals’ baseline speech could be used to predict their intelligibility gains in response to cues to speak louder and to reduce speech rate.

To begin, the first two phases of this thesis addressed methodological issues in the assessment of speech features. The purpose of these investigations was to refine the methods of acoustic and perceptual analysis employed in the final phase, and to test their application on New Zealand speakers with and without dysarthria. Phase one focused on vowel dispersion and speech duration in healthy, older speakers of New Zealand English (NZE), as it was unknown how the distinctive vowel system of NZE might affect commonly used metrics of vowel articulation. A group of 149 NZE speakers aged 65 to 90 years read a standard passage. First and second formant frequencies, measured from selected [iː], [ɐː], and [oː] vowels, were used to calculate two measures of Vowel Space Area (VSA) for each speaker. Average vowel duration was calculated from segments across the passage.
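A triangular VSA of the kind described can be computed from a speaker's mean first and second formant frequencies (F1, F2) of the three corner vowels using the shoelace formula. The sketch below uses hypothetical formant values for illustration only, not data from the study:

```python
def vowel_space_area(corners):
    """Triangular Vowel Space Area (Hz^2) from (F1, F2) pairs of
    three corner vowels, computed via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = corners
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

# Hypothetical mean formants (Hz) for one speaker's [iː], [ɐː], [oː]
# vowels -- illustrative values, not measurements from the thesis.
corners = [(300, 2300), (750, 1300), (400, 800)]
area = vowel_space_area(corners)  # area of the vowel triangle in Hz^2
```

A larger area indicates more peripheral (acoustically distinct) corner vowels, which is why VSA is used as a proxy for articulatory working space.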
Results demonstrated that measures of VSA, adapted for NZE, produced a similar range of values to those reported in previous studies of speakers from the United States. In addition, a statistically significant relationship existed between speakers’ average vowel durations and VSA measurements indicating that, on average, speakers with slower speech rates produced more acoustically distinct speech segments. As expected, increases in average vowel duration were found with advancing age. However, speakers’ formant values remained unchanged. The second phase explored the ability of different acoustic and perceptual measures to index speakers’ baseline (i.e., habitual) dysarthria severity. Sixty-one speakers (17 healthy individuals and 44 speakers with dysarthria) read a standard passage. To obtain acoustic data, different formant extraction points and frequency measures were trialled. VSA and an adapted Formant Centralization Ratio (FCR) were calculated using first and second formants of the speakers’ [ɐː], [iː] and [oː] vowels. Twenty-eight listeners completed separate ratings of intelligibility and speech precision. Perceptually, listener ratings of speech precision provided the best index of acoustic change. Acoustically, the combined use of an articulatory-based formant extraction point, Bark frequency units, and the FCR was most effective in explaining perceptual ratings. Based on this investigation, perceptual ratings of speech precision and acoustic measurements of FCR (derived from a flexible extraction point and calculated in Bark) were selected for use in phase three, to model speakers’ responses to treatment cues. Phase three addressed the central aim of the thesis by exploring whether targeted acoustic and perceptual features of participants’ habitual speech could predict their degree of intelligibility improvement in response to cues to speak louder and reduce speech rate. 
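As a rough illustration of the acoustic measure carried forward to phase three, the sketch below computes an FCR in Bark units. It assumes the widely used formula (F2u + F2a + F1i + F1u) / (F2i + F1a), with [oː] standing in for /u/, and the Traunmüller Hz-to-Bark conversion; the thesis's exact adaptation and formant extraction points may differ:

```python
def hz_to_bark(f):
    """Traunmüller (1990) Hz-to-Bark conversion."""
    return 26.81 * f / (1960.0 + f) - 0.53

def fcr_bark(i, a, o):
    """Formant Centralization Ratio in Bark. i, a, o are (F1, F2)
    tuples in Hz for [iː], [ɐː], [oː]. Uses the standard formula
    (F2u + F2a + F1i + F1u) / (F2i + F1a), with [oː] substituted
    for /u/ -- an assumption, not the thesis's exact adaptation."""
    f1i, f2i = map(hz_to_bark, i)
    f1a, f2a = map(hz_to_bark, a)
    f1o, f2o = map(hz_to_bark, o)
    return (f2o + f2a + f1i + f1o) / (f2i + f1a)

# Hypothetical formant values (Hz); ratios closer to 1 suggest more
# centralized, less acoustically distinct vowels.
ratio = fcr_bark(i=(300, 2300), a=(750, 1300), o=(400, 800))
```

Because the FCR places centralizing formant movements in the numerator and decentralizing ones in the denominator, it tends to rise with vowel centralization while being less sensitive to inter-speaker differences than raw VSA.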
Fifty speakers of NZE participated (aged between 43 and 89 years), 43 of whom were diagnosed with dysarthria. The remaining speakers acted as healthy controls. All participants read the Grandfather Passage in habitual, loud and slow speaking modes. The study was conducted in two parts. Firstly, a perceptual experiment was completed, in which 18 listeners rated the intelligibility of speakers’ habitual, loud and slow speech recordings. Secondly, an acoustic analysis was completed to measure a range of baseline speech features. This speech analysis employed measurements from the phase two investigation, in conjunction with acoustic measures of articulatory rate, vowel harmonics, cepstral peak prominences, and variability in speakers’ vowel durations, pitch and speech intensity. Statistical analyses revealed that intelligibility gains in the loud condition were best predicted by speakers’ baseline articulatory rate and listener ratings of baseline speech precision. Intelligibility gains in the slow condition were best modelled by ratings of speech precision and variations in speakers’ vowel durations. Overall, these models were able to account for approximately one third of the variance in speakers’ intelligibility gains. These findings were promising, but the time required to analyse the acoustic data limited the clinical applicability of the models. A follow-up study investigated whether time-efficient, automated measurements of the baseline speech signal could similarly account for differences in speakers’ responses to treatment cues. The performance of features derived from Mel-frequency cepstral coefficients, long-term average spectra and envelope modulation spectra was compared against the targeted measurements extracted in the previous study. Cross-validation techniques were used to determine how well models of intelligibility gain could perform on speakers they had not been trained on.
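Of the automated feature families mentioned, the long-term average spectrum is the simplest to sketch: the mean magnitude spectrum (in dB) across windowed frames of the signal. The NumPy version below is a minimal illustration under assumed frame settings, not the study's implementation:

```python
import numpy as np

def ltas(signal, frame_len=1024, hop=512):
    """Long-term average spectrum: mean magnitude spectrum (dB)
    across Hann-windowed frames of the signal."""
    win = np.hanning(frame_len)
    frames = np.stack([signal[i:i + frame_len] * win
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mags.mean(axis=0) + 1e-12)

# Illustrative check: the LTAS of a pure 440 Hz tone peaks near 440 Hz.
sr = 16000
t = np.arange(sr) / sr
spectrum = ltas(np.sin(2 * np.pi * 440 * t))
peak_hz = np.argmax(spectrum) * sr / 1024
```

In practice such spectra are usually summarised further (e.g. band energies or spectral slope) before being entered into a predictive model, since the raw spectrum has far more dimensions than speakers.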
When the optimal number of speech features was included in a forward regression model, the targeted speech features accounted for 17% of the variance in speakers’ responses to cues to speak slower; the automated measurements accounted for around 10%. In contrast, in the loud condition, both feature sets performed more strongly: the automated features accounted for up to 25% of the variance in speakers’ intelligibility gains, while the targeted measures accounted for 19%. Thus, this final investigation offered evidence that automated feature sets, which are time-efficient and require no subjective judgments from researchers, could be used diagnostically to guide treatment decisions. Overall, this thesis demonstrated that certain features of speakers’ baseline speech could account for significant variation in their intelligibility gains. The ability to classify speakers likely to achieve positive treatment outcomes based on their presenting speech features has the potential to facilitate clinical decision making within an evidence-based framework and, ultimately, promote stronger group treatment outcomes. Future studies that utilise larger participant groups and a wider range of treatment strategies are needed to develop more personalised and targeted approaches to speech therapy for people with dysarthria.
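Forward regression of the kind described can be sketched as greedy feature selection: at each step, add the predictor that most improves the fit. The minimal ordinary-least-squares version below uses a synthetic dataset for illustration; it does not reproduce the study's modelling or cross-validation details:

```python
import numpy as np

def r2(X, y):
    """In-sample R^2 of an OLS fit with an intercept term."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def forward_select(X, y, k):
    """Greedy forward selection: add the predictor that gives the
    largest R^2 gain at each step, repeated k times."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: r2(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Synthetic demo: y depends only on columns 0 and 2 of X, so forward
# selection should recover exactly those two predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + X[:, 2] + 0.1 * rng.normal(size=200)
selected = forward_select(X, y, 2)
```

With small samples, the in-sample R² used here is optimistic; evaluating each candidate model on held-out speakers, as the thesis's cross-validation does, gives a more honest estimate of how models generalise.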