The recognition of New Zealand English closing diphthongs using time-delay neural networks (1995)
Type of Content: Theses / Dissertations
Thesis Discipline: Electrical Engineering
Degree Name: Doctor of Philosophy
Publisher: University of Canterbury. Electrical and Electronic Engineering
Authors: Kirkland, John Robert
As a step towards the development of a modular time-delay neural network (TDNN) for recognizing phonemes realized with a New Zealand English accent, this thesis focuses on the development of an expert module for closing diphthong recognition. The performances of traditional and squad-based expert modules are compared speaker-dependently for two New Zealand English speakers (one male and one female). Examples of each kind of expert module are formed from one of three types of TDNN, referred to as basic-token TDNN, extended-token TDNN and sequence-token TDNN. Of the traditional expert modules tested, those comprising extended-token TDNNs are found to afford the best performance compromises and are, therefore, preferable if a traditional expert module is to be used. Comparing the traditional and squad-based expert modules tested, the latter afford significantly better recognition and/or false-positive error performances than the former, irrespective of the type of TDNN used. Consequently, it is concluded that squad-based expert modules are preferable to their traditional counterparts for closing diphthong recognition. Of the squad-based expert modules tested, those comprising sequence-token TDNNs are found to afford consistently better false-positive error performances than those comprising basic- or extended-token TDNNs, while similar recognition performances are afforded by all. Consequently, squad-based expert modules comprising sequence-token TDNNs are recommended as the preferred method of recognizing closing diphthongs realized with a New Zealand accent. This thesis also presents results demonstrating that squad-based expert modules comprising sequence-token TDNNs may be trained to accommodate multiple speakers and in a manner capable of handling both uncorrupted and highly corrupted speech utterances.