A Study of Features and Processes Towards Real-time Speech Word Recognition

Clark, Tracy M.

dc.contributor.author	Clark, Tracy M.	en
dc.date.accessioned	2013-04-12T03:36:44Z
dc.date.available	2013-04-12T03:36:44Z
dc.date.issued	1993	en
dc.description.abstract	Word recognition techniques are reviewed. An exhaustive comparative study of many of the factors that affect recognition accuracy is presented. Experiments centred on four major areas of word recognition are described: pre-processing techniques, recognition features, recognition algorithms and distance measures. Recognition accuracy, in the context of each of these four areas, is investigated using the digit vocabulary spoken by 10 New Zealand (6 male and 4 female) and 38 American (20 male and 18 female) speakers. Pre-processing techniques examined are the type of window, the length of the data frame, data frame overlap, and pre-emphasis. Acoustic features tested include temporal features such as energy and zero-crossing rate, as well as frequency-based acoustic representations such as linear prediction coefficients, cepstral coefficients, dynamic (transitional) cepstral coefficients, and perceptual linear prediction coefficients. Three types of distance measure are also reported on: the Euclidean, the weighted Euclidean, and the projection. Two methods of training, random template selection and clustering, are investigated. Accuracy improvement by combining different features is also examined. The implementation of a real-time word recognition system, designed on the basis of the comparative study and experiments, is described. The system is based on a TMS320C30 and takes around 0.03 seconds per recognition. The real-time system achieves speaker-dependent accuracies greater than 95% and speaker-independent accuracies greater than 70% for the digit vocabulary. An examination is also made of two methods of continuous recognition using sub-word representations. Both of these methods take advantage of isolated word recognition techniques such as dynamic programming. A segmentation method and a non-segmentation method were investigated. Accuracy of the segmentation recognition method is found to depend linearly on the accuracy of the segmenter.
With a segmentation error of 22%, an average recognition accuracy of 90.7% was obtained for 10 vowels and 2 consonants. For the non-segmentation recognition method, an average accuracy of 75% was obtained. Although the segmentation method produced higher accuracies than the non-segmentation method, it is argued that the removal of the segmentation is an advantage that greatly simplifies the recognition strategy.	en
dc.identifier.uri	http://hdl.handle.net/10092/7561
dc.identifier.uri	http://dx.doi.org/10.26021/3489
dc.language.iso	en
dc.publisher	University of Canterbury. Electrical and Electronic Engineering	en
dc.relation.isreferencedby	NZCU	en
dc.rights	Copyright Tracy M. Clark	en
dc.rights.uri	https://canterbury.libguides.com/rights/theses	en
dc.title	A Study of Features and Processes Towards Real-time Speech Word Recognition	en
dc.type	Theses / Dissertations
thesis.degree.discipline	Electrical Engineering
thesis.degree.grantor	University of Canterbury	en
thesis.degree.level	Doctoral	en
thesis.degree.name	Doctor of Philosophy	en
uc.bibnumber	433343
uc.college	Faculty of Engineering	en

Files

Original bundle

Name: clark_thesis.pdf
Size: 13.93 MB
Format: Adobe Portable Document Format

Collections

Engineering: Theses and Dissertations