Analysis of speech and other sounds
Thesis DisciplineElectrical Engineering
Degree GrantorUniversity of Canterbury
Degree NameDoctor of Philosophy
This thesis comprises a study of various types of signal processing techniques, applied to the tasks of extracting information from speech, cough, and dolphin sounds. Established approaches to analysing speech sounds for the purposes of low data rate speech encoding, and more generally to determine the characteristics of the speech signal, are reviewed. Two new speech processing techniques, shift-and-add and CLEAN (which have previously been applied in the field of astronomical image processing), are developed and described in detail. Shift-and-add is shown to produce a representation of the long-term "average" characteristics of the speech signal. Under certain simplifying assumptions, this can be equated to the average glottal excitation. The iterative deconvolution technique called CLEAN is employed to deconvolve the shift-and-add signal from the speech signal. Because the resulting "CLEAN" signal has relatively few non-zero samples, it can be directly encoded at a low data rate. The performance of a low data rate speech encoding scheme that takes advantage of this attribute of CLEAN is examined in detail. Comparison with the multi-pulse LP C approach to speech coding shows that the new method provides similar levels of performance at medium data rates of about 16kbit/s. The changes that occur in the character of a person's cough sounds when that person is afflicted with asthma are outlined. The development and implementation of a micro-computer-based cough sound analysis system, designed to facilitate the ongoing study of these sounds, is described. The system performs spectrographic analysis on the cough sounds. A graphical user interface allows the sound waveforms and spectra to be displayed and examined in detail. Preliminary results are presented, which indicate that the spectral content of cough sounds are changed by asthma. An automated digital approach to studying the characteristics of Hector's dolphin vocalisations is described. This scheme characterises the sounds by extracting descriptive parameters from their time and frequency domain envelopes. The set of parameters so obtained from a sample of click sequences collected from free-ranging dolphins is analysed by principal component analysis. Results are presented which indicate that Hector's dolphins produce only a small number of different vocal sounds. In addition to the statistical analysis, several of the clicks, which are assumed to be used for echo-location, are analysed in terms of their range-velocity ambiguity functions. The results suggest that Hector's dolphins can distinguish targets separated in range by about 2cm, but are unable to separate targets that differ only in their velocity.