Multimodal Speech-Gesture Interaction with 3D Objects in Augmented Reality Environments

Type of content
Theses / Dissertations
Thesis discipline
Computer Science
Degree name
Doctor of Philosophy
University of Canterbury. Department of Computer Science and Software Engineering
Lee, Minkyung

Augmented Reality (AR) lets users interact with virtual and real objects at the same time because it seamlessly combines the real world with computer-generated content. However, most AR interface research uses general Virtual Reality (VR) interaction techniques without modification. In this research we develop a multimodal interface (MMI) for AR that uses speech and 3D vision-based free-hand gestures as its input channels. We also develop a multimodal signal fusion architecture, based on user behaviour observed during interaction with the MMI, that provides more effective and natural multimodal signal fusion. We conducted two user observations: (1) a Wizard of Oz study and (2) gesture modelling. In the Wizard of Oz study, we observed how users behaved when interacting with our MMI. Gesture modelling was undertaken to explore whether different types of gestures can be described by pattern curves. Based on these experimental observations, we designed our own multimodal fusion architecture and developed an MMI. We then conducted user evaluations to assess the usability of our MMI. We found that the MMI is more efficient than the unimodal interfaces and that users are more satisfied with it. We also describe design guidelines derived from the findings of our user studies.

augmented reality, multimodal interface, natural hand gesture, gesture-speech input, multimodal fusion
Copyright Min Kyung Lee