Predicting decayed, missing or filled teeth in young children : a comparative use of conventional statistical methods and machine learning.

Type of content
Theses / Dissertations
Thesis discipline
Statistics
Degree name
Master of Science
Publisher
University of Canterbury
Language
English
Date
2019
Authors
Sonal, Sarah A.
Lee, Martin
Brown, Jennifer A.
Schluter, Philip J.
Abstract

Background: Early childhood caries is a preventable chronic disease with a strong socio-economic gradient. The overarching goal of this research is to establish whether routinely collected data can be used to predict dental disease. The primary aim of this study was to compare a conventional statistical technique with a supervised machine learning technique, to establish the most appropriate method for addressing this research goal.

Methods: This study used routinely collected dental records, hospital admissions and New Zealand Index of Multiple Deprivation data from 21,236 children aged 5 years in the Canterbury region. Selection was limited to children who turned five years old between 2014 and 2017. The data were split into three datasets: a training dataset to build models predicting a count of decayed, missing or filled teeth; a tuning dataset to tune the best of these models; and a testing dataset to compare the models on their predictive abilities. Models were compared on goodness-of-fit, root mean square error (RMSE), sensitivity and specificity.
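
As an illustration only, the three-way split and the comparison metrics named above could be sketched in Python as follows. This is not the code used in the thesis: the 60/20/20 split proportions, the function names, and the threshold of one or more affected teeth used to turn counts into a disease indicator are all assumptions, since the abstract does not state them.

```python
import math
import random


def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle records and split them into training, tuning and testing lists.

    The 60/20/20 proportions are a hypothetical default, not the thesis's.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_tune = int(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]  # whatever remains
    return train, tune, test


def rmse(observed, predicted):
    """Root mean square error between observed and predicted counts."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def sensitivity_specificity(observed, predicted, threshold=1):
    """Sensitivity and specificity after dichotomising counts.

    A child is treated as having disease when the count is >= threshold.
    The threshold of 1 is an assumption made for this sketch.
    """
    tp = fn = tn = fp = 0
    for o, p in zip(observed, predicted):
        actual = o >= threshold
        flagged = p >= threshold
        if actual and flagged:
            tp += 1
        elif actual:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

In a sketch like this, each candidate model would be fitted on the training list, tuned on the tuning list, and scored once on the testing list with `rmse` and `sensitivity_specificity`, so that the reported metrics reflect data the model has not seen.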

Results: The zero-inflated negative binomial and random forest models performed better at fitting and predicting than the other methods considered. The random forest model performed better at prediction, with an RMSE of 2.678 compared to the zero-inflated negative binomial RMSE of 2.727. The sensitivity of the random forest model was 0.203, which was higher than the zero-inflated negative binomial sensitivity of 0.071. Specificity was 0.926 for the random forest model and 0.972 for the zero-inflated negative binomial model. The model building, tuning and testing process for the random forest model was more computationally efficient than for the zero-inflated negative binomial model.

Conclusion: Machine learning, specifically the random forest, is a faster approach to modelling routinely collected dental data, with greater precision and accuracy in fitting the data and predicting dental disease.

Rights
All Rights Reserved