Improving the reliability of citizen science data

Type of content
Theses / Dissertations
Degree name
Doctor of Philosophy
University of Canterbury
Mugford, Julie

Citizen science is a growing movement that enables volunteers to help scientists collect and analyse information. Citizen science can solve problems that may be intractable using conventional methods. In ecology this is extremely important: global biodiversity is in sharp decline, and traditional ecological monitoring does not have enough resources to meet current monitoring demands. However, several factors make citizen science problematic and raise concerns about reliability: individual citizen scientists may vary in ability, the data may carry persistent bias, and the level of guidance given to participants varies widely between citizen science projects.

In this thesis we use two contrasting mathematical methodologies, a Bayesian approach and stochastic processes (in particular, random walk theory), to quantify and improve the reliability of citizen science data. We apply our methods to iNaturalist NZ, a citizen science project that provides participants with an online community in which to share and classify observations of biota when and how they choose.

Our Bayesian approach combines individual citizen scientists’ classifications for an image into a likely final classification. This approach improves on the common, but simple, majority vote method by estimating and utilising a measure of each participant’s ability to correctly classify an image. This approach optimises the citizen scientists’ classification efforts while also ensuring a desired level of final classification certainty.
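The idea of weighting each classification by the participant's ability can be illustrated with a minimal sketch. This is not the model developed in the thesis: it assumes, purely for illustration, that each participant has a single known probability of classifying correctly, that errors are spread uniformly over the remaining classes, and that the prior over classes is uniform. The function names, participant names, and accuracy values are all hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Baseline: return the label chosen by the most participants."""
    return Counter(label for _, label in votes).most_common(1)[0][0]


def posterior(votes, accuracy, classes):
    """Posterior probability of each class given the votes.

    votes    -- list of (participant, label) pairs
    accuracy -- dict: participant -> assumed probability of a correct label
    classes  -- list of possible labels (uniform prior assumed)
    """
    k = len(classes)
    unnormalised = {}
    for c in classes:
        likelihood = 1.0
        for who, label in votes:
            a = accuracy[who]
            # Assumption: a wrong answer is equally likely to be any other class.
            likelihood *= a if label == c else (1.0 - a) / (k - 1)
        unnormalised[c] = likelihood / k  # uniform prior 1/k
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}


def classify(votes, accuracy, classes, threshold):
    """Return the most probable class, or None if certainty is below threshold."""
    post = posterior(votes, accuracy, classes)
    best = max(post, key=post.get)
    return best if post[best] >= threshold else None


# Hypothetical example: one skilled participant disagrees with two novices.
classes = ["kea", "kaka"]
accuracy = {"expert": 0.95, "novice1": 0.6, "novice2": 0.6}
votes = [("expert", "kea"), ("novice1", "kaka"), ("novice2", "kaka")]

print(majority_vote(votes))
print(posterior(votes, accuracy, classes))
print(classify(votes, accuracy, classes, threshold=0.95))
```

In this example the majority vote returns "kaka", while the weighted posterior favours "kea" with probability of about 0.89. With a threshold of 0.95, `classify` returns `None`, signalling that further classifications would be needed before the image reaches the desired certainty.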

Our stochastic process approach models the random behaviour of citizen scientists and their observations. We use random walk theory to model a citizen scientist encountering and sharing observations of a given species, and scale this up to simulate multiple years of iNaturalist NZ observation data. We use the simulated data to test whether a statistical model can be specified that distinguishes temporal changes in the number of species observations caused by variation in observer behaviour from those caused by ecological changes in species abundance. Without sufficient metadata about observer behaviour it is difficult to specify an appropriate statistical model. Observer metadata may be explicitly collected, or information could be inferred from the observation data. For example, we show that the iNaturalist NZ data give good support to the conclusion that the probability of an observer sharing an observation decays as they share more observations during a walk.
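A toy version of such a simulation is sketched below. It is not the thesis model: the lattice walk, the constant per-step encounter probability, and the geometric form of the decay (sharing probability `share_prob0 * decay ** n` after `n` shared observations) are assumptions made for illustration, and all function and parameter names are hypothetical.

```python
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def simulate_walk(n_steps, encounter_prob, share_prob0, decay, rng):
    """Simulate one observer's walk; return locations of shared observations.

    n_steps        -- number of steps in a simple symmetric lattice walk
    encounter_prob -- assumed chance of encountering the species at each step
    share_prob0    -- chance of sharing the first encounter
    decay          -- assumed geometric decay factor per observation shared
    rng            -- random.Random instance, for reproducibility
    """
    x = y = 0
    shared = []
    for _ in range(n_steps):
        dx, dy = rng.choice(STEPS)
        x, y = x + dx, y + dy
        if rng.random() < encounter_prob:
            # Sharing becomes less likely the more the observer has shared.
            p_share = share_prob0 * decay ** len(shared)
            if rng.random() < p_share:
                shared.append((x, y))
    return shared


def simulate_year(n_walks, n_steps, encounter_prob, share_prob0, decay, rng):
    """Total shared observations over many independent walks in one year."""
    return sum(
        len(simulate_walk(n_steps, encounter_prob, share_prob0, decay, rng))
        for _ in range(n_walks)
    )


# Hypothetical scenario: species abundance (encounter_prob) is constant,
# but the number of walks taken by observers doubles in the second year.
rng = random.Random(42)
year1 = simulate_year(200, 100, 0.05, 0.8, 0.5, rng)
year2 = simulate_year(400, 100, 0.05, 0.8, 0.5, rng)
print(year1, year2)
```

In this scenario any rise in the yearly count comes from observer effort alone, which is the kind of confounding a statistical model fitted to the observation data would need to separate from a real change in abundance.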

The methods developed in this thesis are applicable to a wide range of citizen science projects. Our methods are able to take large, possibly unreliable datasets, and both quantify the reliability and improve the usability of the data.

All Rights Reserved