Improving the reliability of citizen science data (2021)
Type of Content: Theses / Dissertations
Degree Name: Doctor of Philosophy
Publisher: University of Canterbury
Citizen science is a growing movement that enables volunteers to help scientists collect and analyse information. It can solve problems that may be intractable with conventional methods. This is especially important in ecology: global biodiversity is in sharp decline, and traditional ecological monitoring lacks the resources to meet current monitoring demands. However, several factors raise concerns about the reliability of citizen science data. Individual citizen scientists vary in ability, the data may carry persistent bias, and the level of guidance given to participants varies widely between projects.
In this thesis we use two contrasting mathematical methodologies, a Bayesian approach and stochastic processes (particularly random walk theory), to quantify and improve the reliability of citizen science data. We apply our methods to iNaturalist NZ, a citizen science project that provides participants with an online community to share and classify observations of biota when and how they choose.
Our Bayesian approach combines individual citizen scientists’ classifications for an image into a likely final classification. This approach improves on the common, but simple, majority vote method by estimating and utilising a measure of each participant’s ability to correctly classify an image. This approach optimises the citizen scientists’ classification efforts while also ensuring a desired level of final classification certainty.
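As an illustration of the idea, the following is a minimal sketch of accuracy-weighted Bayesian vote aggregation, not the model used in the thesis. It assumes each participant has a single known accuracy (the probability of choosing the correct class), that errors are spread uniformly over the remaining classes, and that the prior over classes is uniform. The function names `posterior` and `classify_until_confident`, and the participant names and accuracy values, are hypothetical.

```python
from collections import Counter


def posterior(votes, accuracies, n_classes):
    """Posterior over classes given (participant, label) votes.

    Assumptions (illustrative only): uniform prior, and a participant with
    accuracy a picks each wrong class with probability (1 - a) / (n_classes - 1).
    """
    unnormalised = []
    for c in range(n_classes):
        weight = 1.0 / n_classes  # uniform prior
        for who, label in votes:
            a = accuracies[who]
            weight *= a if label == c else (1.0 - a) / (n_classes - 1)
        unnormalised.append(weight)
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def classify_until_confident(votes, accuracies, n_classes, threshold=0.9):
    """Consume votes in order and stop once one class reaches the threshold.

    Returns (best_class, certainty, votes_used); the threshold may not be
    reached if the votes run out first.
    """
    post = [1.0 / n_classes] * n_classes
    used = 0
    for used in range(1, len(votes) + 1):
        post = posterior(votes[:used], accuracies, n_classes)
        if max(post) >= threshold:
            break
    best = max(range(n_classes), key=lambda c: post[c])
    return best, post[best], used


# Hypothetical example: one strong classifier disagrees with two weak ones.
accuracies = {"expert": 0.95, "novice_a": 0.55, "novice_b": 0.55}
votes = [("expert", 0), ("novice_a", 1), ("novice_b", 1)]

majority_label = Counter(label for _, label in votes).most_common(1)[0][0]
post = posterior(votes, accuracies, n_classes=2)
best, certainty, used = classify_until_confident(votes, accuracies, n_classes=2)
```

In this hypothetical case a majority vote selects class 1, while the weighted posterior favours class 0 because the dissenting participant is far more accurate. The stopping rule also shows how classification effort could be saved: the first vote alone already meets the 0.9 threshold.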
Our stochastic process approach models the stochastic nature of citizen scientists and their observations. We use random walk theory to model a citizen scientist encountering and sharing observations of a given species and scale this up to simulate multiple years of iNaturalist NZ observation data. We use the simulated data to test the ability to specify a statistical model that differentiates between temporal changes in the number of species observations due to variation in observer behaviour, versus ecological changes in the species abundance. Without sufficient metadata about observer behaviour, it is difficult to specify an appropriate statistical model. Observer metadata may be explicitly collected, or information could be inferred from the observation data. For example, we show that it is well supported by the iNaturalist NZ data that the probability of an observer sharing an observation decays as they share more observations during a walk.
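The sketch below gives a rough sense of how such a simulation might look; it is not the thesis model. It assumes a simple random walk on a wrapped square lattice, a species placed independently in each cell with a fixed density, and a sharing probability that decays geometrically with the number of observations already shared on the walk. The geometric form and all parameter names and values (`grid`, `density`, `p0`, `decay`) are assumptions made for illustration.

```python
import random


def simulate_walk(n_steps, grid=50, density=0.1, p0=0.8, decay=0.7, seed=0):
    """Simulate one observer's walk; return (encounters, shared).

    Assumptions (illustrative only): lattice random walk with wrap-around,
    independent cell occupancy, and sharing probability p0 * decay**shared.
    """
    rng = random.Random(seed)
    # Cells where the species is present.
    occupied = {
        (x, y)
        for x in range(grid)
        for y in range(grid)
        if rng.random() < density
    }
    x = y = grid // 2
    encounters = 0
    shared = 0
    for _ in range(n_steps):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = (x + dx) % grid, (y + dy) % grid
        if (x, y) in occupied:
            encounters += 1
            # Willingness to share drops with each observation already shared.
            if rng.random() < p0 * decay ** shared:
                shared += 1
    return encounters, shared


# Same seed, so both runs follow the same path and meet the same individuals;
# only the sharing behaviour differs.
enc_decay, shared_decay = simulate_walk(500, decay=0.7)
enc_flat, shared_flat = simulate_walk(500, decay=1.0)
```

Because both runs use the same seed, any difference in the number of shared observations comes from observer behaviour alone, with the underlying species abundance unchanged. This is the kind of confounding the statistical model has to separate.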
The methods developed in this thesis are applicable to a wide range of citizen science projects. Our methods are able to take large, possibly unreliable datasets, and both quantify the reliability and improve the usability of the data.
Rights: All Rights Reserved