Computer-Enhanced Knowledge Discovery in Environmental Science
Thesis Discipline: Environmental Sciences
Degree Grantor: University of Canterbury
Degree Name: Doctor of Philosophy
Developing new algorithms and introducing little-known ones for use on environmental science problems is a significant contribution, because it gives researchers knowledge discovery tools that extract new aspects of results and yield insights beyond those available from general statistical methods. Analysis conducted with appropriately chosen methods, judged on quality of performance and results, computation time, flexibility and applicability to data of various kinds, will support decision making in policy development and management for environmental studies. This thesis has three fundamental aims. The first is to develop a flexibly applicable attribute selection method, Tree Node Selection (TNS), and a decision tree assessment tool, Tree Node Selection for assessing decision tree structure (TNS-A). Both use decision trees pre-generated by the widely used C4.5 decision tree algorithm as their information source to identify important attributes in data. TNS supports cost-effective and efficient data collection and policy making by selecting fewer, but important, attributes, while TNS-A assesses decision tree structure to extract information on the relationships between attributes and decisions. The second is to introduce new, theoretical or little-known computer algorithms, such as the K-Maximum Subarray Algorithm (K-MSA) and Ant-Miner, adapting them to maximize their applicability and practicality for environmental science problems and so bring new insights. In addition, an advanced statistical and mathematical method, Singular Spectrum Analysis (SSA), is demonstrated as a data pre-processing step that helps improve C4.5 results on noisy measurements. The third is to encourage environmental scientists to use the ideas and methods developed in this thesis.
The methods were tested on benchmark data and on a range of real environmental science problems: sea container contamination, the Weed Risk Assessment model and weed spatial analysis for New Zealand Biosecurity, air pollution, climate and health, and defoliation imagery. The intended outcome of this thesis is to introduce the concepts and techniques of data mining, a process of knowledge discovery from databases, to environmental science researchers in New Zealand and overseas through collaboration on future research which, together with future policy and management, helps to maintain and sustain a healthy environment to live in.