Evaluating ensemble classifiers for spam filtering

Type of content
Theses / Dissertations
Degree name
Other
Publisher
University of Canterbury
Language
English
Date
2005
Authors
Carpinter, James M
Abstract

This study investigates the ensemble classifier presented by Caruana, Niculescu-Mizil, Crew & Ksikes (2004). Their approach generates thousands of models using a variety of machine learning algorithms, then uses forward stepwise selection to build robust ensembles that can be optimised to an arbitrary metric. On average, the resulting ensemble outperforms the best individual machine learning model. The classifier is implemented in the WEKA machine learning environment, which allows the results presented in the original paper to be validated and the classifier to be extended to multi-class problem domains. The behaviour of different ensemble-building strategies is also investigated. The classifier is then applied to the spam filtering domain, where it is tested on three different corpora in an attempt to provide a realistic evaluation of the system. It records performance levels similar to those seen in other problem domains and outperforms both individual models and the naive Bayesian filtering technique regularly used by commercial spam filtering solutions. Overall, Caruana et al.’s (2004) classifier will typically outperform the best known models across a variety of problems.
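
The forward stepwise selection mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's WEKA implementation: the function names, the toy predictions, the averaging of model outputs, the accuracy metric with its 0.5 threshold, and the choice to let a model be picked more than once are all assumptions made for illustration. At each step the sketch adds whichever model most improves the chosen metric on a held-out set, and stops when no addition helps.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Fraction of examples classified correctly at a 0.5 threshold (assumed metric)."""
    correct = sum(1 for y, p in zip(y_true, probs) if (p >= 0.5) == bool(y))
    return correct / len(y_true)


def forward_stepwise_selection(
    predictions: Dict[str, Sequence[float]],
    y_true: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[float]], float] = accuracy,
    max_steps: int = 20,
) -> Tuple[List[str], float]:
    """Greedily grow an ensemble that averages model outputs.

    `predictions` maps a (hypothetical) model name to its predicted
    probabilities on a held-out set. A model may be selected repeatedly,
    which simply gives it more weight in the average.
    """
    chosen: List[str] = []
    running_sum = [0.0] * len(y_true)
    best_score = float("-inf")

    for _ in range(max_steps):
        step_name, step_score = None, best_score
        size = len(chosen) + 1
        for name, preds in predictions.items():
            # Score the ensemble that would result from adding this model.
            candidate = [(s + p) / size for s, p in zip(running_sum, preds)]
            score = metric(y_true, candidate)
            if score > step_score:
                step_name, step_score = name, score
        if step_name is None:
            break  # no candidate improves the metric, so stop growing
        chosen.append(step_name)
        running_sum = [s + p for s, p in zip(running_sum, predictions[step_name])]
        best_score = step_score

    return chosen, best_score


# Toy held-out labels and model outputs, invented for this example.
y_holdout = [1, 0, 1, 0, 1, 0]
library = {
    "A": [0.9, 0.6, 0.8, 0.2, 0.4, 0.1],
    "B": [0.6, 0.3, 0.4, 0.6, 0.9, 0.2],
    "C": [0.4, 0.6, 0.4, 0.6, 0.4, 0.6],
}
chosen, score = forward_stepwise_selection(library, y_holdout)
print(chosen, score)
```

Because the metric is passed in as a function, the same loop can optimise any score computable from labels and averaged predictions, which is the sense in which such ensembles can be tuned to an arbitrary metric. In the toy data, two models that each make different mistakes combine into an ensemble that scores higher than either alone.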

Rights
All Rights Reserved