Evaluating ensemble classifiers for spam filtering

Carpinter, James M

Files

hons_0504.pdf (470.88 KB)

Type of content

Theses / Dissertations

UC permalink

http://hdl.handle.net/10092/14822
http://dx.doi.org/10.26021/1336

Degree name

Other

Publisher

University of Canterbury

Language

English

Date

2005

Authors

Carpinter, James M

Abstract

This study investigates the ensemble classifier presented by Caruana, Niculescu-Mizil, Crew & Ksikes (2004). Their approach generates thousands of models using a variety of machine learning algorithms and uses forward stepwise selection to build robust ensembles that can be optimised to an arbitrary metric. On average, the resulting ensemble outperforms the best individual machine learning models. The classifier is implemented in the WEKA machine learning environment, which allows the results presented in the original paper to be validated and the classifier to be extended to multi-class problem domains. The behaviour of different ensemble building strategies is also investigated. The classifier is then applied to the spam filtering domain, where it is tested on three different corpora in an attempt to provide a realistic evaluation of the system. It records performance levels similar to those seen in other problem domains and outperforms both individual models and the naive Bayesian filtering technique regularly used by commercial spam filtering solutions. Caruana et al.’s (2004) classifier will typically outperform the best known models in a variety of problems.
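
The forward stepwise selection mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's WEKA implementation. The function names, the averaging of predicted probabilities, the choice to allow a model to be picked more than once, and the rule of stopping when no candidate improves the score are all assumptions made for illustration. The `metric` argument stands in for the "arbitrary metric" the abstract refers to, with accuracy as a hypothetical default.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Fraction of examples where thresholding at 0.5 matches the label."""
    hits = sum(1 for y, p in zip(y_true, probs) if (p >= 0.5) == bool(y))
    return hits / len(y_true)


def forward_stepwise_selection(
    library: Dict[str, List[float]],
    y_true: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[float]], float] = accuracy,
    max_steps: int = 10,
) -> Tuple[List[str], float]:
    """Greedily grow an ensemble from a library of model predictions.

    `library` maps a (hypothetical) model name to its predicted
    probabilities on a held-out selection set. At each step, the model
    whose addition gives the best averaged-prediction score is added.
    Assumptions: models may be selected repeatedly, and selection stops
    as soon as no candidate strictly improves the score.
    """
    totals = [0.0] * len(y_true)  # running sum of chosen models' predictions
    chosen: List[str] = []
    best_score = float("-inf")

    for _ in range(max_steps):
        step_name, step_score = None, best_score
        for name in sorted(library):  # sorted for deterministic tie-breaking
            size = len(chosen) + 1
            averaged = [(t + p) / size for t, p in zip(totals, library[name])]
            score = metric(y_true, averaged)
            if score > step_score:
                step_name, step_score = name, score
        if step_name is None:  # no candidate improved the ensemble
            break
        chosen.append(step_name)
        totals = [t + p for t, p in zip(totals, library[step_name])]
        best_score = step_score

    return chosen, best_score
```

Passing a different `metric` function (for example, one that penalises false positives more heavily, as a spam filter might) changes which models are selected without changing the selection loop itself.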

Rights

https://canterbury.libguides.com/rights/theses

Collections

Engineering: Theses and Dissertations