Evaluating ensemble classifiers for spam filtering

dc.contributor.author: Carpinter, James M
dc.date.accessioned: 2017-12-05T02:59:35Z
dc.date.available: 2017-12-05T02:59:35Z
dc.date.issued: 2005 [en]
dc.description.abstract: In this study, the ensemble classifier presented by Caruana, Niculescu-Mizil, Crew & Ksikes (2004) is investigated. Their ensemble approach generates thousands of models using a variety of machine learning algorithms and uses forward stepwise selection to build robust ensembles that can be optimised to an arbitrary metric. On average, the resulting ensemble outperforms the best individual machine learning models. The classifier is implemented in the WEKA machine learning environment, which allows the results presented in the original paper to be validated and the classifier to be extended to multi-class problem domains. The behaviour of different ensemble building strategies is also investigated. The classifier is then applied to the spam filtering domain, where it is tested on three different corpora in an attempt to provide a realistic evaluation of the system. It records performance levels similar to those seen in other problem domains and outperforms both individual models and the naive Bayesian filtering technique regularly used by commercial spam filtering solutions. Caruana et al.’s (2004) classifier will typically outperform the best known models in a variety of problems. [en]
dc.identifier.uri: http://hdl.handle.net/10092/14822
dc.identifier.uri: http://dx.doi.org/10.26021/1336
dc.language: English
dc.language.iso: en
dc.publisher: University of Canterbury [en]
dc.rights: All Rights Reserved [en]
dc.rights.uri: https://canterbury.libguides.com/rights/theses [en]
dc.title: Evaluating ensemble classifiers for spam filtering [en]
dc.type: Theses / Dissertations [en]
thesis.degree.grantor: University of Canterbury [en]
thesis.degree.level: Doctoral [en]
thesis.degree.name: Other [en]
uc.college: Faculty of Engineering [en]
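
The forward stepwise selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's WEKA implementation. It assumes a binary problem, a library of models that each supply one predicted probability per validation example, selection with replacement, and simple averaging of predictions. The names `forward_select`, `accuracy`, `library` and `max_steps` are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of examples classified correctly at a 0.5 threshold."""
    correct = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return correct / len(labels)


def forward_select(
    library: Dict[str, Sequence[float]],
    labels: Sequence[int],
    metric: Callable[[Sequence[float], Sequence[int]], float] = accuracy,
    max_steps: int = 10,
) -> Tuple[List[str], float]:
    """Greedy forward stepwise ensemble selection (illustrative sketch).

    At each step, the model whose addition gives the highest metric for
    the averaged predictions is added. A model may be added more than
    once (an assumption of this sketch). The best-scoring ensemble seen
    across all steps is returned with its score.
    """
    sums = [0.0] * len(labels)      # running sum of chosen predictions
    chosen: List[str] = []          # models added so far, in order
    best_chosen: List[str] = []
    best_score = float("-inf")

    for _ in range(max_steps):
        step_score = float("-inf")
        step_name = None
        size = len(chosen) + 1
        for name, preds in library.items():
            averaged = [(s + p) / size for s, p in zip(sums, preds)]
            score = metric(averaged, labels)
            if score > step_score:  # ties keep the earlier model
                step_score, step_name = score, name
        if step_name is None:       # empty library: nothing to add
            break
        chosen.append(step_name)
        sums = [s + p for s, p in zip(sums, library[step_name])]
        if step_score > best_score:
            best_score = step_score
            best_chosen = list(chosen)

    return best_chosen, best_score
```

Because `metric` is passed in as a function, the same loop can be pointed at any score computed from predictions and labels, which is the sense in which such an ensemble can be optimised to an arbitrary metric.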
Files
Original bundle
Name: hons_0504.pdf
Size: 470.88 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission