The statistics of topic modelling.

Type of content
Theses / Dissertations
Thesis discipline
Statistics
Degree name
Master of Science
Publisher
University of Canterbury. Mathematics and Statistics
Date
2015
Authors
Abey, Rebecca
Abstract

This research project aims to provide a clear and concise guide to latent Dirichlet allocation, a form of topic modelling. Its goal is to help researchers without a strong background in mathematics or statistics feel comfortable using topic modelling in their work. To achieve this, the thesis gives a step-by-step explanation of how topic modelling works and describes a range of tools that can be used to perform a topic model analysis. The first chapter explains how topic modelling, and more specifically latent Dirichlet allocation, works; it offers a very basic explanation and then an easy-to-follow mathematical explanation. The second chapter explains how to perform a topic model analysis by walking through each step, from the type of dataset through to the software packages available. The third section provides an example topic model analysis based on the PhilPapers dataset. The final section discusses the highlights of each chapter and areas for further research.

Keywords
statistics, topic modelling, text analysis
Rights
Copyright Rebecca Abey