Genetic models to predict the development of colorectal cancer.

Ainsworth, Rachel

Files

Ainsworth, Rachel_final Master's Thesis.pdf (3.56 MB)

Type of content

Theses / Dissertations

UC permalink

https://hdl.handle.net/10092/102763
http://dx.doi.org/10.26021/11897

Thesis discipline

Biological Sciences

Degree name

Master of Science

Publisher

University of Canterbury

Language

English

Date

2021

Authors

Ainsworth, Rachel

Abstract

Background: Survival rates for colorectal cancer are highest when the cancer is diagnosed at an early stage, but very few cancers are diagnosed before they progress to later stages. A model that could predict who will develop colorectal cancer from genetic information would allow targeted screening of high-risk individuals. Genome-wide association studies (GWAS) have identified ~100 genetic variants (SNPs) that are individually associated with the development of colorectal cancer, but models built using these SNPs do not identify all high-risk individuals (AUC of 0.629).
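For readers unfamiliar with the terms, the following minimal sketch shows how a polygenic risk score and its AUC are commonly computed. It is an illustration only, not the method used in the thesis: the SNP weights, genotypes and the function names `polygenic_risk_score` and `auc` are all hypothetical.

```python
from math import log

def polygenic_risk_score(genotypes, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP)."""
    return sum(w * g for w, g in zip(weights, genotypes))

def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical per-SNP weights (log odds ratios); not taken from the thesis.
weights = [log(1.10), log(1.25), log(1.05)]

# Hypothetical risk-allele counts for three cases and three controls.
cases = [[2, 1, 1], [1, 2, 0], [0, 1, 2]]
controls = [[0, 0, 1], [1, 0, 0], [1, 1, 2]]

case_scores = [polygenic_risk_score(g, weights) for g in cases]
control_scores = [polygenic_risk_score(g, weights) for g in controls]
auc_value = auc(case_scores, control_scores)
print(round(auc_value, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which is the scale on which the 0.629 quoted above should be read.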

Methods: To improve the performance of polygenic risk score models, three methods were tested: first, the use of rare allele principal components; second, the identification of clusters of colorectal cancer patients with the same underlying genetic causes of cancer; third, the incorporation of interactions within gradient boosted tree models.

Results: Both rare and common allele principal components were found to identify population groups, but this did not improve the performance of models to predict the development of colorectal cancer. Clusters representing similar underlying genetic causes of colorectal cancer could not be identified, although models that predict the location of colorectal cancer performed significantly better than models built with linear discriminant analysis (p-value = 0.022). The use of gradient boosted tree models significantly improved the performance of models to predict the development of colorectal cancer, compared with linear models for the same dataset (p-value = 0.0258). However, there was only weak evidence of interactions in the gradient boosted tree models. When variables were selected with random forests or gradient boosted trees, some of the SNPs selected had missing genotypes that were highly favourable or unfavourable for colorectal cancer (odds ratios of 0.446 and 1.77).
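As a reminder of what an odds ratio for a missing genotype measures, the sketch below computes one from a 2x2 table of counts. The counts and the function name `odds_ratio` are hypothetical and chosen only for illustration; they are not data from the thesis.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of being a case among the exposed, divided by the odds among the unexposed."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Hypothetical counts: "exposed" means the genotype at a SNP is missing.
missing_cases, missing_controls = 30, 20
observed_cases, observed_controls = 300, 400

or_missing = odds_ratio(missing_cases, missing_controls, observed_cases, observed_controls)
print(or_missing)
```

A value above 1 means the missing genotype is more common among cases (unfavourable), and a value below 1 means it is more common among controls (favourable).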

Conclusion: The performance of models to identify individuals at high risk of developing colorectal cancer may be improved through the use of gradient boosted tree models. The treatment of missing genotypes warrants further study because of the strong odds ratios attached to some missing genotypes.

Rights

https://canterbury.libguides.com/rights/theses

Collections

Science: Theses and Dissertations
