Genetic models to predict the development of colorectal cancer.

dc.contributor.author: Ainsworth, Rachel
dc.date.accessioned: 2021-10-21T22:53:38Z
dc.date.available: 2021-10-21T22:53:38Z
dc.date.issued: 2021
dc.description.abstract: Background: Survival rates for colorectal cancer are highest when the cancer is diagnosed at an early stage, but very few cancers are diagnosed before they progress to later stages. A model that could predict who will develop colorectal cancer from genetic information would allow targeted screening of high-risk individuals. Genome-wide association studies (GWAS) have identified ~100 genetic variants (SNPs) that are individually associated with the development of colorectal cancer, but models built using these SNPs do not identify all high-risk individuals (AUC of 0.629).
Methods: To improve the performance of polygenic risk score models, three methods were tested: first, the use of rare allele principal components; second, the identification of clusters of colorectal cancer patients with the same underlying genetic causes of cancer; third, the incorporation of interactions within gradient boosted tree models.
Results: Both rare and common allele principal components were found to identify population groups, but this did not improve the performance of models to predict the development of colorectal cancer. Clusters representing similar underlying genetic causes of colorectal cancer could not be identified, although models that predict the location of colorectal cancer performed significantly better than models built with linear discriminant analysis (p-value = 0.022). Gradient boosted tree models performed significantly better at predicting the development of colorectal cancer than linear models built on the same dataset (p-value = 0.0258). However, there was only weak evidence of interactions in the gradient boosted tree models. When variables were selected with random forests or gradient boosted trees, some of the selected SNPs had missing genotypes that were strongly protective against or strongly predisposing to colorectal cancer (odds ratios of 0.446 and 1.77, respectively).
Conclusion: The performance of models to identify individuals at high risk of developing colorectal cancer may be improved through the use of gradient boosted tree models. The treatment of missing genotypes warrants further study because of the strong odds ratios attached to some missing genotypes.
dc.identifier.uri: https://hdl.handle.net/10092/102763
dc.identifier.uri: http://dx.doi.org/10.26021/11897
dc.language: English
dc.language.iso: en
dc.publisher: University of Canterbury
dc.rights: All Rights Reserved
dc.rights.uri: https://canterbury.libguides.com/rights/theses
dc.title: Genetic models to predict the development of colorectal cancer.
dc.type: Theses / Dissertations
thesis.degree.discipline: Biological Sciences
thesis.degree.grantor: University of Canterbury
thesis.degree.level: Masters
thesis.degree.name: Master of Science
uc.bibnumber: 3103197
uc.college: Faculty of Science
Files
Original bundle
Name: Ainsworth, Rachel_final Master's Thesis.pdf
Size: 3.56 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission