Extreme value mixture modelling with medical and industrial applications.
Degree Grantor: University of Canterbury
Degree Name: Doctor of Philosophy
Extreme value models are typically used to describe the distribution of rare events. Generally, an asymptotically motivated extreme value model is used to approximate the tail of some population distribution. One of the key challenges, with even the simplest application of extreme value models, is to determine the “threshold” above which (if interested in the upper tail) the asymptotically motivated model provides a reliable approximation to the tail of the population distribution. The threshold choice is essentially the usual bias versus variance tradeoff. Practitioners should choose as high a threshold as possible, such that the asymptotic approximation is reliable, i.e. little bias, but not so high that there is insufficient data to reliably estimate the model parameters, i.e. increasing variance. Traditionally, graphical diagnostics evaluating various properties of the model fit have been used to determine the threshold. Once chosen via these diagnostics, the threshold is treated as a fixed quantity, hence the uncertainty associated with its estimation is not accounted for. A plethora of recent articles have proposed various extreme value mixture models for threshold estimation and for quantifying the corresponding uncertainty. Further, the subjectivity of threshold estimation is removed as the mixture models typically treat the threshold as a parameter, so it can be objectively estimated using standard inference tools, avoiding the aforementioned graphical diagnostics. These mixture models are typically easy to automate for application to multiple data sets, or in forecasting situations, for which various ad hoc adaptations have had to be made in the past to overcome the threshold estimation problem. The drawback with most of the mixture models currently in the literature is the prior specification of a parametric model for the bulk of the distribution, which can be sensitive to model misspecification. 
In particular, misspecification of the bulk model’s lower tail behaviour can have a large impact on the bulk fit and therefore on the upper tail fit, which is a serious concern. Non-parametric and semi-parametric alternatives have very recently been proposed, but these tend to suffer from complicated computational aspects in the inference or challenges with interpretation of the final estimated tail behaviour. This thesis focusses on developing a flexible extremal mixture model which splices together the usual extreme value model for the upper tail behaviour, with the threshold as a parameter, and the “bulk” of the distribution below the threshold captured by a non-parametric kernel density estimator. This representation avoids the need to specify a priori a particular parametric model for the bulk distribution, and only really requires the trivial assumption of a smooth density, which is realistic in most applications. This model overcomes sensitivity to the specification of the bulk distribution (and in particular its lower tail). Inference for all the parameters, including the threshold and the kernel bandwidth, is carried out in a Bayesian paradigm, potentially allowing sources of expert information to be included, which can help with the inherent sparsity of extremal sample information. A simulation study is used to demonstrate the performance of the proposed mixture model. A known problem with the kernel density estimators used in the originally proposed extremal mixture model is that they suffer from edge effects if the (lower) tail does not decay away to zero at the boundary. Various adaptations have been proposed in the nonparametric density estimation literature, which have been used within this thesis to extend the extreme value mixture model to overcome this issue, i.e. producing a boundary corrected kernel density estimator for the bulk distribution component of my extremal mixture model. 
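As a rough illustration of the spliced density described above, the following Python sketch combines a kernel density estimate below the threshold with an extreme value tail above it. Several choices here are assumptions made for illustration and are not taken from the thesis: the Gaussian kernel, the generalised Pareto form of the tail, the renormalisation of the bulk by the kernel estimate's mass below the threshold, the treatment of the exceedance probability `tail_prob` as a separate argument, and all function and parameter names. It does not include the boundary correction or the Bayesian inference.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_pdf(x, data, bandwidth):
    """Gaussian kernel density estimate at x (kernel choice is an assumption)."""
    total = sum(normal_pdf((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)


def kde_cdf(x, data, bandwidth):
    """Distribution function of the Gaussian kernel density estimate."""
    return sum(normal_cdf((x - xi) / bandwidth) for xi in data) / len(data)


def gpd_pdf(x, threshold, scale, shape):
    """Generalised Pareto density for exceedances of the threshold."""
    y = (x - threshold) / scale
    if y < 0.0:
        return 0.0
    if abs(shape) < 1e-12:  # exponential limit as shape -> 0
        return math.exp(-y) / scale
    base = 1.0 + shape * y
    if base <= 0.0:  # beyond the upper end point when shape < 0
        return 0.0
    return base ** (-1.0 / shape - 1.0) / scale


def mixture_pdf(x, data, bandwidth, threshold, scale, shape, tail_prob):
    """Spliced density: renormalised KDE bulk below the threshold, GPD above."""
    if x <= threshold:
        bulk_mass = kde_cdf(threshold, data, bandwidth)
        return (1.0 - tail_prob) * kde_pdf(x, data, bandwidth) / bulk_mass
    return tail_prob * gpd_pdf(x, threshold, scale, shape)
```

With this construction the bulk carries probability `1 - tail_prob` and the tail carries `tail_prob`, so the density integrates to one for any admissible threshold and bandwidth. The density is generally discontinuous at the threshold, which is a property of this simple splice rather than a claim about the thesis's model.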
An alternative approach of replacing both the upper and lower tails by extremal tail models is also shown to resolve the boundary correction issue, and also to have the secondary benefits of:
• robustness of standard kernel bandwidth estimators against outliers in the tail;
• a consistent estimator of the bandwidth for heavy tailed populations.
This research further extends the novel mixture model to describe non-stationary features. Extension of the other mixture models seen in the literature to model non-stationarity appears rather complex, as they require specification of not only how the usual threshold and point process parameters vary over time or space but also those of the bulk distribution component of the models. The benefit of this particular mixture model is that the non-stationarity in the threshold and point process parameters can be modelled in the usual ways, with the only other parameter being the kernel bandwidth, which in most applications can safely be assumed not to vary, or to vary only very slowly. The non-stationary mixture also automatically accounts for the uncertainty associated with estimation of the parameters of the time-varying threshold, which no other non-stationary extremal model in the literature has achieved thus far. Results from simulations and an application using Bayesian inference are given to assess the performance of the model. Further, a goal of this research is to contribute to the refinement of our understanding of “normal ranges” for high frequency physiological measurements from pre-term babies. Clinicians take various physiological measurements from premature babies in neonatal intensive care units (NICUs) for assessing the condition of the neonate. These measurements include oxygen saturation, pulse rates and respiration rates. It is known that there are deficiencies in our knowledge of “normal ranges”, hence refinement of ranges essentially requires reliable estimation of relatively high quantiles (e.g. 95% or 99%). Models proposed within this thesis are applied to pulse rates and/or oxygen saturation levels of neonates in Christchurch Women’s Hospital, New Zealand. A further application of the stationary extremal mixture model is for assessing the risk of certain temperature levels within cores of Magnox nuclear reactors, combining predictions from a detailed statistical model for temperature prediction and extremal modelling of the residuals for assessing the remaining uncertainty.
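The high quantiles mentioned above (e.g. 95% or 99%) can be read off a fitted upper tail in closed form. The sketch below assumes a generalised Pareto tail above the threshold and solves `tail_prob * (1 + shape * (x - threshold) / scale) ** (-1 / shape) = 1 - p` for `x`. The function name, the parameter names, and the numerical values in the usage lines are hypothetical, and the thesis's own estimates would come from its Bayesian fit rather than from fixed parameter values.

```python
import math


def tail_quantile(p, threshold, scale, shape, tail_prob):
    """Quantile at level p under an assumed generalised Pareto upper tail.

    tail_prob is the probability of exceeding the threshold, so the formula
    only applies for levels above the threshold: 1 - tail_prob < p < 1.
    """
    if not (1.0 - tail_prob < p < 1.0):
        raise ValueError("p must satisfy 1 - tail_prob < p < 1")
    ratio = (1.0 - p) / tail_prob  # exceedance probability given X > threshold
    if abs(shape) < 1e-12:  # exponential limit as shape -> 0
        return threshold - scale * math.log(ratio)
    return threshold + scale / shape * (ratio ** (-shape) - 1.0)


# Hypothetical parameter values, for illustration only.
q95 = tail_quantile(0.95, threshold=1.0, scale=0.7, shape=0.1, tail_prob=0.2)
q99 = tail_quantile(0.99, threshold=1.0, scale=0.7, shape=0.1, tail_prob=0.2)
```

In a Bayesian fit, applying such a function to each posterior draw of the threshold, scale, shape and exceedance probability would give a posterior sample for the quantile, so that threshold uncertainty carries through to the estimated range.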