Considering unseen arrivals in predictions of establishment risk based on border biosecurity interceptions

. Assessing species establishment risk is an important task used for informing biosecurity activities aimed at preventing biological invasions. Propagule pressure is a major contributor to the probability of invading species establishment; however, direct assessment of numbers of individuals arriving is virtually never possible. Inspections conducted at borders by biosecurity officials record counts of species (or higher-level taxa) intercepted during inspec- tions, which can be used as proxies for arrival rates. Such data may therefore be useful for predicting species establishments, though some species may establish despite never being intercepted. We present a stochastic process-based model of the arrival – interception – establishment process to predict species establishment risk from interception count data. The model can be used to estimate the probability of establishment, both for species that were intercepted and species that had no interceptions during a given observation period. We fit the stochastic model to data on two insect families, Cerambycidae and Aphididae, that were intercepted and/ or established in the United States or New Zealand. We also explore the effects of variation in model parameters and the inclusion of an Allee effect in the establishment probability. Although interception data sets contain much noise due to variation in inspection policy, inter- ception effort and among-species differences in detectability, our study shows that it is possible to use such data for predicting establishments and distinguishing differences in establishment risk profile between taxonomic groups. Our model provides a method for predicting the number of species that have breached border biosecurity, including both species detected during inspections but also “ unseen arrivals ” that have never been intercepted, but have not yet established a viable population. These estimates could inform prioritization of different taxonomic groups, pathways or identification effort in biosecurity programs.


INTRODUCTION
Many nonnative species cause significant detrimental economic and ecological impacts (Simberloff et al. 2013). Given current levels of global trade and travel, it is unrealistic to prevent all invasions, but biosecurity measures can reduce rates at which species arrive and establish (Magarey et al. 2009, Leung et al. 2014. Risk assessments are an important component of such programs and are carried out to prioritize which species, pathways, or aspects of the invasion process to target (Hayes 2003, Andersen et al. 2004, Evans 2010. Invasive species risk assessment can also facilitate early detection and eradication by guiding surveillance programs to target exotic species that have high probabilities of establishment. Thus, enhanced tools are needed to predict the relative establishment risk among large groups of potentially damaging species, such as insects. Invasive species (or group of species) risk assessments can be complex, taking into account different stages of invasions (e.g., arrival, establishment, and spread) and often rely on elicitation of expert knowledge (reviewed in Leung et al. 2012). At the level of entire groups of species, expert elicitation may be of limited value, particularly if the biology is diverse within a group of species, or unknown. Alternatively, quantitative models enable a consistent and repeatable framework that can be applied over many species, particularly when assumptions and uncertainty in the assessment are explicitly acknowledged. Predictors of establishment risk include propagule pressure (Brockerhoff et al. 2014), species traits (Fournier et al. 2019), climate and niche matching (Phillips et al. 2018), previous establishment of related species (Seebens et al. 2018), association with trade volumes (Tingley et al. 2018), and the co-occurrence of species in other regions (Worner et al. 2013). Here, we focus on propagule pressure, for example, the rate of arrival events at a country's border. Propagule pressure has been shown to be a strong predictor of invasion success (Lockwood et al. 2005, Simberloff 2009) but is difficult to directly quantify except for intentional introductions, for which records of introduction effort often exist.
Species interceptions by biosecurity officials during inspections at borders (e.g., ports) can be considered a sample of arrival events. Inspection of imports at ports of entry is a critical component of biosecurity programs. Historically, the value of inspection has in part been attributed to the interception of individuals before they enter a new environment and establish. However, in practice, the proportion of individuals that are intercepted at ports is typically very low, and thus, direct beneficial effects may be minimal (Ministry of Agriculture and Forestry 2003, Work et al. 2005, Whyte 2006, Liebhold et al. 2012). There are several purposes for border inspections: inspections (1) provide information about risks associated with individual pests or groups of pests that informs other biosecurity actions, (2) provide information about risks associated with specific commodities and this also informs biosecurity activities, (3) monitor the effectiveness of phytosanitary treatments, (4) incentivize exporters to reduce invasion risk in exports, and (5) directly identify infested shipments so that they can be excluded (Epanchin-Niell 2017). Here, we focus on the benefit that inspection data provide in documenting the presence of species in pathways and, potentially, prediction of future establishments (Brockerhoff et al. 2014).
Individuals from a range of species arrive in a country at different rates. Once arrived, a small proportion are intercepted at the border and hence eliminated. Most individuals that penetrate the border die without establishing a population, while a small remainder survive and establish self-sustaining populations. Some species will establish without ever being intercepted as interceptions represent only a small sample of arriving individuals, or they are intercepted but not identified and consequently not recorded with their actual identity. For example, larvae of many insect species are not readily identifiable using morphological characteristics. A realistic model would predict a non-zero (negligible to low) risk of establishment for non-intercepted species, with the expected value dependent on the taxonomic group. To assess the relative risk of different species establishing in a region, previous models have been fitted to limited groups of species to minimize the effect of variability among higher-level taxa on model uncertainty (e.g., Brockerhoff et al. 2006, Brockerhoff et al. 2014, Phillips et al. 2018. This approach assumes that the species in these groups have only small variation in probabilities of interception and in probabilities of establishment for a non-intercepted arrival event relative to the variation in arrival rates. An arrival event in interception data may represent one or more individuals, since the exact number is typically not recorded. In reality, numerous factors affect probability of interception and probability of establishment per arrival (Duncan et al. 2014, Saccaggi et al. 2016. The probability of interception can vary temporally due to changes in inspection effort relative to the volume of trade and passenger traffic and the level of phytosanitary measures on imports. See Saccaggi et al. (2016) for a review of border biosecurity systems, including how policy and operational constraints can affect how interception data are collected. Probability of interception can also vary among species due to factors such as arrival pathway, and biological characteristics that influence detection or identification rates. The probability of establishment from an individual arrival event will vary among species due to factors such as climate and niche suitability, reproductive strategies, behavioral traits, and Allee effects (Leung et al. 2012).
Several attempts to predict arthropod establishments based on interceptions have been hampered by low or variable interception probabilities, as well as by variation in per arrival establishment probability among species. For example, the majority of unintentionally introduced insect species in Austria, Switzerland, the Czech Republic, and Australia were never intercepted prior to their known establishment Auger-Rozenberg 2006, Caley et al. 2015). The European and Mediterranean Plant Protection Organization (EPPO) has implemented a targeted recording approach that focuses on a predefined list of species. Such an approach limits the reporting of interceptions prior to establishment, thus biasing the estimate of propagule pressure and hence the predictive ability of any interception-based model Auger-Rozenberg 2006, Eschen et al. 2015). Similarly, Caley et al. (2015) observed a poor association between interception and establishment across the most common insect orders in Australia, although those species with a higher interception rate were more likely to establish. Caley et al. (2015) attributed low taxonomic resolution of identified species as a contributing factor to low interception probabilities. In addition, only establishments occurring over the same 20-yr period as the interceptions were included in the analysis, whereas establishments are typically detected after a lag period of several years to several decades. In contrast, some studies on particular insect groups (Coleoptera and Formicidae) have shown significant positive relationships between interceptions and establishments (Brockerhoff et al., 2006, Haack, 2006, Bertelsmeier et al., 2018. This may be partly due to most of these studies combining interception and establishment data across long time periods and in some cases from multiple countries. This increases the interception probability overall, as well as the likelihood that a species has established somewhere, which potentially averages out the biases of individual country data sets. Accounting for sources of variability in interception and establishment probabilities can improve model fit. For example, Bacon et al. (2014) found a positive relationship between interceptions and establishment rates when they incorporated additional climate matching, host availability and trade volume data, contrary to other studies that did not account for niche or climate suitability Auger-Rozenberg 2006, Eschen et al. 2015).
Several models have been developed that use propagule pressure, in some form, to predict establishment risk and may account for some of the variation in interception probability or per arrival event establishment probability. In the simplest case, interception probability is assumed to be one (i.e., all arrivals are recorded), and per arrival establishment probability is assumed to be constant (Leung et al. 2004). More complex models have been suggested to account for variation in per arrival establishment probability due to environmental heterogeneity, demographic stochasticity and Allee effects (summarized in Duncan et al. 2014). Brockerhoff et al. (2014 used a SIMEX-based method to account for interception measurement error and included a model term to account for an Allee effect. In addition, they included the effect of "rare" non-intercepted species with establishment probabilities based on trends in the intercepted species to estimate the number of unseen species. In this paper, we introduce a stochastic model to predict establishment risk for an individual species within a taxonomic group conditional on its interception frequency. Our model explicitly includes terms for both interception and establishment probability, which allows exploration of both these sources of variability. As with previous models, this stochastic model can be applied to a group of exotic species arriving in one location, or a single species arriving in multiple locations. Initially, all sources of variability among species are ignored apart from their rates of arrival at the border, which are unobserved variables in the model. The rationale behind this simplification is that insect species belonging to the same family tend to share many life-history traits that cause them to be associated with a common pathway and to have similar tendencies to successfully establish. We subsequently explore the effects of incorporating variation in species' interception and per arrival establishment probabilities and an Allee effect. We apply the model to data for two families of insects (Cerambycidae and Aphididae) and two counties (New Zealand and the United States) as case studies to show how the model can be fitted to existing interception and establishment data and make predictions about biosecurity effectiveness for different taxonomic groups. Our model furthers the understanding of the relationship between interception and establishment probabilities, which adds to our ability to predict invasions. We can use the model to estimate the number of "unseen arrivals" (i.e., species in a given family that have arrived but have neither been intercepted nor established), which was not an intrinsic feature of previous models.

Stochastic arrival-interception-establishment model
This model describes the arrival events, interception events and establishment events for a set of taxonomically related species, in a given country, over a period of time T (a glossary of key model terms and notation is provided in Box 1). Upon arrival, individuals of a species may be intercepted; but if not intercepted, they may go on to establish (Fig. 1A). The only directly observable variables in the model are the number of interceptions, and whether establishment of a species has been detected. Non-intercepted arrivals are, by definition, not observed. Establishments are typically only observed with a significant time lag because considerable time is typically required before a newly established species is discovered (Crooks 2005). Hence, the main aim of the model is to predict the probability of establishment from the number of observed interceptions of a given species.

Box 1
Model term

Unseen arrivals
Species in a given family that have arrived but have neither been intercepted nor established Arrival event a propagule as defined in Simberloff (2009), in other words, an arrival of one or more individuals of a species at the same time and location T time period over which interception data are collected k j arrival rate of species j, (average number of arrival events per unit time) N A number of arrival events of species j over time T N I number of intercepted arrival events of species j over time T p I probability of an arrival event being intercepted p E the per arrival event probability of establishment, i.e., the conditional probability of establishment given a nonintercepted arrival event N E number of arrival events of species j that established over time T P TE or p(n) probability of species j establishing over time, given n interception events P EX probability of species j establishing over time T, given at least one arrival event, but no interceptions Arrival events of species j are assumed to occur as a Poisson process with fixed rate k j per unit time. Across the set of species under consideration, the arrival event rates are assumed to be independently and identically distributed according to some distribution with probability density function f(k). Candidate arrival rate distributions include uniform and power law distributions. Using a uniform arrival distribution allows for some useful simplified results and given a paucity of data is a reasonable initial assumption. A power law distribution is an example of a heavy tailed distribution, which represents a set of species where the majority have very low arrival rates, but there is a long tail of species with very high arrival rates. This property is seen in many communities of species and in actual interception data (Magurran 2013, Liebhold et al. 2017).
The number of arrival events N A of species j during a time period T is a Poisson random variable N A $ Poissonðk i TÞ: Note that this specifies the distribution of N A for a species with a given arrival rate k j . Because each species has its own value of k j , drawn from the arrival rate distribution, the distribution of arrival frequencies over an ensemble of species will not be Poisson and is likely to be zero-inflated and right-skewed. Each arrival has probability p I of being intercepted so the number of interceptions, N I , of species j during the time period T is a binomial random variable N I $ BinomialðN A ; p I Þ: Each non-intercepted arrival has probability p E of founding an established population. Hence, the number of establishments N E for species j is also a binomial random variable Initially, we assume for simplicity that the per arrival interception and establishment probabilities, p I and p E respectively, are the same for all species in the group and do not change over time. This ignores potential sources of variation among species, such as the strength of Allee effects, and changes in inspection protocols over time as these are not well quantified across a broad range of species and time periods. We investigate the effects of relaxing some of these assumptions in Model Extensions. The model also assumes that data on interceptions and establishments are available for the same time period T. In practice, this is unlikely, and the consequences of this assumption and practical solutions are discussed in Appendix S1 and Table 1.
The probability P TE that at least one arrival event for species j establishes during the time period T is where S E = 1 if the species established and S E = 0 if it did not. We use p(n) to denote P(S E = 1|N I = n), the probability of species establishment given there were n interceptions during the time period T. Conditioning on the species arrival rate k and the number of arrivals N A , and using Bayes' theorem (see Appendix S2 for details), we can write p(n) as showing the five mutually exclusive outcomes for species in a defined set (black box) during a given observation period: (1) did not arrive, (2) arrived but neither intercepted nor established, (3) arrived and intercepted but not established, (4) arrived and established but not intercepted, (5) arrived, intercepted, and established. The probability of establishment p(0) for a species that has not been intercepted is the number of species in area (4) divided by the number of species in areas (1), (2), and (4). The probability of establishment P EX for a species that has arrived but not been intercepted is the number of species in area (4) divided by the number of species areas (2) and (4). [Color figure can be viewed at wileyonlinelibrary.com] p n ð Þ ¼ Eq. 1 requires the estimated distribution of arrivals rates (estimated from the data, illustrated in this paper with the power law, or as a very simplified case, the uniform distribution), the probability of interception (hypothesized, but to which the sensitivity of the results can be investigated), and a per arrival probability of establishment (which we estimated from the data using maximum likelihood).
It is possible for a species to establish without having been intercepted. Hence, establishment risk can be predicted for species with zero interceptions by calculating p (0). This probability of establishment includes in its denominator all species in the chosen taxonomic group, including those that have not arrived during the observation period (i.e., species with very low arrival rates that may not arrive during the observation window currently being modeled, see Fig. 1A, B). If the arrival rate distribution is right-skewed, then there will be many species in this category. In some circumstances, decision makers may only be interested in those species that are likely to have arrived. A more relevant prediction in this case is the probability of establishment for species that have arrived at least once during the observation window, but have not been intercepted, P EX = P(S E = 1|N I = 0 and N A > 0). Following a similar procedure to that used above gives In the special case where the arrival rate distribution is uniform, meaning all arrival rates are equally likely, Eqs. 1, 2 can be simplified to give

Data and model fitting
We used data from border interceptions and establishments in the United States and New Zealand for All establishment data are used.
There is a lag between when an establishment occurs and when it is reported, which can be on the order of decades. This means that the data for the most recent couple of decades will underestimate the number of established species. Insects that established during the earlier decades due to arriving on trade routes would likely establish again, but "new establishments" of already widespread populations are not likely to be detected or recorded.
Establishment probability per arrival will be overestimated, and hence is overestimated. An alternative assumption would be to assume that the arrivals during the interception timeframe would have been arriving at the same rate over the earlier time frame, but this would lead to the same consequence in terms of the ratio of establishment probability per arrival and interception probability per arrival.
All arrival rates are equally likely (under the uniform arrival rate model).
Eqs. 1 and 2 can be simplified to Eqs. 3 and 4.
This simplification leads to equations that are much simpler to fit. This is useful for a quick evaluation of the relationship between probability of establishment and number of interceptions.
The probability of low arrival rates is likely underestimated. This leads to an overestimation of p(0).
There is no variation between species in a taxonomic group in their per arrival interception probability or their per arrival establishment probability.
The per arrival interception probability and per arrival establishment probability are modeled as constant within a taxonomic group.
This is a simplifying assumption as we do not have the data to fit additional variables. However, there is likely to be less variation among species in the same taxonomic group, compared to among distantly related species.
See Cerambycidae and Aphididae insect families (Appendix S1, Data S1, and Data S2) to estimate model parameters. We fitted the Aphididae and Cerambycidae data sets with the stochastic model using two alternative arrival rate distributions, but no additional sources of variability. The additional sources of variability are explored later on in the model extensions section. For comparison, two phenomenological models were also fitted to the data sets.
The three key parameters of the stochastic model are the mean arrival rate E(k), the interception probability p I and the per arrival probability of establishment p E . These three parameters cannot be uniquely identified from data. For example, a group of species with mean arrival rate E(k) = 1 and p I = p E = 0.01 will on average result in practically the same observations as a group with an E(k) = 10 and p I = p E = 0.001. Therefore, it is not sensible to attempt to use data to estimate these model parameters individually. However, provided the interception probability is small (i.e., p I < 1, which is a realistic assumption, the model is insensitive to changes in the value of p I and p E provided E(k)p I and E(k)p E are fixed. We therefore started by fixing a value for the probability of interception p I ; the sensitivity of this choice is tested later.
We compared two candidates for the arrival rate distribution, a uniform distribution and a power law distribution. For the uniform distribution, no fitting was required because the distribution has no parameters. This is equivalent to assuming that all arrival rates k ≥ 0 are a priori equally likely. The uniform distribution is not normalizable over the non-negative real numbers, but this does not affect model output because the normalization constant appears in both the numerator and denominator of Eq. 1. We used Eqs. 3 and 4, respectively, to give the probability of species establishment and the probability of establishment of a species that has arrived but not been intercepted.
The power law distribution has probability density function where k min is the minimum arrival rate, and C is a normalization constant. For observed interception counts n and a given value of the interception probability p I , we used estimated arrival rates k = n/(p I T) to fit the exponent µ via a standard maximum likelihood equation (Newman 2005). We set the minimum arrival rate to be k min = 0.01/(p I T), which corresponds to a species that is intercepted on average once during a time period 100 times longer than the observation period T.
Once the arrival rate distribution was specified, we estimated the value of the per arrival establishment probability p E by fitting the model prediction for the probability of species establishment, p(n), in Eqs. 1 or 3 (for the power law model and the uniform model, respectively) to data using maximum likelihood estimation.
The likelihood of observing a data set x, consisting of interception counts n j and establishment S Ej {0,1}, given a per arrival establishment probability p E is given by lnð1 À pðn j ÞÞ: (5) The value of p E that maximizes the likelihood was found using the fminbnd function in Matlab (The Math-Works, Inc., Natick, MA, USA). The approximate 95% confidence interval (CI) for p E was also calculated as the range of values of p E for which ln(L(x|p E )) ≥ max(ln(L (x|p E )) À 2 (Hudson 1971).
Species with zero interceptions that did not establish during the observation period (S E = 0 and N I = 0) are, by definition, not in the data set. Hence, the species with zero interceptions that are in the data set have an apparent probability of establishing equal to one. A naive fitting procedure would therefore have resulted in attempting to make p(0) = 1. To avoid this, we excluded species with zero interceptions that did establish (S E = 1 and N I = 0) from the data. However, we use this information in combination with model results to make inferences about the likely number of unseen arrivals of each insect family in each country (see Discussion).
The first of the phenomenological models fitted was the basic arrival-establishment model of Leung et al. (2004), which only accounts for demographic stochasticity. When applied to actual arrival data, this model is process based, but when applied to interception data without adaptation, it becomes phenomenological. Unlike our stochastic arrival-interception-establishment model, this model assumes that the number of arrivals of a species with n interceptions is deterministically equal to n/p I . Under this assumption, it can be shown that the probability p(n) of a species with n interceptions having established is : This model is a special case of the Weibull model used in Leung et al. (2004) and Brockerhoff et al. (2014) with the shape parameter set to c = 1. Hereafter, we refer to this as the exponential model. The maximum likelihood estimate and 95% CI for the parameter r was found for each data set via Eq. 5. For any chosen value of p I , the value of p E can be calculated from r for comparison with the stochastic model.
Finally, a logistic regression model for the probability of species establishment was also fitted as a common statistical model for analyzing binary response data. Matlab code for fitting the four models to the data is supplied in Data S3.
In order compare how well the models fit the data, we provide the values of the Akaike information criterion (AIC) for each model. For the purposes of calculating AIC, the number of fitted parameters for the stochastic model was 1 (p E ) and for the exponential model was 1 (r). Although the stochastic model with power law arrival rate distribution has one additional parameter (the power law exponent l), this was fixed using the interception count data and the likelihood in Eq. 5 was maximized over only one parameter (p E ).

RESULTS
We fitted the stochastic arrival-interception-establishment model using each of the two candidate arrival rate distributions and three different assumed values for the probability of interception p I to the United States and New Zealand Cerambycidae and Aphididae data sets. For each model, we report the maximum likelihood estimate and 95% CI for the per arrival probability of establishment p E , the predicted probability of establishment p (0) for species with no interceptions, and the predicted probability of establishment P EX for species that have arrived at least once but have not been intercepted ( Table 2). Note that the CIs account for variability in arrival, interception and establishment frequencies associated with the stochastic model, but do not allow for other sources of uncertainty, such as measurement errors, lag in establishment detection, Allee effects, or variations in parameters between species or over time. The stochastic model is compared with two alternative  Table 2. Probability of establishment was calculated from the data in logarithmic bins (black crosses). [Color figure can be viewed at wileyonlinelibrary.com] models: the exponential model and a logistic regression model.
Fitted values for the per arrival establishment probability p E are strongly correlated with the assumed value for the interception probability p I . This is because, as explained above, interception count data of the type used here are insufficient to identify these two parameters independently. Fitted values for p E under the stochastic model with a uniform arrival rate distribution are similar to those under the exponential model with the same assumed value for p I . Under the power law arrival rate distribution, fitted values for p E are higher.
For Cerambycidae in the USA and New Zealand, the exponential model and the stochastic model with uniform arrival rate distribution provide an almost equally good fit (difference in AIC < 2). The stochastic model with power law arrival rate distribution fits slightly less well (difference in AIC < 4). For Aphididae in the United States and New Zealand, the logistic regression is the best-fitting model (lowest AIC), followed by the stochastic model with uniform arrival rate distribution. Fig. 2 shows the species establishment probability as a function of the number of interceptions for each data set. The shape of these graphs, except for species with very low (<2) interception counts, is not sensitive to the choice of value for p I or of arrival rate distribution. The stochastic models predict non-zero values for the species establishment probability, even for species with zero interceptions (vertical axes intercepts in Fig. 2, and p(0) in Table 2). This contrasts with the exponential model, which assumes that the number of arrivals is directly proportional to the number of interceptions, and therefore that species with no interceptions cannot have established. The stochastic model with power law arrival rate distribution predicts that the probability p(0) of a species establishing without interception is between 0.00008 and 0.00039 for Cerambycidae and between 0.01 and 0.05 for Aphididae. The predicted value of p(0) and the AIC for the stochastic models are insensitive to the choice of interception probability p I provided it is less than approximately 0.1.
The stochastic model predictions for the probability of establishment P EX for species that have arrived but not been intercepted can be used to estimate the number of unseen arrivals. As an accurate value of the interception probability is rarely available, these results are best used as relative estimates. Figs. 3 and 4 show P EX as a function of the probability of interception p I and the per arrival probability of establishment p E assuming a uniform arrival rate distribution, calculated via Eq. 4, or a power law distribution, calculated via Eq. 2, respectively. Superimposed on Fig. 3 are curves showing the predicted value of P EX for the four data sets, as a function of the assumed value for p I . For Cerambycidae, P EX is consistently small (≤0.01) both for the United States and New Zealand, implying that, for every species that has Estimating the number of unseen arrivals using a uniform arrival rate distribution. Probability of establishment P EX (represented by the color bar) for species that have arrived at least once but not been intercepted during the observation period, assuming a uniform arrival rate distribution (Eq. 4). If P EX is small, then for every species that has established without being intercepted, there are many more species that have breached border biosecurity but not yet established. Superimposed curves show the predicted values of P EX for the Cerambycidae data (white) and Aphididae data (red) in the United States (solid) and New Zealand (dashed), as a function of the assumed value for the probability of interception p I and assuming a uniform arrival rate distribution.
[Color figure can be viewed at wileyonlinelibrary.com] established without being intercepted, there are at least 100 unseen species that have arrived. For Aphididae in the United States, assuming p I is less than approximately 0.2 and assuming a uniform arrival rate, P EX % 0.15, meaning that for every species that established without interception, there are about six unseen arrivals. For New Zealand, P EX % 0.25 meaning that for every species that established without interception, there are about three unseen arrivals. If p I > 0.2, the predicted values of P EX are higher, meaning the number of unseen Aphididae species, which have arrived would be lower than the above estimates.

Stochastic model extensions
The stochastic model can be extended to include variation among species or an Allee effect. We model variation among species by drawing the probabilities of interception p I and establishment p E for each species independently from beta distributions with shape parameter a = 0.5. The scale parameter b is chosen to keep the mean of each distribution the same as the fitted value shown in Table 2.
We model an Allee effect by making the per arrival establishment probability p E dependent on the recent arrivals of conspecifics. Specifically, the probability of establishment for a single non-intercepted arrival at time t is either zero if there were no other non-intercepted arrivals between time t À a and time t, or p E if there was at least one non-intercepted arrival between time t À a and time t. The parameter a is a constant specifying the strength of the Allee effect: the smaller a is, the stronger the Allee effect and the lower the probability of establishment, especially for species with low arrival rates. . Estimating the number of unseen arrivals using a power law arrival rate distribution. Probability of establishment P EX (represented by the color bar) for species that have arrived at least once but not been intercepted during the observation period, assuming a power law arrival rate distribution (Eq. 2). If P EX is small, then for every species that has established without being intercepted, there are many more species that have breached border biosecurity but not yet established. Superimposed curves show the predicted values of P EX for the Cerambycidae data (white, A and B) and Aphididae data (red, C and D) in the United States (solid, A and C) and New Zealand (dashed, B and D), as a function of the assumed value for the probability of interception p I and assuming a power law arrival rate distribution. [Color figure can be viewed at wileyonlinelibrary.com] probability of species establishment for a given interception count. This weakens the overall relationship between interception frequency and risk of establishment, even if there is only variance in one of the two parameters. Including an Allee effect in the model decreases the probability of species establishment for a given value of the per arrival establishment probability p E . This means that if an Allee effect is present but not accounted for, p E will be underestimated.

Model comparisons
We have developed a process-based, stochastic model of the arrival, interception and establishment of exotic species. This is treated as a three-stage process based on the probabilities of (1) arrival at the border; (2) interception by inspectors; and (3) establishment of a viable population. This approach explicitly acknowledges uncertainty arising because interceptions represent only a small sample of all the actual arrival events. The model outputs the probability of species establishment as a function of the number of recorded interceptions.
We have fitted the model to data on interception counts and establishments from the United States and New Zealand for species in two insect families, Cerambycidae and Aphididae. The stochastic model's goodness-of-fit to the data is comparable to the exponential model. This model is equivalent to the Weibull (c = 1) non-Allee model of Leung et al (2004) and infers similar values for the per arrival probability of establishment. This contrasts with the logistic regression model, which lacks the ability to interpret parameters in this way. Our stochastic model offers two key advantages over the exponential model. Firstly, the model is process-based in its construction meaning that model parameters correspond to probabilities of certain classes of events  Table 2; a = 0.01 yr (weak Allee effect), a = 0.001 yr (strong Allee effect). [Color figure can be viewed at wileyonlinelibrary.com] occurring, which are in principle measurable, allowing for future development. Secondly, because of its processbased construction, the model can provide predictions for the probability of species establishing without having been detected or intercepted at the border. This contrasts with the exponential model, which assumes that the number of arrivals is a deterministic multiple of the number of interceptions, and hence, species with no interceptions have zero probability of establishment. Our model framework is therefore better aligned with actual interception-establishment data, which contain frequent instances of species establishing without having been detected at the border. Model adjustments in Brockerhoff et al. (2014) improved the Weibull (Allee inclusive) model to account for the problem of predicting zero establishments from zero detections. In that case, numbers of non-intercepted species were added using assumptions based on frequency abundance models along with a very small frequency of "interception." However, our model is the first time that probability of establishment for non-intercepted species has been incorporated into a model of the arrival-interceptionestablishment process from first principles. In particular, for the assumption of uniformly distributed arrival rates, this results in a very simple but effective model for qualitative comparisons.
We used the model to explore the effect of the interception probability and per arrival establishment probability on the relationship between interception counts and species establishment risk. The ratio of these probabilities is the main determinant of the shape of the relationship. We tested two different species arrival rate distributions: a uniform distribution and a power law distribution. Direct data on species arrival rates are rarely available and, in particular, the left-hand tail of the arrival rate distribution is difficult to estimate because the majority of species arrive very rarely (Liebhold et al. 2017). Better data on species arrival rates will improve the quantitative accuracy of model predictions of establishment risk for a given interception count but are unlikely to qualitatively change model behavior. This means that either of the arrival rate distributions can be used to assess the broad scale relative differences between taxonomic groups.
While the uniform arrival rate distribution has the advantage of a simpler fitting process, the distribution of arrival rates is likely to be right skewed (Liebhold et al. 2017). Many known distributions of species abundances are right skewed (Magurran 2013), and the power law distribution fits the interception data better than the uniform distribution. This means that the uniform arrival rate model will overestimate P EX and hence underestimate the number of unseen arrivals. An alternative option would be to fit the model in a Bayesian framework using simulations and allowing further exploration of the uncertainty in the model, and we leave this open for future investigation. Although a Bayesian approach would have the benefit of simultaneously fitting the arrival rate distribution and the stochastic arrival-interception-establishment model, it would be a more computationally expensive method and would not provide any further analytical insight into the relationship between interception frequency and probability of establishment.

Taxonomic group comparisons: Cerambycidae and Aphididae
The model results (Fig. 2) show that, for Cerambycidae, the establishment likelihood is very low (classified as 0.001-0.05) for species with small interception counts, and moderate (classified as 0.3-0.7) for species with high interception counts (using the likelihood level descriptors as in Australian Department of Agriculture 2014). The low establishment likelihood for low interception rates may be explained by Cerambycidae having a small per arrival establishment probability relative to interception probability. These low establishment probabilities may be caused by Allee effects (Taylor andHastings 2005, Liebhold andTobin 2008), or variability in establishment probability among species (Fig. 4). Establishment probabilities may have been higher if we had used worldwide establishment status instead of the single country establishment status, as analyzed in Brockerhoff et al. (2014). For Aphididae, there is a moderate establishment likelihood for species with low interception counts, and a high likelihood (classified as 0.7-1.0) for species with high interception counts. This could be explained by a lower ratio of interception probability to per arrival establishment probability compared to the Cerambycidae. The interception data analyzed here span a shorter time period than that over which establishment data were collected, contributing to the low interception probability. On the other hand, the more likely primary reason is that Aphididae tend to have highly efficient reproductive strategies (i.e., many are parthenogenic), which contributes to high per arrival establishment probabilities (Teulon and Stufkens 2002). Using the model to predict establishment likelihood of Aphididae species based on interceptions would result in low sensitivity (there would be many species that establish without ever being intercepted), but many of the most frequently intercepted would establish.
For Cerambycidae, the model predicts a very low probability of species establishing without interception. However, for Aphididae, the model predicts a probability of establishing without interception over the relevant observed time period (71 yr in the United States and 18 yr in NZ) of 14% in the United States and 21% in New Zealand, assuming a uniform arrival rate distribution. If, however, we assume that the arrival rate distribution is right-skewed and use the power law model, then the p(0) estimates are much lower. For either model, these estimates are reasonably robust to varying the assumed value for the interception probability, but the interpretation is influenced by the difference in the timeframes for the interception data and establishment data. The corresponding P EX predictions can be used to estimate the number of "unseen arrivals," i.e., species that have breached border biosecurity without being intercepted but have not yet established. However, these predictions do depend on the choice of model and the assumed value of p I (Figs. 3 and 4). The uniform model tends to give higher P EX predictions compared to the power law model, but the power law model may be a more reasonable description of the arrival rate distribution. For Cerambycidae, the stochastic model consistently predicts that P EX is small (<0.01) although the exact value depends on the interception probability. This implies that, for every species that established without being intercepted, there were hundreds or even thousands of species that arrived unseen, although most of these species would have arrived very rarely. There were five non-intercepted established species in the United States (Data S1). The power law model, with an interception probability of 0.01, predicts that P EX = 0.00019, and therefore, an estimated 26,000 unseen species arrived in the United States, which is higher than the estimate given in Brockerhoff et al. (2014).
For Aphididae, the uniform model, assuming an interception probability <0.1, predicts that there were four to eight unseen arrivals in the United States for every species that established without interception, and two to seven in New Zealand. However, these estimates are sensitive to model assumptions: for example, if the probability of interception is lower or if the arrival rate distribution is heavily right skewed, the predicted number of unseen arrivals would be higher. Note that while the predicted number of Cerambycidae or Aphididae unseen arrivals depends strongly on the combination of model assumptions, the qualitative trend of a higher ratio of the number of unseen arrivals per non-intercepted established species for Cerambycidae compared to Aphididae remains consistent.

Model limitations
Modeling requires assumptions to be made, and the consequence of those assumptions may be more or less significant depending on the context in which model predictions are used. Many of the assumptions made in this paper do not affect the probability of establishment relative to other species, which makes the simple-to-use uniform arrival rate model suitable for comparing taxonomic groups. However, as noted, several of the assumptions result in an overestimation of p(0) and P EX , including using a uniform arrival rate distribution instead of a right-skewed distribution and assuming the establishment timeframe is the same as the (often shorter) interception timeframe (Table 1). In addition, arrival rates are likely to change through time, increasing with trade volume, or decreasing with effective phytosanitary measures. Trying to fit one source of variation while ignoring all others, in the absence of further data or controls, may lead to erroneous conclusions.
The model fit to the data could be influenced by variation in interception probability or by variation in per arrival event establishment probability among species, or by variation in Allee effects among species. Unfortunately, there is insufficient data to reliably fit all these effects. This is illustrated in the fitting of the Allee-inclusive Weibull model in Brockerhoff et al. (2014). Prior to correcting for measurement error using the SIMEX method, the Allee parameter (c) absorbs all sources of variation resulting in a fitted value, which is not biologically reasonable. Applying the SIMEX correction shifts the fitted Allee parameter to a more reasonable value. However, variation in per arrival establishment probability, which can influence the Allee parameter fit (Duncan et al. 2014), is still not explicitly accounted for. The SIMEX method accounts for some of the variation in interception probability but requires an estimate of the error in the interception data. In Brockerhoff et al. (2014), this estimate comes from the variation between the United States and New Zealand data sets. However, this method will miss variation where the two countries have similar biases, for example, similar pest priorities or biological characteristics that make some insects easier to detect than others. Using border interception data also fails to capture the arrival frequencies for species arriving on non-human assisted pathways, such as natural dispersion across land borders into the United States, and wind-assisted dispersion from Australia into New Zealand (Close et al. 1978).

Future development and applications
Although useful for qualitative predictions, the data available to us for this study were inadequate for detailed modeling and making quantitative predictions with a high level of certainty. Our stochastic model's predictive ability could be improved by including additional information in the model to account for sources of variation in interception probability and individual per arrival establishment probability. Additional information could include comprehensive slippage surveys to assess the effectiveness of current border surveillance (Whyte 2006) and climate suitability as assessed by environmental distance metrics (Phillips et al. 2018). Ideally, variation in interception probability would be controlled and monitored by statistically designed border surveillance programs such as those discussed in Saccaggi et al. (2016). Some information already exists about the relationship between numbers of shipments inspected and the probability of pest interception (Work et al. 2005), but systematically collected estimates for the probability interception per arrival and its variation across all pathways for specific taxonomic groups would enable additional model refinement. Furthermore, increasing the intensity or efficiency of inspection could be used to increase the probability of interception, and our model predicts that this would provide greater power in differentiating species with low vs. high probability of establishment. We acknowledge that increasing interception probability per arrival may be difficult given the vast quantity of arriving imports, but an attractive alternative may be to increase support for diagnostics (pest identification) so that more detections can be identified to species level. If it is not practical to apply a consistent statistical sampling methodology over long time periods (e.g., decades), then well-documented changes in methodology, and changes in estimates for key parameters, would be useful. Ultimately, models that account for costs associated with various intensities of inspection and compare these with benefits of invasion forecasts could be used to identify optimal intensities of inspection (Surkov et al. 2008).
Border inspection plays a crucial role in plant biosecurity programs (Magarey et al. 2009). While some of the benefit of inspection comes from direct prevention of entry by potentially invading pests, these direct benefits are likely to be small in most situations because of the relatively small fraction of arrivals that are actually intercepted. Indirect benefits of inspection are thought to be much greater, including the deterrent effect in which behavior of importers can be changed from knowledge that discovery of contaminated shipments could result in refusal, destruction or fines (Springborn et al. 2016). The other indirect benefit of border inspections is the information that it provides about pests associated with specific shipments (Kenis et al. 2007). Models, such as those developed here, can be used to predict probabilities of pest establishment associated with specific commodity imports and such predictions are of critical use in risk assessments that guide quarantine policy, including the imposition of phytosanitary measures. Estimates for the relative number of unseen arrivals in different taxonomic groups could be combined with risk assessments to inform resource prioritization by biosecurity officials. Predictions of establishment probabilities can also help to identify high-risk pests and direct surveillance efforts accordingly (Colunga-Garcia et al. 2013).
While the impact component of risk also needs to be taken into account, the contrasting qualitative behavior of the relationship between interceptions and establishments for Aphididae and Cerambycidae suggests the following recommendations. If the probability of a species establishment given no interceptions is relatively high, and the number of unseen species is also large, as is the case here for Aphididae, then this means that there will be little warning from the border about potential new establishments. A biosecurity manager could then take steps to increase the overall interception probability in order to be forewarned or increase general post-border surveillance and general response plans to be able to respond quickly to an unknown incursion. It is also possible that the probability of a species establishment given no interceptions is high because interception probability is uneven across the family and some introductory pathways or species are being under sampled during inspections.
On the other hand, if the number of unseen species is large, but the probability of a species establishment given no interceptions is low, as is the case here for Cerambycidae, then it may be better to focus on the known potentially arriving species and look to further knowledge about what causes the variability in the per arrival establishment probability among species. This would be recommended alongside continued collection of interception data, as changes can occur.
While the models developed here provide considerable benefit to risk assessments and other practices in the design of biosecurity strategies, it is likely that more advanced models could be developed in the future that would provide more quantitative value when additional data are available for further parameterization.