Empirical Bayes estimation for random dot product graph representation of the stochastic blockmodel
Degree GrantorUniversity of Canterbury
Degree NameDoctor of Philosophy
Network models are increasingly used to model datasets that involve interacting units, particularly random graph models where the vertices represent individual entities and the edges represent the presence or absence of a specified interaction between these entities. Finding inherent communities in networks (i.e. partitioning vertices with a more similar interaction pattern into groups) is considered to be a fundamental task in network analysis, which aids in understanding the structural properties of real-world networks. Despite a large amount of research on this task since the emergence of graphical representation of relational data, this still remains a challenge. In particular, within the statistical community, the use of the stochastic blockmodel for this task is currently of immense interest. Recent theoretical developments have shown that adjacency spectral embedding of graphs yields tractable distributional results. Specifically, a random dot product graph formulation of the stochastic blockmodel provides a mixture of multivariate Gaussians for the asymptotic distribution of the latent positions estimated by adjacency spectral embedding. The first part of this thesis seeks to employ this new theory to provide an empirical Bayes model for estimating block memberships of vertices in a stochastic blockmodel graph. Posterior inference is conducted using a Metropolis-within-Gibbs algorithm. Performance of the model is illustrated through Monte Carlo simulation studies and experimental results on a Wikipedia dataset. Results show performance gains over other alternative models that are considered. Instead of a complete classification of vertices via community detection, one may wish to discover whether vertices possess an attribute of interest. Given that this attribute is observed for a few vertices, the goal is to find other vertices that possess that same attribute. As an example, if a few employees in a company are known to have committed fraud, how can we identify others who may be complicit? This is a special case of community detection, known as vertex nomination, which has recently grown rapidly as a research topic. The second part of this thesis extends the empirical Bayes model for vertex nomination based on information contained in the graph structure. This yields promising simulation results as well as real-data results from an Enron email dataset. Recent studies have shown that information pertinent to vertex nomination exists not only in the graph structure but also in the edge attributes (Coppersmith and Priebe, 2012; Suwan et al., 2015). This motivates the third part of this thesis by further extending the model to exploit both graph structure and edge attributes for vertex nomination. Simulation studies confirm the benefit of doing so. However, the same benefit is not observed when the model is applied to the Enron email dataset; further investigations suggest that this is due to the data violating one of the model assumptions.