Multi-resolution modeling of a discrete stochastic process identifies causes of cancer
Detection of cancer-causing mutations within the vast and mostly unexplored human genome is a major challenge. Doing so requires modeling the background mutation rate, a highly non-stationary stochastic process, across regions of interest varying in size from one to millions of positions. Here, we present the split-Poisson-Gamma (SPG) distribution, an extension of the classical Poisson-Gamma formulation, to model a discrete stochastic process at multiple resolutions. We demonstrate that the probability model has a closed-form posterior, enabling efficient and accurate linear-time prediction over any length scale after the parameters of the model have been inferred a single time. We apply our framework to model mutation rates in tumors and show that model parameters can be accurately inferred from high-dimensional epigenetic data using a convolutional neural network, Gaussian process, and maximum-likelihood estimation. Our method is both more accurate and more efficient than existing models over a large range of length scales. We demonstrate the usefulness of multi-resolution modeling by detecting genomic elements of vastly differing sizes that drive tumor emergence, including genes, regulatory structures, and individual base-pairs.
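For readers unfamiliar with the classical Poisson-Gamma formulation that SPG extends, the following is a minimal sketch of that baseline only; it is not the SPG distribution, whose definition is not given in this abstract. It assumes a Gamma(shape `alpha`, rate `beta`) prior on a Poisson rate, and the function names and parameter values are hypothetical, chosen purely for illustration. Under that assumption the marginal count distribution is negative binomial and the posterior over the rate is again a Gamma, which is the kind of closed-form result the abstract refers to.

```python
import math


def poisson_gamma_log_pmf(k, alpha, beta):
    """Log P(K = k) when K ~ Poisson(lam) and lam ~ Gamma(shape=alpha, rate=beta).

    Integrating out lam gives a negative binomial distribution.
    """
    return (
        math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
        + alpha * math.log(beta / (beta + 1.0))
        - k * math.log(beta + 1.0)
    )


def gamma_posterior(k, alpha, beta, exposure=1.0):
    """Conjugate update: Gamma posterior (shape, rate) after seeing k events."""
    return alpha + k, beta + exposure


def upper_tail(k_obs, alpha, beta):
    """P(K >= k_obs) under the marginal, e.g. to flag an excess of events."""
    below = sum(math.exp(poisson_gamma_log_pmf(k, alpha, beta)) for k in range(k_obs))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    alpha, beta = 2.0, 0.5  # hypothetical prior; expected rate is alpha / beta
    print(math.exp(poisson_gamma_log_pmf(3, alpha, beta)))
    print(gamma_posterior(3, alpha, beta))
    print(upper_tail(15, alpha, beta))
```

In a background-mutation setting, a small upper-tail probability for an observed count would indicate more events than the assumed background model predicts. How SPG distributes such a model across multiple resolutions is left to the paper itself.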