Multi-resolution modeling of a discrete stochastic process identifies causes of cancer

Detection of cancer-causing mutations within the vast and mostly unexplored human genome is a major challenge. Doing so requires modeling the background mutation rate, a highly non-stationary stochastic process, across regions of interest varying in size from one to millions of positions. Here, we present the split-Poisson-Gamma (SPG) distribution, an extension of the classical Poisson-Gamma formulation, to model a discrete stochastic process at multiple resolutions. We demonstrate that the probability model has a closed-form posterior, enabling efficient and accurate linear-time prediction over any length scale after the parameters of the model have been inferred a single time. We apply our framework to model mutation rates in tumors and show that model parameters can be accurately inferred from high-dimensional epigenetic data using a convolutional neural network, Gaussian process, and maximum-likelihood estimation. Our method is both more accurate and more efficient than existing models over a large range of length scales. We demonstrate the usefulness of multi-resolution modeling by detecting genomic elements of vastly differing sizes that drive tumor emergence, including genes, regulatory structures, and individual base-pairs.
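
As a rough illustration of why a closed form makes prediction cheap at any length scale, the sketch below uses the standard Poisson-Gamma identity: a Poisson count whose rate is Gamma-distributed is marginally negative binomial. It is not the authors' implementation. The names `alpha`, `theta`, and `fractions` are hypothetical, and treating a window's rate as the region rate scaled by the summed per-position shares is a simplifying assumption made only for this example.

```python
import math


def nb_log_pmf(k, alpha, scale):
    """log P(X = k) for X | lam ~ Poisson(lam), lam ~ Gamma(shape=alpha, scale=scale).

    The marginal is negative binomial with success probability 1 / (1 + scale).
    """
    p = 1.0 / (1.0 + scale)
    return (
        math.lgamma(k + alpha)
        - math.lgamma(alpha)
        - math.lgamma(k + 1)
        + alpha * math.log(p)
        + k * math.log1p(-p)
    )


def window_log_pmf(k, alpha, theta, fractions):
    """log P(k events in a window), assuming (hypothetically) that the window
    receives a share sum(fractions) of a region rate lam ~ Gamma(alpha, theta).

    Scaling a Gamma variable by a constant only rescales its scale parameter,
    so the cost is one pass over the window to sum the shares, then O(1) work.
    """
    share = sum(fractions)
    return nb_log_pmf(k, alpha, theta * share)


if __name__ == "__main__":
    # Hypothetical parameters: region rate ~ Gamma(2, 3); the window covers
    # five positions that each carry 10% of the region's rate.
    alpha, theta = 2.0, 3.0
    fractions = [0.1] * 5
    for k in range(4):
        print(k, math.exp(window_log_pmf(k, alpha, theta, fractions)))
```

Under these assumptions the same two region-level parameters serve a single position or a window of many positions; only the summed share changes.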
