Two-stage Hypothesis Tests for Variable Interactions with FDR Control

31 Aug 2022 · Jingyi Duan, Yang Ning, Xi Chen, Yong Chen

In many scenarios, such as genome-wide association studies, where dependencies between variables commonly exist, it is often of interest to infer the interaction effects in the model. However, testing pairwise interactions among millions of variables in complex, high-dimensional data suffers from low statistical power and huge computational cost. To address these challenges, we propose a two-stage testing procedure with false discovery rate (FDR) control, which is known to be a less conservative multiple-testing correction. Theoretically, the difficulty in FDR control arises from the dependence among test statistics in the two stages, and from the fact that the number of hypothesis tests conducted in the second stage depends on the screening result of the first stage. By using the Cramér-type moderate deviation technique, we show that our procedure controls the FDR at the desired level asymptotically in the generalized linear model (GLM), where the model is allowed to be misspecified. In addition, the asymptotic power of the FDR control procedure is rigorously established. We demonstrate via comprehensive simulation studies that our two-stage procedure is computationally more efficient than the classical BH procedure, with comparable or improved statistical power. Finally, we apply the proposed method to a bladder cancer dataset from dbGaP, where the scientific goal is to identify genetic susceptibility loci for bladder cancer.
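
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the classical Benjamini-Hochberg (BH) step-up procedure applied to a list of p-values. It is not the two-stage procedure proposed in the paper, whose details are not given in the abstract; the function name `benjamini_hochberg` and the default `alpha` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical BH step-up procedure (illustrative sketch, not the paper's method).

    Returns a list of booleans, True where the corresponding hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject every hypothesis whose p-value has rank <= k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Because BH is a step-up rule, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, provided a larger p-value further down the ordering meets its threshold. Applying this rule to every pairwise interaction among millions of variables is the costly baseline that the two-stage procedure is compared against.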
