snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data

5 May 2021 · Christina Vasilopoulou, Benjamin Wingfield, Andrew P. Morris, William Duddy

Motivation: Quality control of genomic data is an essential but complicated multi-step procedure, often requiring separate installation of, and expert familiarity with, a combination of disparate bioinformatics tools.

Results: To provide an automated solution that retains comprehensive quality checks and a flexible workflow architecture, we have developed snpQT, a scalable, stand-alone software pipeline offering some 36 discrete quality filters or correction steps, with plots generated before and after user-modifiable thresholding. The pipeline includes build conversion, population stratification against 1,000 Genomes data, population outlier removal, and built-in imputation with its own pre- and post-imputation quality controls. Common input formats are used, and users need not be superusers or have any prior coding experience. A comprehensive online tutorial and installation guide, covering the workflow through to GWAS, is provided (https://snpqt.readthedocs.io/en/latest/); it introduces snpQT using a synthetic demonstration dataset and a real-world Amyotrophic Lateral Sclerosis SNP-array dataset.

Availability: snpQT is open source and freely available at https://github.com/nebfield/snpQT

Contact: Vasilopoulou-C@ulster.ac.uk, w.duddy@ulster.ac.uk
