Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms

Objective: Colon microarray data is a repository of thousands of gene expressions with different strengths for each cancer cell. It is necessary to detect which genes are responsible for cancer growth. This study presents an exhaustive comparison of different machine learning (ML) systems that serves two major purposes: (a) identification of high-risk differential genes using statistical tests and (b) development of an ML strategy for predicting cancer genes. Methods: Four statistical tests, namely the Wilcoxon sign rank sum (WCSRS) test, t-test, Kruskal–Wallis (KW) test, and F-test, were adapted for cancerous gene identification using their p-values. The extracted gene set was used to classify cancer patients using ten classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB), Gaussian process classification (GPC), support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), decision tree (DT), AdaBoost (AB), and random forest (RF). Performance was then evaluated using cross-validation protocols and standardized metrics, namely accuracy (ACC) and area under the curve (AUC). Results: The colon cancer dataset consists of 2000 genes from 62 patients (40 cancer vs. 22 control). The overall mean ACC of our ML system across all four statistical tests and all ten classifiers was 90.50%. The ML system showed an ACC of 99.81% using a combination of the WCSRS test and an RF-based classifier. This is an improvement of 8% over previously published values in the literature. Conclusions: An RF-based model with statistical tests for detection of high-risk genes showed the best performance for accurate cancer classification in multi-center clinical trials.
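The two-stage approach described in the abstract (rank genes by a statistical test, then classify on the selected genes under cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic and only mimic the dataset's shape (62 samples, 2000 genes, 40 vs. 22), the helper names `rank_sum_score` and `make_synthetic_data` are hypothetical, and the values `k=50`, `n_estimators=200`, and 5-fold cross-validation are arbitrary choices. SciPy's `ranksums` is used here as a stand-in for the WCSRS test.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def rank_sum_score(X, y):
    """Run a two-sample Wilcoxon rank-sum test per gene (column).

    Returns (|statistic|, p-value) arrays so SelectKBest can rank genes.
    """
    cancer, control = X[y == 1], X[y == 0]
    results = np.array(
        [ranksums(cancer[:, j], control[:, j]) for j in range(X.shape[1])]
    )
    return np.abs(results[:, 0]), results[:, 1]


def make_synthetic_data(n_cancer=40, n_control=22, n_genes=2000,
                        n_informative=20, seed=0):
    """Synthetic stand-in for the real data (assumption, not the real dataset)."""
    rng = np.random.default_rng(seed)
    y = np.array([1] * n_cancer + [0] * n_control)
    X = rng.normal(size=(y.size, n_genes))
    # Shift the first few genes in the cancer group so they are differential.
    X[y == 1, :n_informative] += 1.5
    return X, y


X, y = make_synthetic_data()

# Gene selection sits inside the pipeline, so it is refit on each training fold.
model = make_pipeline(
    SelectKBest(score_func=rank_sum_score, k=50),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"mean ACC: {acc.mean():.3f}, mean AUC: {auc.mean():.3f}")
```

Placing the gene selection step inside the pipeline is a design choice of this sketch: it keeps test-fold samples out of the p-value computation. Whether the study selected genes within or before cross-validation is not stated in the abstract.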
