NASCUP: Nucleic Acid Sequence Classification by Universal Probability

16 Nov 2015  ·  Sunyoung Kwon, Gyuwan Kim, Byunghan Lee, Jongsik Chun, Sungroh Yoon, Young-Han Kim ·

Motivated by the need for fast and accurate classification of unlabeled nucleotide sequences on a large scale, we developed NASCUP, a new classification method that captures statistical structures of nucleotide sequences by compact context-tree models and universal probability from information theory. NASCUP achieved BLAST-like classification accuracy consistently for several large-scale databases in orders-of-magnitude reduced runtime, and was applied to other bioinformatics tasks such as outlier detection and synthetic sequence generation.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here