All tools are accessible as Apps in either the CyVerse Discovery Environment or in KBase's App Catalog. We plan to extend the list of tools for viruses as long as we continue to receive funding (and sometimes beyond). We’ve also included more generalized apps for metagenomics and microbial ecology available through the iMicrobe Project.

Below is a list of apps we've used in an iVirus protocol or used successfully with viral data. We'll do our best to keep this list updated as frequently as time allows, though feel free to contact us if there are any mistakes or omissions.

Assembly

Gene Calling / Annotation

Sequence Search

Viral Identification

Viral Analysis

Read-Based Analysis

Quality Control (QC)

Generally speaking, quality control (QC) is most commonly applied to raw read data. It ensures that the data going into assembly (the usual next step) is of high quality, since poor read quality can result in misassembled sequences. Most frequently, read QC involves trimming reads according to their quality scores and removing barcode sequences (if applicable). Although some assemblers do not require QC’d reads, we highly recommend it!
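To make quality-based trimming concrete, here is a minimal sketch of a sliding-window trimmer. It is a simplified illustration of the general idea, not the algorithm used by Sickle, Trimmomatic, or any other tool listed below. The function names, the window size of 4, the quality threshold of 20, and the Phred+33 encoding are all assumptions chosen for the example.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(sequence, quality_string, window=4, min_mean_quality=20):
    """Cut the read at the first window whose mean quality is below the threshold.

    Hypothetical helper for illustration only; real trimmers handle
    5' trimming, paired reads, and minimum-length filters as well.
    """
    scores = phred_scores(quality_string)
    if len(scores) < window:
        # Reads shorter than the window are scored as a single window.
        window = max(len(scores), 1)
    for start in range(0, len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_quality:
            return sequence[:start], quality_string[:start]
    return sequence, quality_string


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(trim_read("ACGTACGTAC", "IIIIII####"))
```

A production workflow would normally run one of the apps below rather than a hand-written trimmer; the sketch only shows why low-quality read ends disappear after QC.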

Sickle

CyVerse Link KBase Link Official Website DOI

Reference:

Joshi NA, Fass JN. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software]. Available at https://github.com/najoshi/sickle.

Scythe

CyVerse Link KBase Link Official Website DOI

Reference:

Buffalo V. Scythe - A Bayesian adapter trimmer (version 0.994 BETA) [Software]. Available at https://github.com/vsbuffalo/scythe

Btrim

CyVerse Link KBase Link Official Website DOI

Reference:

Kong, Y. (2011) Btrim: a fast, lightweight adapter and quality trimming program for next-generation sequencing technologies. Genomics. DOI: 10.1016/j.ygeno.2011.05.009

Trimmomatic

CyVerse Link KBase Link Official Website DOI

Reference:

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.

FastQC

CyVerse Link KBase Link Official Website DOI

Reference:

Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Assembly

Once reads have passed quality control and are 'cleaned', the usual next step is to assemble them. Since reads are fragments of a longer DNA template, assembly attempts to piece the original DNA sequence back together from the short reads. The results are commonly called 'contigs' (contiguous sequences), each representing a larger piece of DNA from the original DNA library. Multiple assemblers are available, and they use different methods and algorithms to reconstruct contigs. The choice of assembler can also depend on the complexity of the genome, as well as the type of organism. For viruses, SPAdes or MetaSPAdes have yielded good results.
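The idea of piecing reads back together can be sketched with a toy greedy overlap merger. This is an illustration only: the assemblers listed below rely on de Bruijn graphs and handle sequencing errors, repeats, and uneven coverage, none of which this sketch attempts. The function names and the minimum overlap of 3 bases are assumptions made for the example.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences sharing the longest overlap.

    Toy example: assumes error-free reads from one strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    length = overlap_length(left, right, min_overlap)
                    if length > best[0]:
                        best = (length, i, j)
        length, i, j = best
        if length == 0:
            # No remaining pair overlaps enough; what is left are the contigs.
            return contigs
        merged = contigs[i] + contigs[j][length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

Reads that share no sufficient overlap stay as separate contigs, which mirrors why real assemblies of complex samples usually yield many contigs rather than one complete genome.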

MetaSPAdes

CyVerse Link KBase Link Official Website DOI

Reference:

Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).

SPAdes

CyVerse Link KBase Link Official Website DOI

Reference:

Bankevich, A. et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 19, 455–477 (2012).

IDBA-UD

CyVerse Link KBase Link Official Website DOI

Reference:

Peng, Y., et al. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth, Bioinformatics, 28, 1420-1428.

SOAPdenovo2

CyVerse Link KBase Link Official Website DOI

Reference:

Luo et al.: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 2012 1:18.

Gene Calling and Annotation

After assembly, the next step is gene prediction. Genes identified by a gene prediction tool can be fed into numerous sequence analysis tools.

Prodigal

CyVerse Link KBase Link Official Website DOI

Reference:

Hyatt, D. Prodigal (2.6.3) [Software]. Available at https://github.com/hyattpd/Prodiga

Prokka

CyVerse Link KBase Link Official Website DOI

Reference:

Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014 Jul 15;30(14):2068-9. PMID:24642063

Sequence Search

Once genes are called (and sometimes that's not required), the real "fun" of analyzing viral sequence data begins. The tools featured here aren't virus-specific, but they're often used with viral data.

Diamond

CyVerse Link KBase Link Official Website DOI

Reference:

Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60 (2015).

Viral Identification

Analyzing viral data remains a major challenge in the field of viral ecology. A variety of approaches have been proposed, each dependent on the source of data and the underlying biological question. A relatively recent method of analyzing complex viral data is to organize viral sequence space, often through protein clustering techniques. Protein clusters can be used as a diversity metric, as units for ecological studies when compared against other datasets, or for functional profiling of the community.
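As one way to picture protein clusters serving as a diversity metric, the sketch below computes a Shannon index over cluster membership. The choice of the Shannon index, the function name, and the protein-to-cluster mapping format are assumptions for illustration; none of the tools listed here is claimed to compute diversity this way.

```python
import math
from collections import Counter


def shannon_diversity(cluster_assignments):
    """Shannon index H = -sum(p * ln p) over protein-cluster frequencies.

    `cluster_assignments` maps a protein ID to its cluster ID
    (a hypothetical input format used only for this example).
    """
    counts = Counter(cluster_assignments.values())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


if __name__ == "__main__":
    sample = {"p1": "PC_1", "p2": "PC_1", "p3": "PC_2", "p4": "PC_2"}
    print(shannon_diversity(sample))
```

Computing the same index for several samples gives a simple basis for comparing them, provided the clusters were defined on the pooled proteins of all samples.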

VirSorter2

CyVerse Link KBase Link Official Website DOI

Reference:

Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37 (2021).

VirSorter

CyVerse Link KBase Link Official Website DOI

Reference:

Roux S, Enault F, Hurwitz BL, Sullivan MB. (2015) VirSorter: mining viral signal from microbial genomic data. PeerJ 3:e985

VIBRANT

CyVerse Link KBase Link Official Website DOI

Reference:

Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90 (2020).

MARVEL

CyVerse Link KBase Link Official Website DOI

Reference:

Amgarten, D., Braga, L. P. P., da Silva, A. M. & Setubal, J. C. MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins. Front. Genet. 9, 1–8 (2018).

MArVD

CyVerse Link KBase Link Official Website DOI

Reference:

Vik, D. R. et al. Putative archaeal viruses from the mesopelagic ocean. PeerJ 5, e3428 (2017).

DeepVirFinder

CyVerse Link KBase Link Official Website DOI

Reference:

Ren, J. et al. Identifying viruses from metagenomic data by deep learning. (2018).

Viral Analysis

vConTACT2-Gene2Genome

CyVerse Link KBase Link Official Website DOI

Reference:

Bin Jang, H., Bolduc, B., Zablocki, O., Kuhn, J. H., Roux, S., Adriaenssens, E. M., … Sullivan, M. B. (2019). Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nature Biotechnology.

vConTACT2

CyVerse Link KBase Link Official Website DOI

Reference:

vConTACT-PCs

CyVerse Link KBase Link Official Website DOI

Reference:

Bolduc, B. et al. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 5, e3243 (2017).

vConTACT

CyVerse Link KBase Link Official Website DOI

Reference:

Bolduc, B. et al. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 5, e3243 (2017).

DRAM-v

CyVerse Link KBase Link Official Website DOI

Reference:

Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res. 48, 8883–8900 (2020).

Cenote-Taker2

CyVerse Link KBase Link Official Website DOI

Reference:

Tisza, M. J., Belford, A. K., Domínguez-Huerta, G., Bolduc, B. & Buck, C. B. Cenote-Taker 2 democratizes virus discovery and sequence annotation. Virus Evol. 7, 1–12 (2021).

Cenote-Taker

CyVerse Link KBase Link Official Website DOI

Reference:

Tisza, M. J. et al. Discovery of several thousand highly diverse circular DNA viruses. Elife 9, 1–26 (2020).

Read-Based Analysis

Analyses based on reads can serve a variety of purposes. Principal among them is estimating genome or population abundance.
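A common way to turn mapped read counts into comparable abundance estimates is to normalize by contig length and by the number of mapped reads. The sketch below uses reads per kilobase per million mapped reads (RPKM) as one such normalization. It is not a description of how Read2RefMapper or BowtieBatch compute abundance, and the function names and input dictionaries are hypothetical.

```python
def rpkm(mapped_reads, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    return mapped_reads * 1e9 / (contig_length_bp * total_mapped_reads)


def abundance_table(read_counts, contig_lengths):
    """Normalize per-contig read counts by contig length and library size.

    Both arguments are dictionaries keyed by contig name
    (an assumed input format for this example).
    """
    total = sum(read_counts.values())
    return {
        contig: rpkm(count, contig_lengths[contig], total)
        for contig, count in read_counts.items()
    }


if __name__ == "__main__":
    counts = {"contig_a": 750, "contig_b": 250}
    lengths = {"contig_a": 50000, "contig_b": 5000}
    for name, value in abundance_table(counts, lengths).items():
        print(name, value)
```

In this example the shorter contig recruits fewer reads but ends up with the higher normalized abundance, which is why raw read counts alone can be misleading when genomes differ in length.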

Read2RefMapper

CyVerse Link KBase Link Official Website DOI

Reference:

Bolduc, B., Youens-Clark, K., Roux, S., Hurwitz, B. L. & Sullivan, M. B. iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure. ISME J. 11, 7–14 (2017).

BowtieBatch

CyVerse Link KBase Link Official Website DOI

Reference:
