All tools are accessible as Apps in the CyVerse Discovery Environment (formerly iPlant). The CyVerse Cyberinfrastructure is a freely available resource for computation, storage, and data analysis in the life sciences. We plan to extend the list of virus tools, pending funding. There are also general Apps for metagenomics and microbial ecology available through the iMicrobe Project.

The lists below include virus-focused tools (developed either through the iVirus project or by others) and tools that were not built specifically for viruses but can be applied to viral metagenomic analyses.

Clicking on tools will take you to the App on CyVerse!

iMicrobe/iVirus Tools

Quality Control

Once you’ve uploaded your read data, you’ll need to QC (“quality control”) it. This ensures that the data going into assembly (the next step) is of high quality, because poor-quality reads can produce misassembled sequences. Most frequently, read QC involves trimming reads according to their quality scores. Although some assemblers do not require QC’d reads, we highly recommend it!
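
To make “trimming reads according to their quality scores” concrete, here is a minimal Python sketch of sliding-window trimming. It assumes Phred+33 quality encoding (used by most current Illumina FASTQ files); the window size of 4 and mean-quality cutoff of 20 are illustrative defaults, not settings taken from any of the tools below, and the cut rule is a simplification rather than the exact algorithm of Trimmomatic or Sickle.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq, quality, window=4, min_mean_q=20.0):
    """Cut the read at the start of the first window (scanning 5' to 3')
    whose mean Phred score falls below min_mean_q.

    Returns the trimmed (sequence, quality) pair; a read with no failing
    window is returned unchanged.
    """
    scores = phred_scores(quality)
    if not scores:
        return seq, quality
    # Reads shorter than the window are evaluated as a single window.
    window = min(window, len(scores))
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality


# Example: eight Q40 bases ('I') followed by four Q2 bases ('#').
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTACGT", "IIIIIIII####")
print(trimmed_seq, trimmed_qual)  # ACGTACG IIIIIII
```

Because this sketch cuts at the start of the first failing window, one good base just before the low-quality tail is lost in the example; real trimmers use more refined rules for placing the cut.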

Name | Description
Trimmomatic | Removes adapter sequences and quality-trims reads
Btrim | Trims adapters and low-quality regions
Scythe | Identifies contaminating (adapter) sequences in read data using a Bayesian approach
Sickle | Sliding-window quality trimmer, designed to be used after Scythe
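
Several of the tools above remove adapter sequences. As a rough illustration of the idea only, the sketch below trims a known adapter from the 3′ end of a read by exact matching, including a partial adapter that runs off the end of the read. Scythe's Bayesian approach and the other trimmers' scoring schemes tolerate sequencing errors, which this sketch does not; the adapter string and the `min_overlap` value are illustrative assumptions.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Truncate a read at the leftmost position where a 3' adapter begins.

    Matches either the full adapter inside the read, or a partial adapter
    (a prefix of the adapter) running off the read's 3' end. Matching is
    exact, so sequencing errors inside the adapter are not tolerated.
    Overlaps shorter than min_overlap are ignored to limit false positives.
    """
    for i in range(len(read) - min_overlap + 1):
        suffix = read[i:]
        if suffix.startswith(adapter) or adapter.startswith(suffix):
            return read[:i]
    return read


ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter sequence
print(trim_3prime_adapter("ACGTACGTAGATCGGAAG", ADAPTER))  # ACGTACGT
```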

Gene Calling

Name | Description
MetaGeneMark | Ab initio gene prediction
FragGeneScan | Ab initio gene prediction
Prodigal | Ab initio gene prediction
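
The programs above predict genes with trained statistical models of coding sequence (FragGeneScan, for example, uses a hidden Markov model that accounts for sequencing errors). The minimal sketch below shows only the simpler underlying notion of an open reading frame (ORF): an ATG start codon followed in-frame by a stop codon. It scans the forward strand only, assumes the standard genetic code, and its `min_codons` cutoff is arbitrary; it is not how any of the listed tools predict genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end) pairs, 0-based and end-exclusive, for
    ATG...stop open reading frames on the forward strand, counting the
    stop codon in the length. ATGs inside an already open ORF are ignored,
    so each ORF starts at its first in-frame ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14)]
```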

Assemblers

Following read trimming and QC, reads can be assembled into contiguous sequences (“contigs”). Most recent assemblers are designed for Illumina data (short reads, massively deep sequencing) and are based on De Bruijn graphs (original ref). Assembler selection depends on the type of read data being assembled (e.g., 454 vs. Illumina vs. PacBio), the source material (DNA vs. RNA, eukaryotic vs. prokaryotic), and/or sample-specific factors that may have biased the reads (high/low coverage, repetitive sequences, amplification polymerase, etc.). There is no “best” assembler, though some assemblers perform better than others on viral metagenomes.
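
The following minimal Python sketch illustrates the De Bruijn graph idea: every k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and unbranched paths through the graph spell out contigs. It deliberately ignores sequencing errors, reverse complements, coverage, and branch resolution, all of which real assemblers such as SPAdes and IDBA-UD must handle; k is chosen only to fit the toy reads.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the sorted list of (k-1)-mers it points to.

    Every k-mer contributes one edge: prefix (first k-1 bases) -> suffix
    (last k-1 bases). Repeated k-mers collapse into a single edge here;
    real assemblers keep their counts as coverage.
    """
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(targets) for node, targets in edges.items()}


def walk_unbranched(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        nxt = graph[node][0]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]  # consecutive nodes overlap by k-2 bases
        seen.add(nxt)
        node = nxt
    return contig


graph = build_de_bruijn(["ACGTTG", "GTTGCA"], k=4)
print(walk_unbranched(graph, "ACG"))  # ACGTTGCA
```

The two toy reads share the 4-mer GTTG, so their edges join into a single unbranched path and the walk merges them into one contig.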

Name | Description
SOAPdenovo | Single-genome assembler tuned for metagenomics
Newbler (gs Assembler) | De novo assembler based on read overlaps
SPAdes (multiple memory) | De Bruijn graph assembler
IDBA-UD (multiple memory) | Iterative De Bruijn graph assembler for data with highly uneven sequencing depth
Trinity (multiple memory) | De novo RNA-Seq assembler

Viral Analysis

Analyzing viral data remains a major challenge in viral ecology. A variety of approaches have been proposed, each suited to a particular data source and biological question. A relatively recent approach to analyzing complex viral data is to organize viral sequence space, often with protein clustering techniques. Protein clusters (PCs) can serve as a diversity metric, as units for ecological comparisons against other datasets, or as the basis for functional profiling of the community.
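
As a rough illustration of protein clustering, the sketch below groups proteins by single linkage (connected components, found with union-find) over pairwise similarity hits, such as those from an all-versus-all BLASTP search. The hit tuple format and the `min_score` threshold are assumptions made for illustration. vContact-PCs uses MCL, which can split weakly connected groups that single linkage would merge, so treat this as a simplified stand-in rather than the iVirus method.

```python
def cluster_proteins(proteins, hits, min_score=50.0):
    """Single-linkage clustering: proteins joined by any hit scoring at least
    min_score end up in the same cluster (connected components).

    `hits` is an iterable of (protein_a, protein_b, score) tuples; every
    protein named in a hit must also appear in `proteins`.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())


proteins = ["p1", "p2", "p3", "p4", "p5"]
hits = [("p1", "p2", 80.0), ("p2", "p3", 60.0), ("p4", "p5", 30.0)]
pcs = cluster_proteins(proteins, hits)
print(len(pcs), pcs)  # 3 [['p1', 'p2', 'p3'], ['p4'], ['p5']]
```

The number of clusters returned is the simplest PC-based diversity count; comparing which clusters occur in which samples gives the kind of shared units used for ecological comparisons.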

Name | Description
PCPipe | Protein clustering pipeline and annotation
VirSorter | Identifies viral contigs in microbial metagenomes (reference)
vContact | Guilt-by-contig-association automatic classification of viral contigs
vContact-PCs | Generates protein cluster (PC) profiles using vContact/MCL
vContact-Gene2Contig | Prepares input files for use in vContact
GAAS (Genome Abundance and Average Size) | (In development) Estimates the relative abundance and average genome size of organisms from metagenomic sequences
Circonspect (Control In Research on CONtig SPECTra) | (In development) Generates contig spectra for downstream modeling of community structure
PHACCS (PHAge Communities from Contig Spectrum) | (In development) Estimates the structure and diversity of viral communities
BatchBowtie | Aligns many paired and unpaired read sets against a reference dataset using Bowtie2 and SAMtools
Read2RefMapper | Uses output from BatchBowtie to generate coverage profiles
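
Read2RefMapper's exact outputs are not described here, so the following is only a hypothetical sketch of what a per-contig coverage summary involves. Given the reference intervals covered by mapped reads (which could be extracted from BatchBowtie's alignments; the interval format is an assumption), it computes mean read depth and breadth, the fraction of contig positions covered by at least one read.

```python
def coverage_profile(contig_length, alignments):
    """Return (mean_depth, breadth) for one contig.

    `alignments` holds (start, end) reference intervals, 0-based and
    end-exclusive, one per mapped read. Depth is built with a difference
    array: +1 where a read starts, -1 where it ends, then a running sum.
    """
    if contig_length <= 0:
        return 0.0, 0.0
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(contig_length, end)  # clip to contig
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for pos in range(contig_length):
        running += diff[pos]
        depth.append(running)
    mean_depth = sum(depth) / contig_length
    breadth = sum(1 for d in depth if d > 0) / contig_length
    return mean_depth, breadth


print(coverage_profile(10, [(0, 4), (2, 6)]))  # (0.8, 0.6)
```

Reporting breadth alongside depth helps distinguish a contig that is evenly covered from one whose "coverage" comes from reads piled onto a short region.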

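Circonspect and PHACCS work with contig spectra. A contig spectrum summarizes an assembly of metagenomic reads as counts c1, c2, ..., where ci is the number of contigs built from i reads (unassembled reads count as one-read contigs). The minimal sketch below computes that vector from per-contig read counts; how those counts are obtained (assembler, overlap settings) is outside its scope, and the `max_size` cutoff is an illustrative option.

```python
from collections import Counter


def contig_spectrum(reads_per_contig, max_size=None):
    """Return [c1, c2, ..., cN], where ci counts contigs built from i reads.

    `reads_per_contig` lists the number of reads in each contig, with
    unassembled reads entered as 1. If max_size is given, the spectrum is
    cut off at that size and larger contigs are not counted.
    """
    counts = Counter(reads_per_contig)
    size = max_size if max_size is not None else max(counts, default=0)
    return [counts.get(i, 0) for i in range(1, size + 1)]


print(contig_spectrum([1, 1, 1, 2, 2, 5]))  # [3, 2, 0, 0, 1]
```
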
Annotations

Name | Description
Prokka | Rapidly annotates bacterial, archaeal, and viral genomes and produces standards-compliant output files
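
Prokka writes several output files, including a GFF3 annotation file. For viral genomes it is often useful to see how many predicted proteins received a functional product versus a generic label such as “hypothetical protein”. The sketch below tallies CDS features by their `product` attribute. It relies only on the generic GFF3 layout (nine tab-separated columns, semicolon-separated `key=value` attributes) and assumes product names are stored in a `product` attribute, as in Prokka's output; the file name in the usage comment is hypothetical.

```python
from urllib.parse import unquote


def count_cds_products(gff_lines):
    """Count CDS features per `product` attribute in a GFF3 file.

    Stops at a ##FASTA directive (Prokka appends the sequences there),
    skips comments and non-CDS rows, and decodes GFF3 percent-escapes.
    """
    counts = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        product = unquote(attrs.get("product", "unknown"))
        counts[product] = counts.get(product, 0) + 1
    return counts


# Typical use (file name is hypothetical):
# with open("my_virus.gff") as handle:
#     print(count_cds_products(handle))
```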

Updates to iVirus tools (both improvements and bug fixes) will be made as funding and time allow.