An Overview of the Tools

One of the strengths of iVirus (thanks to its underlying CyVerse cyberinfrastructure) is its focus on bringing bioinformatic tools to the viral ecology community. Below are a few examples of using Apps available through iVirus/CyVerse to process data:

A few quick notes:

  • Guides are not intended to help users understand the biology behind the tools or how the tools function.
  • Where possible, Apps have links to their documentation on CyVerse as well as their citations (or original home pages).
  • In some cases, many Apps are available to solve a particular problem; the guides highlight one or two.
  • These guides assume you’ve created a CyVerse account and can log in to it. Check out the getting started guide for assistance.

Guides/Use Cases

Several “use cases” are available at For nearly all of these use cases, we’ll use actual reads from Ocean Sampling Day (2014) as a basis and process them using CyVerse. In some cases we’ll take the user from raw read files through assembly to identifying viral sequences and preliminary analysis. Other use cases tackle ways of analyzing a viral metagenome, either reads or contigs, using traditional and non-traditional approaches. As a reminder, all of these protocols are on and should be considered the most up-to-date versions.

All example files can be found in the CyVerse Data Store. To find these files, log in to the Discovery Environment and, under “Data”, go to Community Data –> iVirus –> ExampleData. Alternatively, copy and paste the following path into the “Viewing” bar of the data browser: /iplant/home/shared/iVirus/ExampleData/

Each tool has “Input” and “Output” directories, so users have not only valid input data but also the expected output data.
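
Because each tool ships with its expected output, one way to check your own run is to compare your results against the example “Output” directory. The sketch below is a minimal illustration, assuming you have downloaded both the example Output folder and your own results to local directories; the function names are hypothetical and not part of iVirus or CyVerse. A byte-for-byte checksum comparison will also flag files that legitimately differ (for example, because of timestamps or nondeterministic ordering), so treat “differs” as a prompt to look closer rather than as a failure.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(expected_dir: Path, actual_dir: Path) -> dict:
    """Compare every file under expected_dir with the same-named file under actual_dir.

    Returns a dict mapping relative file name -> "match", "differs" or "missing".
    """
    report = {}
    for expected in sorted(p for p in expected_dir.rglob("*") if p.is_file()):
        rel = expected.relative_to(expected_dir)
        actual = actual_dir / rel
        if not actual.is_file():
            report[str(rel)] = "missing"  # expected file was never produced
        elif md5sum(actual) == md5sum(expected):
            report[str(rel)] = "match"
        else:
            report[str(rel)] = "differs"
    return report
```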

Processing a Viral Metagenome

Description: A long-standing challenge in viral metagenomics is the practical work of processing a viral metagenome (we’re not talking about the science side!). For many reasons enumerated elsewhere, processing these datasets requires skilled bioinformaticians and computational resources that many researchers and labs lack. iVirus seeks to tackle this head-on.

Protocol “Collection” (collections are just that – collections of protocols)

Individual Steps:

Mapping Metagenomic Reads to References

Description: One of the most common procedures for analyzing viral metagenomic data is mapping reads (from the same dataset or from another one) against a set of references, often the contigs assembled from those reads. For example, to learn how well viruses in NCBI’s Viral Reference Sequences (ViralRefSeq) are represented in ocean viromes, one could map reads from many ocean viral metagenomes against ViralRefSeq. This is generally done with Bowtie2 or BWA: select a reference set of sequences, then provide paired or unpaired reads to Bowtie2/BWA. The results must then be processed and filtered to generate coverage tables. Setting up runs for many read files (10 paired metagenomes = 10 alignment runs) and then processing their output can be challenging, not to mention computationally demanding.
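
To make the “10 paired metagenomes = 10 alignment runs” point concrete, here is a minimal sketch that pairs forward and reverse read files and builds one Bowtie2 command per sample. The `<sample>_R1/_R2` naming pattern, the index prefix, and the function names are hypothetical; the Bowtie2 index is assumed to have been built beforehand (for example with `bowtie2-build`), and the commands are only constructed here, not executed.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: <sample>_R1.fastq[.gz] / <sample>_R2.fq[.gz]
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.f(ast)?q(\.gz)?$")


def pair_read_files(read_files):
    """Group paired-end read files into {sample: (R1, R2)}, sorted by sample."""
    mates = {}
    for name in read_files:
        match = READ_PATTERN.match(Path(name).name)
        if match:
            mates.setdefault(match["sample"], {})[match["mate"]] = name
    # Keep only samples for which both mates were found.
    return {
        sample: (m["1"], m["2"])
        for sample, m in sorted(mates.items())
        if "1" in m and "2" in m
    }


def bowtie2_commands(index_prefix, read_files, threads=4):
    """Build one Bowtie2 paired-end command (as an argument list) per sample."""
    commands = []
    for sample, (r1, r2) in pair_read_files(read_files).items():
        commands.append([
            "bowtie2", "-p", str(threads), "-x", index_prefix,
            "-1", r1, "-2", r2, "-S", f"{sample}.sam",
        ])
    return commands
```

Each argument list could be handed to `subprocess.run` locally; on CyVerse, the equivalent work is done by submitting the mapping App once per set of inputs.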


Individual Steps:

  • Mapping reads from multiple metagenomes to a set of references
  • Filtering mapped reads and generating coverage tables
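
The filtering and coverage step can be sketched as follows, under several assumptions: alignments arrive as simple `(reference, start, end, percent_identity)` tuples with 0-based, end-exclusive coordinates (a hypothetical format; real pipelines read BAM files with tools such as samtools), and the 95% identity and 70% breadth thresholds are illustrative values, not ones this guide prescribes. The function name is hypothetical.

```python
def coverage_table(alignments, ref_lengths, min_identity=95.0, min_breadth=0.70):
    """Filter alignments and summarise coverage per reference.

    alignments: iterable of (reference, start, end, percent_identity) tuples,
        0-based and end-exclusive (a hypothetical input format).
    ref_lengths: dict mapping reference name -> length in bp.
    Returns {reference: (mean_depth, breadth)}. References whose breadth
    (fraction of positions covered at least once) is below min_breadth are
    reported with a mean depth of 0.0.
    """
    depth = {ref: [0] * length for ref, length in ref_lengths.items()}
    for ref, start, end, identity in alignments:
        if identity < min_identity or ref not in depth:
            continue  # drop low-identity hits and unknown references
        positions = depth[ref]
        for pos in range(max(start, 0), min(end, len(positions))):
            positions[pos] += 1
    table = {}
    for ref, positions in depth.items():
        if not positions:
            table[ref] = (0.0, 0.0)
            continue
        breadth = sum(1 for d in positions if d > 0) / len(positions)
        mean_depth = sum(positions) / len(positions)
        table[ref] = (mean_depth if breadth >= min_breadth else 0.0, breadth)
    return table
```

Calling this once per metagenome and collecting the mean depths by reference gives a references-by-samples coverage table. The per-base list is simple but memory-hungry for large reference sets.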

Uploading data

Before processing any data, users need to upload it to CyVerse’s Data Store. The Data Store is built on iRODS, an open-source data management system. Data can be uploaded directly through the Discovery Environment’s (DE) upload menu (limited to 2 GB per file) or through one of the iRODS clients (click here for a list of available offerings). The easiest way to upload files securely and quickly is with Cyberduck. Here we’ll assume you’ve installed Cyberduck and are connecting to the Data Store with the following settings (a complete guide is available here):

url: irods://


port: 1247

user: your CyVerse username

password: your CyVerse password

Once you’ve logged in, you should be in your home folder. Drag and drop your read files into your home directory.
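
Because the DE upload menu caps each file at 2 GB, it can help to check which read files need an iRODS client before you start. The sketch below is a minimal illustration with hypothetical function names; it assumes “2 GB” means 2 × 1024³ bytes and that your home folder is `/iplant/home/<username>/`, mirroring the `/iplant/home/shared/...` path above. Both are assumptions, so confirm them against CyVerse’s documentation.

```python
from pathlib import Path

# Assumption: the DE per-file limit of "2 GB" is interpreted as 2 * 1024**3 bytes.
DE_UPLOAD_LIMIT_BYTES = 2 * 1024**3


def plan_uploads(local_dir, username, limit=DE_UPLOAD_LIMIT_BYTES):
    """Split local files into those small enough for the DE upload menu and
    those that should go through an iRODS client such as Cyberduck.

    Returns {"de": [...], "irods_client": [...]}, each a list of
    (local_path, data_store_path) pairs.
    """
    plan = {"de": [], "irods_client": []}
    for path in sorted(Path(local_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        # Hypothetical home-folder layout, mirroring /iplant/home/shared/...
        destination = f"/iplant/home/{username}/{path.name}"
        route = "de" if path.stat().st_size <= limit else "irods_client"
        plan[route].append((str(path), destination))
    return plan
```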