Microbial Ecology is now a Data-Intensive Science: The cost of sequencing has decreased more than a million-fold in the last several years, causing a rapid influx of molecular data associated with microbes in diverse environments across both space and time. These datasets are available through community metagenomic data repositories, yet integrating and analyzing them together with new data is cumbersome and often requires data duplication. Moreover, new software tools developed in individual labs are available through disparate code repositories, if at all. These data management and code distribution practices lead to small-scale research efforts that lack the ability to explore the large-scale datasets now available. Further, given the fast-paced development of new strategies for analyzing metagenomic datasets and advances in big data science for computing, cutting-edge tools for microbial research remain confined to a subset of highly proficient labs. To meet these needs, developing tools and distributing important datasets in a common cyberinfrastructure is fundamental.
Building a Cyberinfrastructure for Viral Ecology: Distributing new tools and datasets for viral ecology in a common cyberinfrastructure is essential to advancing our knowledge of viruses given limited resources. Specifically, three limitations exist: (i) viral metagenomes (viromes) are difficult to produce owing to low quantities of DNA and the specialized techniques required, (ii) the vast majority of viral proteins (usually >90%) are unknown, and (iii) new tools for comparative and functional metagenomics are rapidly developing. To meet these needs, viral datasets need to be shared in a common cyberinfrastructure that allows comparative metagenomic analyses across diverse environments to identify new genes and functions. Moreover, new tools need to be captured in a cyberinfrastructure where the community can continually develop and adapt them in support of fast-paced innovation. To this end, we are developing iVirus.