BIOINFORMATICS AND COMPUTATIONAL BIOLOGY

VIBRANT (Virus Identification By iteRative ANoTation) is an automated software tool for the recovery and annotation of bacterial/archaeal viruses, determination of genome quality and completeness, and metabolic gene identification. By highlighting viral auxiliary metabolic genes (AMGs) and metabolic pathways, the software also serves as a platform for evaluating viral community function. VIBRANT combines neural network machine learning with protein similarity searches (KEGG, Pfam and VOG protein databases) to maximize identification of lytic viral genomes and integrated proviruses, including highly diverse viruses. It achieves high accuracy and recovery by using a newly described “v-score” metric to quantify the virus association of each protein annotation. VIBRANT was designed for use with complex metagenomic samples but can also identify viruses from cultivated or simple systems.

VIBRANT was designed to be fast, accurate and user-friendly. The only required input is a single file containing unknown sequences (genomes, MAGs, scaffolds). In addition to the identified viruses in FASTA and GenBank formats, the outputs include: simple visualizations of AMG pathways and viral metrics (number, quality and sizes), spreadsheet files of AMG details (names, counts and pathways), protein annotation information (full annotations per database and best-hit annotation per protein), identified circular viruses, and a summary spreadsheet of all identification metrics per virus.

Check out VIBRANT on GitHub

Read our manuscript at Microbiome

METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes) is a scalable software to study microbial metabolic traits and biogeochemical functional profiles of a microbiome/community based on microbial genomes. METABOLIC can help integrate genome-informed metabolism into metabolic and biogeochemical models.

METABOLIC annotates genomes and organizes metabolic characterization at the scale of individual genomes and the entire microbial community. Additional analyses can be conducted to study genome abundance, sequential metabolic transformations, metabolic energy flow patterns, and metabolic interactions and networks at community scales. User-friendly results are provided in the form of curated tables and diagrams. Finally, METABOLIC can enable visualization of microbial contributions to biogeochemical cycles.

Check out METABOLIC on GitHub

Read our manuscript at Microbiome

PropagAtE (Prophage Activity Estimator) uses genomic coordinates of integrated prophage sequences and short sequencing reads to estimate whether a given prophage was in the lysogenic (dormant) or lytic (active) stage of infection. Knowing the infection stage of a prophage is essential for drawing accurate conclusions about its role in affecting its host and the microbial community. Prophages are designated according to a genomic/scaffold coordinate file, either manually generated by the user or taken directly from a VIBRANT (at least v1.2.1) output. After read coverage processing (trimming scaffold ends, filtering aligned gaps/mismatches, removing outlier coverage values), the prophage:host read coverage ratio and corresponding effect size are used to estimate whether the prophage was actively replicating its genome (significantly more prophage genome copies than host copies). PropagAtE accepts complete genomes or metagenomic scaffolds, along with either raw Illumina (short) reads or pre-aligned data files (SAM or BAM format). Threshold values are customizable, and PropagAtE outputs clear “active” versus “dormant” estimations of given prophages with associated statistics.
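The ratio-plus-effect-size idea described above can be illustrated with a minimal Python sketch. This is not PropagAtE's code: the function names, the use of Cohen's d as the effect size, and the two threshold values are assumptions chosen for illustration only, and the inputs are assumed to be per-position coverage values that have already been processed.

```python
from math import inf, sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed effect-size measure)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    diff = mean(a) - mean(b)
    if pooled == 0:
        # No variance in either group: effect is zero if means match, else unbounded.
        return 0.0 if diff == 0 else (inf if diff > 0 else -inf)
    return diff / pooled


def classify_prophage(prophage_cov, host_cov, min_ratio=2.0, min_effect=1.0):
    """Label a prophage 'active' or 'dormant' from coverage values.

    min_ratio and min_effect are hypothetical placeholder thresholds,
    not values taken from PropagAtE.
    """
    host_mean = mean(host_cov)
    if host_mean <= 0:
        raise ValueError("host coverage must be positive to form a ratio")
    ratio = mean(prophage_cov) / host_mean
    effect = cohens_d(prophage_cov, host_cov)
    status = "active" if ratio >= min_ratio and effect >= min_effect else "dormant"
    return status, ratio, effect


if __name__ == "__main__":
    # Toy coverage values (made up for the example).
    print(classify_prophage([40, 42, 38, 41, 39], [10, 11, 9, 10, 10]))
    print(classify_prophage([10, 11, 9, 10, 10], [10, 11, 9, 10, 10]))
```

Requiring both a high ratio and a large effect size means a prophage is only called active when its coverage is clearly above the host's, not merely higher on average in a noisy region.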

Check out PropagAtE on GitHub

Read our manuscript at mSystems

vRhyme is a machine learning-based binning algorithm and software designed for viromes. vRhyme is fast and precise for construction of viral metagenome-assembled genomes (vMAGs). It combines supervised machine learning classification of sequence feature composition with read coverage effect size comparisons to optimize the binning of vMAGs, which have properties distinct from microbial MAGs. vRhyme was tested on multiple artificial metagenomes as well as several unique systems (e.g., megaphage, NCLDV, eukaryotic virus, crAssphage, and prophage genomes) to demonstrate that it is precise with high genome recovery. Multithreading allows the program to run quickly and efficiently, even with >100,000 viral scaffolds and >50 samples.

Check out vRhyme on GitHub

Read our manuscript at Nucleic Acids Research

ViWrap is an integrated, user-friendly pipeline to identify, bin, classify, and predict virus-host relationships for viruses from metagenomes. It accepts both metagenomic assemblies (or viromes) and reads as inputs. ViWrap has the following advanced functionalities: 1) comprehensive screening for viruses while still applying stringent rules; 2) a standardized and reproducible pipeline that integrates advanced tools/databases and is easy to extend with additional functionalities; 3) flexible options for virus identification methods, metagenomic reads, and custom microbial genomes to suit various application scenarios; 4) a one-stop workflow that generates easy-to-read/parse results and visualizations of statistical summaries of viruses.

Check out ViWrap on GitHub

Read our manuscript at iMeta