In this episode we identify areas of “Peak-bioinformatics”. There are a lot of existing bioinformatics software out there - more often than not the new tool you want to write already exists or a new tool cannot effectively improve. We discuss this in terms of genome assembly, read mapping and phylogenetics.
Question and comments? microbinfie@gmail.com
SHOW NOTES
Generally, novel software is not needed if: There are a plethora of existing tools The problem is more or less solved or its been shown to be unsolvable The underlying technology or problem is now obsolete and/or superceded by other methods.
Multiple sequence aligners:
- MAFFT https://mafft.cbrc.jp/alignment/software/
- MUSCLE https://www.drive5.com/muscle/
- Whole genome aligners:
- Mauve http://darlinglab.org/mauve/mauve.html
- Mugsy http://mugsy.sourceforge.net/
- Sibellia http://bioinf.spbau.ru/sibelia
- Parsnp https://github.com/marbl/parsnp
Assemblers:
- SPADES https://github.com/ablab/spades
- Skesa https://github.com/ncbi/SKESA
- Velvet https://www.ebi.ac.uk/~zerbino/velvet/
- Abyss https://github.com/bcgsc/abyss
- Edena http://www.genomic.ch/edena.php
- Ray http://denovoassembler.sourceforge.net/
Long read assemblies
- HGAP https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-2.0
- Flye https://github.com/fenderglass/Flye
- Canu https://github.com/marbl/canu
- Ra https://www.biorxiv.org/content/10.1101/656306v1
- Unicycler https://github.com/rrwick/Unicycler
Read mapping
- BWA http://bio-bwa.sourceforge.net/
- Bowtie2 http://bio-bwa.sourceforge.net/
- Minimap2 https://github.com/lh3/minimap2 (BWA better for short reads: https://lh3.github.io/2018/04/02/minimap2-and-the-future-of-bwa)
- BBtools https://jgi.doe.gov/data-and-tools/bbtools/
- BLAST/BLAT: https://genome.ucsc.edu/FAQ/FAQblat.html
- SMALT https://www.sanger.ac.uk/science/tools/smalt-0
- Snippy https://github.com/tseemann/snippy
- Phenix: https://github.com/phe-bioinformatics/PHEnix
Variant callers
- GATK: https://software.broadinstitute.org/gatk/
- VIPR: https://www.viprbrc.org/brc/home.spg?decorator=vipr
- Varscan2 http://varscan.sourceforge.net/
Workflow managers
- Bespoke example https://github.com/VertebrateResequencing/vr-codebase
- Snakemake. https://snakemake.readthedocs.io/en/stable/
- Nextflow https://www.nextflow.io/
- Galaxy https://usegalaxy.org/
- Bpipe https://github.com/ssadedin/bpipe
Phylogenetics:
- Raxml - Raxml-NG https://cme.h-its.org/exelixis/software.html
- IQTREE http://www.iqtree.org/
- FastTree http://www.microbesonline.org/fasttree/
- BEAST 1&2 https://www.beast2.org/
- RevBayes https://revbayes.github.io/
Metagenomics
- Taxonomic classification: Megan, Kraken, SIGMA, MIDAS, metaphlan2, mOTUs.
- Assemblers: MetaSpades, metaflye, MEGAHIT, MetaVelvet, a lot of single isolate assemblers have been tweaked to run on metagenomes. https://github.com/lskatz/Kalamari
AMR
- https://github.com/arpcard/amr_curation
- https://food-safety-bioinformatics-hackathon.github.io/AMR-protocols/
- ABRICATE https://github.com/tseemann/abricate
- ARIBA https://www.sanger.ac.uk/science/tools/ariba
- Too many detection tools: https://docs.google.com/spreadsheets/d/18XGWpDiaE249qQKDAL7gdBCka0Z1drpA_s3FElfMJe0/edit#gid=0
Other mentioned resources
- Mentioned Recent review. Zhang et al. (2011) A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies. PLoS ONE 6(3): e17915. https://doi.org/10.1371/journal.pone.0017915
- The Assemblerthon: https://assemblathon.org/
- Blog post describing that BWA better for short reads: https://lh3.github.io/2018/04/02/minimap2-and-the-future-of-bwa
- The science web: https://thescienceweb.wordpress.com/2015/03/23/each-bioinformatician-to-have-their-own-personal-short-read-aligner-by-2016/