Next-gen-seq software available in the commercial and public domain
Source: http://seqanswers.com/forums/showthread.php?t=43
Integrated solutions
- CLCbio
Genomics Workbench - de novo and reference assembly of
Sanger, Roche FLX, Illumina, Helicos, and SOLiD data. Commercial
next-gen-seq software that extends the CLCbio Main Workbench software.
Includes SNP detection, ChIP-seq, browser and other features.
Windows, Mac OS X and Linux.
- Galaxy - Galaxy
= interactive and reproducible genomics. A job webportal.
- Genomatix - Integrated Solutions for Next Generation
Sequencing data analysis.
- JMP
Genomics - Next gen visualization and statistics tool from SAS. They
are working with NCGR to refine this tool and produce
others.
- NextGENe - de novo and reference assembly of
Illumina, SOLiD and Roche FLX data. Uses a novel Condensation Assembly
Tool approach where reads are joined via "anchors" into
mini-contigs before assembly. Includes SNP detection, ChIP-seq, browser
and other features. Commercial. Win or MacOS.
- SeqMan
Genome Analyser - Software for Next Generation sequence assembly of
Illumina, Roche FLX and Sanger data integrating with Lasergene Sequence
Analysis software for additional analysis and visualization capabilities.
Can use a hybrid templated/de novo approach. Commercial. Win or Mac OS
X.
- SHORE - SHORE, for Short Read, is a mapping and
analysis pipeline for short DNA sequences produced on an Illumina Genome
Analyzer. A suite created by the 1001 Genomes project. Source for
POSIX.
- SlimSearch - Fledgling commercial product.
Align/Assemble to a reference
- ABySS - Assembly By Short Sequences. ABySS is a de
novo sequence assembler that is designed for very short reads. The
single-processor version is useful for assembling genomes up to 40-50
Mbases in size. The parallel version is implemented using MPI and is
capable of assembling larger genomes. By Simpson JT and others at
Canada's Michael Smith Genome Sciences Centre. C++ as source.
- BFAST - Blat-like Fast Accurate Search Tool. Written
by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
- Bowtie -
Ultrafast, memory-efficient short read aligner. It aligns short DNA
sequences (reads) to the human genome at a rate of 25 million reads per
hour on a typical workstation with 2 gigabytes of memory. Uses a
Burrows-Wheeler-Transformed (BWT) index. Link to discussion thread here. Written by Ben
Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
- BWA - Heng
Li's BWT Alignment program - a progression from Maq. BWA is a fast,
lightweight tool that aligns short sequences to a sequence database,
such as the human reference genome. By default, BWA finds an alignment
within edit distance 2 to the query sequence. C source.
- ELAND - Efficient Large-Scale Alignment of Nucleotide
Databases. Whole genome alignments to a reference genome. Written by
Illumina author Anthony J. Cox for the Solexa 1G machine.
- Exonerate - Various forms of pairwise alignment
(including Smith-Waterman-Gotoh) of DNA/protein against a reference.
Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
- GenomeMapper - GenomeMapper is a short read mapping
tool designed for accurate read alignments. It quickly aligns millions of
reads either with ungapped or gapped alignments. A tool created by the
1001 Genomes project. Source for POSIX.
- GMAP - GMAP
(Genomic Mapping and Alignment Program) for mRNA and EST Sequences.
Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for
Unix.
- gnumap - The
Genomic Next-generation Universal MAPper (gnumap) is a program designed
to accurately map sequence data obtained from next-generation sequencing
machines (specifically that of Solexa/Illumina) back to a genome of any
size. It seeks to align reads from nonunique repeats using statistics.
From authors at Brigham Young University. C source/Unix.
- MAQ -
Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly
designed for Illumina with preliminary functions to handle ABI SOLiD
data. Written by Heng Li from the Sanger Centre. Features extensive
supporting tools for DIP/SNP detection, etc. C++ source.
- MOSAIK - MOSAIK produces gapped alignments using the
Smith-Waterman algorithm. Features a number of support tools. Support for
Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at
Boston College. Win/Linux/MacOSX.
- MrFAST and
MrsFAST - mrFAST & mrsFAST are designed to map short reads
generated with the Illumina platform to reference genome assemblies in a
fast and memory-efficient manner. Robust to INDELs; MrsFAST has a
bisulphite mode. Authors are from the University of Washington. C as
source.
- MUMmer -
MUMmer is a modular system for the rapid whole genome alignment of
finished or draft sequence. Released as a package providing an efficient
suffix tree library, seed-and-extend alignment, SNP detection, repeat
detection, and visualization tools. Version 3.0 was developed by Stefan
Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway,
Corina Antonescu and Steven L Salzberg - most of whom are at The
Institute for Genomic Research in Maryland, USA. POSIX OS required.
- Novocraft - Tools for reference alignment of
paired-end and single-end Illumina reads. Uses a Needleman-Wunsch
algorithm. Can support Bis-Seq. Commercial. Available free for
evaluation, educational use and for use on open not-for-profit projects.
Requires Linux or Mac OS X.
- PASS - It supports Illumina, SOLiD and Roche-FLX data
formats and allows the user to modulate very finely the sensitivity of
the alignments. Spaced seed initial filter, then NW dynamic algorithm to a
SW(like) local alignment. Authors are from CRIBI in Italy. Win/Linux.
- RMAP -
Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By
Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC
Bioinformatics). POSIX OS required.
- SeqMap - Supports up to 5 or more bp
mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab
at Stanford. Builds available for most OS's.
- SHRiMP - Assembles to a reference sequence. Developed
with Applied Biosystems' colourspace genomic representation in mind.
Authors are Michael Brudno and Stephen Rumble at the University of
Toronto. POSIX.
- Slider - An application for the Illumina
Sequence Analyzer output that uses the probability files instead of the
sequence files as an input for alignment to a reference sequence or a set
of reference sequences. Authors are from BCGSC. Paper is here.
- SOAP - SOAP
(Short Oligonucleotide Alignment Program). A program for efficient gapped
and ungapped alignment of short oligonucleotides onto reference
sequences. The updated version uses a BWT. Can call SNPs and INDELs.
Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
- SSAHA - SSAHA (Sequence Search and Alignment by
Hashing Algorithm) is a tool for rapidly finding near exact matches in
DNA or protein databases using a hash table. Developed at the Sanger
Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for
Linux/Alpha.
- SOCS -
Aligns SOLiD data. SOCS is built on an iterative variation of the
Rabin-Karp string search algorithm, which uses hashing to reduce the set
of possible matches, drastically increasing search speed. Authors are
Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
- SWIFT - The SWIFT suite is a software collection for
fast index-based sequence comparison. It contains SWIFT, a fast
local alignment search guaranteed to find epsilon-matches between two
sequences, and SWIFT BALSAM, a very fast program to find semiglobal
non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen
(SWIFT) and Wolfgang Gerlach (SWIFT BALSAM).
- SXOligoSearch - SXOligoSearch is a commercial
platform offered by the Malaysian-based Synamatix. Will
align Illumina reads against a range of Refseq RNA or NCBI genome builds
for a number of organisms. Web Portal. OS independent.
- Vmatch - A versatile
software tool for efficiently solving large scale sequence matching
tasks. Vmatch subsumes the software tool REPuter, but is much more
general, with a very flexible user interface, and improved space and time
requirements. Essentially a large string matching toolbox. POSIX.
- Zoom - ZOOM (Zillions Of Oligos Mapped) is designed
to map millions of short reads, generated by next-generation sequencing
technology, back to the reference genomes, and carry out post-analysis.
ZOOM is developed to be highly accurate, flexible, and user-friendly with
speed being a critical priority. Commercial. Supports Illumina and SOLiD
data.
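Several of the aligners above (Bowtie, BWA, the updated SOAP) are described as using a BWT index. The following is only a toy sketch of that idea, assuming exact matching on a short reference held in memory: it builds the transform from a sorted suffix list and counts matches with a backward search. The names bwt_index and find_exact are illustrative and do not come from any of the listed tools, which use compressed indexes and tolerate mismatches.

```python
def bwt_index(text):
    """Build a toy BWT index: suffix array, C table and Occ table."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Suffix array by naive sorting (fine for a demo, far too slow for genomes).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text that sort before ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def find_exact(read, index):
    """Return sorted reference positions where read matches exactly."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open interval of suffix array rows
    for ch in reversed(read):  # backward search: last base first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("ACGTACGTTAGC")
print(find_exact("ACG", index))  # [0, 4]
print(find_exact("TAG", index))  # [8]
```

Each base of the read narrows the interval of matching suffixes, so lookup time depends on read length rather than reference length, which is the property these aligners exploit.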
De novo Align/Assemble
- ALLPATHS - De novo assembly of whole-genome shotgun microreads. ALLPATHS
is a whole genome shotgun assembler that can generate high quality
assemblies from short reads. Assemblies are presented in a graph form
that retains ambiguities, such as those arising from polymorphism,
thereby providing information that has been absent from previous genome
assemblies. Broad Institute.
- Edena -
Edena (Exact DE Novo Assembler) is an assembler dedicated to processing the
millions of very short reads produced by the Illumina Genome Analyzer.
Edena is based on the traditional overlap layout paradigm. By D.
Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel.
Linux/Win.
- EULER-SR - Short read de novo assembly. By
Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome
Research). Uses a de Bruijn graph approach.
- MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is
able to perform true hybrid de-novo assemblies using reads gathered
through 454 sequencing technology (GS20 or GS FLX). Compatible with 454,
Solexa and Sanger data. Linux OS required.
- SEQAN - A Consistency-based Consensus Algorithm for
De Novo and Reference-guided Sequence Assembly of Short Reads. By Tobias
Rausch and others. C++, Linux/Win.
- SHARCGS - De
novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T
and Himmelbauer H. from the Max-Planck-Institute for Molecular
Genetics.
- SSAKE - The Short Sequence Assembly by K-mer search
and 3' read Extension (SSAKE) is a genomics application for aggressively
assembling millions of short nucleotide sequences by progressively
searching for perfect 3'-most k-mers using a DNA prefix tree. Authors are
René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's
Michael Smith Genome Sciences Centre. Perl/Linux.
- SOAPdenovo -
Part of the SOAP suite. See above.
- VCAKE - De novo assembly of short reads with robust
error correction. An improvement on early versions of SSAKE.
- Velvet - Velvet is a de novo genomic assembler
specially designed for short read sequencing technologies, such as Solexa
or 454. Needs about 20-25X coverage and paired reads. Developed by Daniel
Zerbino and Ewan Birney at the European Bioinformatics Institute
(EMBL-EBI).
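EULER-SR above is described as using a de Bruijn graph approach. As a rough illustration only, the sketch below builds such a graph from error-free reads and walks one unambiguous path to produce a contig. The function names and the tiny read set are made up for this example, and real assemblers add error correction, reverse complements and paired-read handling that are omitted here.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from start while each step has one successor and one predecessor."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in seen:
            break  # branch point or cycle: stop rather than guess
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # overlapping reads from ATGGCGTGCA
graph = de_bruijn(reads, k=4)
print(extend_contig(graph, "ATG"))  # ATGGCGTGCA
```

Stopping at branch points is what lets graph-based assemblers retain ambiguities, as the ALLPATHS entry notes, instead of forcing a single consensus.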
SNP/Indel Discovery
- ssahaSNP - ssahaSNP is a polymorphism detection tool.
It detects homozygous SNPs and indels by aligning shotgun reads to the
finished genome sequence. Highly repetitive elements are filtered out by
ignoring those kmer words with high occurrence numbers. More tuned for
ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the
Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac.
- PolyBayesShort - A re-incarnation of the PolyBayes
SNP discovery tool developed by Gabor Marth at Washington University.
This version is specifically optimized for the analysis of large numbers
(millions) of high-throughput next-generation sequencer reads, aligned to
whole chromosomes of model organism or mammalian genomes. Developers at
Boston College. Linux-64 and Linux-32.
- PyroBayes - PyroBayes is a novel base caller for
pyrosequences from the 454 Life Sciences sequencing machines. It was
designed to assign more accurate base quality estimates to the 454
pyrosequences. Developers at Boston College.
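The ssahaSNP entry above mentions filtering repeats by ignoring k-mer words with high occurrence numbers. A minimal sketch of that filtering step, assuming a plain Python dictionary as the hash table, might look like the following. The names build_kmer_index and seed_hits and the max_occurrences parameter are hypothetical and are not taken from SSAHA or ssahaSNP.

```python
from collections import defaultdict


def build_kmer_index(reference, k, max_occurrences):
    """Hash every k-mer position, then drop k-mers seen too often."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    # Repetitive words are discarded so they cannot seed alignments.
    return {kmer: pos for kmer, pos in index.items()
            if len(pos) <= max_occurrences}


def seed_hits(read, index, k):
    """Count k-mer votes for each candidate reference offset of the read."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            votes[pos - j] += 1
    return dict(votes)


reference = "ATATATATGCCGTAAC"
index = build_kmer_index(reference, k=4, max_occurrences=1)
print("ATAT" in index)                 # False: occurs three times
print(seed_hits("ATGCCG", index, 4))   # {6: 3}
```

A read whose k-mers all vote for the same offset is a strong placement candidate; base-level comparison at that offset would then reveal SNPs or indels.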
Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
- EagleView - An information-rich genome assembler
viewer. EagleView can display a dozen different types of information
including base quality and flowgram signal. Developers at Boston
College.
- LookSeq - LookSeq is a web-based application for
alignment visualization, browsing and analysis of genome sequence data.
LookSeq supports multiple sequencing technologies, alignment sources, and
viewing modes; low or high-depth read pileups; and easy visualization of
putative single nucleotide and structural variation. From the Sanger
Centre.
- MapView - MapView: visualization of short reads
alignment on desktop computer. From the Evolutionary Genomics Lab at
Sun Yat-sen University, China. Linux.
- SAM - Sequence Assembly Manager. Whole Genome
Assembly (WGA) Management and Visualization Tool. It provides a generic
platform for manipulating, analyzing and viewing WGA data, regardless of
input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui
and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL
backend and Perl-CGI web-based frontend/Linux.
- STADEN -
Includes GAP4. GAP5 once completed will handle next-gen sequencing data.
A partially implemented test version is available here.
- XMatchView - A visual tool for analyzing cross_match
alignments. Developed by Rene Warren and Steven Jones at Canada's Michael
Smith Genome Sciences Centre. Python/Win or Linux.
Counting e.g. ChIP-Seq, Bis-Seq, CNV-Seq
- BS-Seq - The source code and data for the
"Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA
Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA).
POSIX.
- CHiPSeq - Program used by Johnson et al. (2007) in their Science publication.
- CNV-Seq - CNV-seq, a new method to detect copy number
variation using high-throughput sequencing. Chao Xie and Martti T Tammi
at the National University of Singapore. Perl/R.
- FindPeaks - Performs analysis of ChIP-Seq experiments.
It uses a naive algorithm for identifying regions of high coverage, which
represent Chromatin Immunoprecipitation enrichment of sequence fragments,
indicating the location of a bound protein of interest. Original
algorithm by Matthew Bainbridge, in collaboration with Gordon Robertson.
Current code and implementation by Anthony Fejes. Authors are from
Canada's Michael Smith Genome Sciences Centre. Java/OS independent.
Latest versions available as part of the Vancouver
Short Read Analysis Package.
- MACS -
Model-based Analysis for ChIP-Seq. MACS empirically models the length of
the sequenced ChIP fragments, which tends to be shorter than sonication
or library construction size estimates, and uses it to improve the
spatial resolution of predicted binding sites. MACS also uses a dynamic
Poisson distribution to effectively capture local biases in the genome
sequence, allowing for more sensitive and robust prediction. Written by
Yong Zhang and Tao Liu from Xiaole Shirley Liu's Lab.
- PeakSeq - PeakSeq: Systematic Scoring of ChIP-Seq
Experiments Relative to Controls. A two-pass approach for scoring
ChIP-Seq data relative to controls. The first pass identifies putative
binding sites and compensates for variation in the mappability of
sequences across the genome. The second pass filters out sites that are
not significantly enriched compared to the normalized input DNA and
computes a precise enrichment and significance. By Rozowsky J et al.
C/Perl.
- QuEST - Quantitative Enrichment of Sequence Tags.
Sidow and Myers Labs at Stanford. From the 2008 publication Genome-wide analysis of transcription factor binding
sites based on ChIP-Seq data. (C++)
- SISSRs - Site Identification from Short Sequence
Reads. BED file input. Raja Jothi @ NIH. Perl.
See also this thread for ChIP-Seq, until I get time to update this list.
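The FindPeaks entry above describes a naive algorithm that identifies regions of high coverage. As a hedged illustration of that general idea, and not of FindPeaks's actual code, the sketch below piles up fragment intervals and reports contiguous runs whose depth reaches a threshold. The function name, the half-open interval convention and the min_height parameter are assumptions made for this example.

```python
def coverage_peaks(fragments, length, min_height):
    """Return (start, end, max_depth) for runs with depth >= min_height.

    fragments are half-open (start, end) intervals on a sequence of the
    given length; min_height is assumed to be at least 1.
    """
    # Difference array: +1 where a fragment starts, -1 where it ends.
    diff = [0] * (length + 1)
    for start, end in fragments:
        diff[start] += 1
        diff[end] -= 1
    peaks, depth, run_start, run_max = [], 0, None, 0
    for pos in range(length + 1):
        depth += diff[pos]  # depth of coverage at this position
        if depth >= min_height:
            if run_start is None:
                run_start, run_max = pos, depth
            run_max = max(run_max, depth)
        elif run_start is not None:
            peaks.append((run_start, pos, run_max))
            run_start = None
    return peaks


fragments = [(0, 10), (5, 15), (8, 20), (30, 40)]
print(coverage_peaks(fragments, length=50, min_height=2))  # [(5, 15, 3)]
```

Tools such as MACS and PeakSeq go further by modelling background with a Poisson distribution or a control sample; this sketch applies only a fixed height cutoff.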
Alternate Base Calling
- Rolexa - R-based framework for base calling of Solexa
data. Project publication
- Alta-cyclic - "a novel Illumina Genome-Analyzer
(Solexa) base caller"
Transcriptomics
- ERANGE -
Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq. Supports
Bowtie, BLAT and ELAND. From the Wold lab.
- G-Mo.R-Se - G-Mo.R-Se is a method aimed at using
RNA-Seq short reads to build de novo gene models. First, candidate exons
are built directly from the positions of the reads mapped on the genome
(without any ab initio assembly of the reads), and all the possible
splice junctions between those exons are tested against unmapped reads.
From CNS in France.
- MapNext - MapNext: A software tool for spliced and
unspliced alignments and SNP detection of short sequence reads. From the
Evolutionary Genomics Lab at Sun Yat-sen University, China.
- QPalma - Optimal Spliced Alignments of Short Sequence
Reads. Authors are Fabio De Bona, Stephan Ossowski, Korbinian
Schneeberger, and Gunnar Rätsch. A paper is available.
- RSAT - RSAT: RNA-Seq Analysis Tools. RSAT is
developed and maintained by Hui Jiang at Stanford University.
- TopHat - TopHat
is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq
reads to mammalian-sized genomes using the ultra high-throughput short
read aligner Bowtie, and then analyzes the mapping results to identify
splice junctions between exons. TopHat is a collaborative effort between
the University of Maryland and the University of California, Berkeley.
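The G-Mo.R-Se entry above outlines two steps: candidate exons are built from the positions of mapped reads, and possible junctions between them are tested against unmapped reads. The sketch below is a much simplified version of that outline, assuming fixed-length reads, exact exon boundaries and exact substring matching. All names, the toy genome and the flank parameter are invented for illustration and are not from G-Mo.R-Se or TopHat.

```python
def candidate_exons(read_starts, read_len):
    """Merge overlapping or adjacent mapped reads into (start, end) blocks."""
    exons = []
    for start in sorted(read_starts):
        end = start + read_len
        if exons and start <= exons[-1][1]:
            # Read touches the current block: extend it.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    return exons


def supported_junctions(genome, exons, unmapped_reads, flank):
    """Return (donor_end, acceptor_start) pairs seen in an unmapped read."""
    supported = []
    for i, (_, donor_end) in enumerate(exons):
        for acceptor_start, _ in exons[i + 1:]:
            # Sequence expected in a read that spans this splice junction.
            junction = (genome[donor_end - flank:donor_end]
                        + genome[acceptor_start:acceptor_start + flank])
            if any(junction in read for read in unmapped_reads):
                supported.append((donor_end, acceptor_start))
    return supported


# Toy genome: exon (0-10), intron (10-20), exon (20-30).
genome = "AAACCCGGGT" + "GTTTTTTTAG" + "CATGCATGAC"
exons = candidate_exons([0, 3, 5, 20, 22, 25], read_len=5)
print(exons)  # [(0, 10), (20, 30)]
print(supported_junctions(genome, exons, ["CGGGTCATGC"], flank=4))  # [(10, 20)]
```

In practice exon ends inferred from coverage are imprecise, so real tools search around each boundary and use splice-site signals; the exact-boundary assumption here only keeps the example short.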