Cap3 is a fragment assembly program written by Xiaoqiu Huang <xqhuang@cs.iastate.edu>.
The quick-and-dirty on how to use it is:
(1) Log in to the Unix computer on which it is installed.
(2) Combine all of your sequence fragments into a single FASTA file.
(3) Type:
cap3 frag.file > cap3.log
The output is a file of contigs (ending in .contigs) and a file of unused fragments (ending in .singlets). The file cap3.log records the details of how and why the contigs were selected.
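Steps (2) and (3) can also be scripted. The following is a minimal Python sketch, not part of CAP3 itself: the function names combine_fasta and run_cap3 are made up for illustration, and run_cap3 assumes the cap3 executable is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def combine_fasta(fragment_paths, combined_path):
    """Concatenate several FASTA files into one, keeping records in order."""
    with open(combined_path, "w") as out:
        for path in fragment_paths:
            text = Path(path).read_text()
            out.write(text)
            # Make sure the next file's header starts on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return combined_path


def run_cap3(reads_path, log_path="cap3.log"):
    """Run `cap3 reads_path > log_path` (assumes cap3 is installed)."""
    if shutil.which("cap3") is None:
        raise FileNotFoundError("cap3 executable not found on PATH")
    with open(log_path, "w") as log:
        subprocess.run(["cap3", str(reads_path)], stdout=log, check=True)
```

For example, combine_fasta(["a.fa", "b.fa"], "frag.file") followed by run_cap3("frag.file") mirrors the command shown above; the input file names here are placeholders.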
The following is taken directly from the cap3 README file written by the author.
Detailed documentation on CAP3 usage.
Usage: cap3 File_of_reads [options]
File_of_reads is a file of DNA reads in FASTA format.
If the file of reads is named 'xyz', then the file of quality values must be
named 'xyz.qual', and the file of constraints named 'xyz.con'.
Options (default values):
-a N  specify band expansion size N > 10 (20)
-b N  specify base quality cutoff for differences N > 15 (20)
-c N  specify base quality cutoff for clipping N > 5 (10)
-d N  specify max qscore sum at differences N > 100 (250)
-e N  specify extra number of differences N > 10 (20)
-g N  specify gap penalty factor N > 0 (6)
-m N  specify match score factor N > 0 (2)
-n N  specify mismatch score factor N < 0 (-5)
-o N  specify overlap length cutoff > 20 (30)
-p N  specify overlap percent identity cutoff N > 65 (75)
-s N  specify overlap similarity score cutoff N > 100 (500)
-u N  specify min number of constraints for correction N > 0 (4)
-v N  specify min number of constraints for linking N > 0 (2)
-x N  specify prefix string for output file names (cap)
If no quality file is given, then a default quality value of 10 is used for each
base.
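The limits in the option list can be checked before CAP3 is started. The sketch below is only an illustration: the table OPTION_RULES and the function build_cap3_command are hypothetical helpers, and the bounds are copied from the list above and treated as strict inequalities, as written there.

```python
# (exclusive lower bound or None, exclusive upper bound or None, default),
# copied from the option list in this document.
OPTION_RULES = {
    "-a": (10, None, 20),
    "-b": (15, None, 20),
    "-c": (5, None, 10),
    "-d": (100, None, 250),
    "-e": (10, None, 20),
    "-g": (0, None, 6),
    "-m": (0, None, 2),
    "-n": (None, 0, -5),
    "-o": (20, None, 30),
    "-p": (65, None, 75),
    "-s": (100, None, 500),
    "-u": (0, None, 4),
    "-v": (0, None, 2),
}


def build_cap3_command(reads_file, options=None):
    """Return an argument list for cap3 after checking each numeric option."""
    command = ["cap3", reads_file]
    for flag, value in (options or {}).items():
        if flag == "-x":
            # -x takes a prefix string, so there is no numeric range to check.
            command += [flag, str(value)]
            continue
        if flag not in OPTION_RULES:
            raise ValueError(f"unknown option {flag}")
        low, high, _default = OPTION_RULES[flag]
        if low is not None and not value > low:
            raise ValueError(f"{flag} must be greater than {low}")
        if high is not None and not value < high:
            raise ValueError(f"{flag} must be less than {high}")
        command += [flag, str(value)]
    return command
```

For example, build_cap3_command("xyz", {"-o": 40, "-p": 80}) yields the arguments for "cap3 xyz -o 40 -p 80", while a value such as -p 60 is rejected before CAP3 is run.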
CAP3 takes as input a file of sequence reads in FASTA format. If the names of
reads contain a dot ('.'), CAP3 requires that the names of reads sequenced from
the same subclone contain the same substring up to the first dot. CAP3 takes two
optional files: a file of quality values in FASTA format and a file of
forward-reverse constraints.
The file of quality values must be named "xyz.qual", and the file of
forward-reverse constraints must be named "xyz.con", where "xyz" is the name of
the sequence file. CAP3 uses the same format of a quality file as Phrap.
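The naming rules above can be illustrated with a short Python sketch. The helper names (subclone_key, group_reads_by_subclone, companion_files) and the read names in the example are hypothetical; the sketch only shows how reads are tied to a subclone by the text before the first dot and how the optional file names follow from the sequence file name.

```python
from collections import defaultdict


def subclone_key(read_name):
    """Return the part of a read name before the first dot, or None if no dot."""
    if "." not in read_name:
        return None
    return read_name.split(".", 1)[0]


def group_reads_by_subclone(read_names):
    """Group dotted read names by the substring up to the first dot."""
    groups = defaultdict(list)
    for name in read_names:
        key = subclone_key(name)
        if key is not None:
            groups[key].append(name)
    return dict(groups)


def companion_files(reads_file):
    """Return the expected quality and constraint file names for a reads file."""
    return reads_file + ".qual", reads_file + ".con"
```

With the made-up names cloneA.f, cloneA.r and cloneB.f, the first two reads fall into the same group because they share the prefix cloneA.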
Each line of the constraint file specifies one forward-reverse constraint of
the form:
ReadA ReadB MinDistance MaxDistance
where ReadA and ReadB are names of two reads, and MinDistance and MaxDistance
are distances (integers) in base pairs. The constraint is satisfied if ReadA in
forward orientation occurs in a contig before ReadB in reverse orientation, or
ReadB in forward orientation occurs in a contig before ReadA in reverse
orientation, and their distance is between MinDistance and MaxDistance. CAP3
works better if a lot more constraints are used.
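The constraint rule can be written out as a small Python sketch. This is not CAP3's own code: the Placement record, the inclusive treatment of "between", and the way distance is measured (from the start of the forward read to the end of the reverse read) are assumptions made here for illustration, since the text above does not define them.

```python
from typing import NamedTuple


class Constraint(NamedTuple):
    read_a: str
    read_b: str
    min_distance: int
    max_distance: int


class Placement(NamedTuple):
    contig: str
    start: int     # leftmost contig coordinate of the read
    end: int       # rightmost contig coordinate of the read
    forward: bool  # True if the read lies in forward orientation


def parse_constraint_line(line):
    """Parse one 'ReadA ReadB MinDistance MaxDistance' line."""
    read_a, read_b, min_d, max_d = line.split()
    return Constraint(read_a, read_b, int(min_d), int(max_d))


def is_satisfied(constraint, placements):
    """Check a constraint against a dict of read name -> Placement."""
    a = placements.get(constraint.read_a)
    b = placements.get(constraint.read_b)
    if a is None or b is None or a.contig != b.contig:
        return False
    # Try both roles: (A forward, B reverse) and (B forward, A reverse).
    for first, second in ((a, b), (b, a)):
        if first.forward and not second.forward and first.start < second.start:
            # Assumed distance definition; CAP3 may measure it differently.
            distance = second.end - first.start
            if constraint.min_distance <= distance <= constraint.max_distance:
                return True
    return False
```

A line such as "cloneA.f cloneA.r 500 4000" (read names made up) would be parsed into a Constraint and then tested against the placements of the two reads in an assembled contig.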
We have a separate program named "formcon" to generate a constraint file from
the sequence file. The program takes an input file of fragments in FASTA format
and two integers (minimum distance and maximum distance in bp). The minimum and
maximum distances specify a lower and an upper limit on the subclone length,
respectively. It produces a file of forward-reverse constraints for CAP3. It is
assumed that the names of a pair of forward and reverse reads contain a dot and
share a common name up to the first dot. Because CAP3 uses reads whose ends are
clipped, instead of raw reads, to measure their distance, the distance seen by
CAP3 could be different from the insert size by 1000 to 1500 bp. For example, if
the insert size is 2000 to 3000 bp, we recommend that you use 500 for the
minimum distance and 4000 for the maximum distance. The results are written to a
file whose name ends in ".con".
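The pairing idea and the distance arithmetic can be sketched in Python. This is not the formcon program: form_constraints and widen_insert_range are hypothetical names, the rule "exactly two reads sharing a prefix form a pair" is an assumption (the text does not say how formcon tells forward from reverse reads), and the slack values of 1500 and 1000 bp are chosen only because they reproduce the worked example above.

```python
from collections import defaultdict


def widen_insert_range(min_insert, max_insert, lower_slack=1500, upper_slack=1000):
    """Widen an insert-size range to allow for clipped read ends (assumed slack)."""
    return max(0, min_insert - lower_slack), max_insert + upper_slack


def form_constraints(read_names, min_distance, max_distance):
    """Return constraint lines for reads that share a name up to the first dot."""
    by_prefix = defaultdict(list)
    for name in read_names:
        if "." in name:
            by_prefix[name.split(".", 1)[0]].append(name)
    lines = []
    for prefix in sorted(by_prefix):
        reads = by_prefix[prefix]
        # Assumption: a prefix with exactly two reads is one forward-reverse pair.
        if len(reads) == 2:
            lines.append(f"{reads[0]} {reads[1]} {min_distance} {max_distance}")
    return lines
```

With an insert size of 2000 to 3000 bp, widen_insert_range(2000, 3000) gives the 500 and 4000 recommended above, and those two values can then be passed to form_constraints.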
The complete help files for cap3, with more details, are located on genome.chmcc.org in the file /usr/local/gcg/doc/cap3.help