The build option evaluates the likelihood of possible maps (also called
"locus orders") using the most informative data first. Loci are added
in their order of informativeness, i.e., in their ranking of number of
informative meioses. Further, the addition of loci is a two step process.
In the first step, the more efficient data from phase known meioses are used to
evaluate possible maps; in the second, phase unknown information is used.
Ideally, build stops when it has found the best of all possible maps
incorporating all the loci. Practically, all possible maps cannot be
evaluated; the memory requirements can be too great. For this reason,
as build moves on to add the next locus, it discards those maps with
likelihoods much lower than the current "best" map, and retains
at most a limited number of possible maps. These practical limitations can lead
to incomplete or even incorrect maps; a map discarded part way through a
build run for lack of support from the loci already processed may
be the correct one when all loci are considered. Fortunately, these parameters
(how much lower in likelihood, and the maximum number of possible maps) may
be changed. The best mapping strategy is to run build several times,
BOTH altering these parameters until the programme takes too
long to run AND changing the order in which the loci are
processed. These two parameters (and others), as well as the order of locus
processing, are defined in the chr#.par file.
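The discarding step described above can be pictured with a toy model. This is only a sketch under stated assumptions, not CRI-MAP's actual code: the function names, the `tolerance` and `max_orders` arguments, and the scores are all invented for illustration, and the scores are treated as log10 likelihoods.

```python
def insert_everywhere(order, locus):
    """Return every order made by placing `locus` in each interval of `order`,
    including both ends."""
    return [order[:i] + (locus,) + order[i:] for i in range(len(order) + 1)]


def prune(scored_orders, tolerance, max_orders):
    """Keep orders whose log10 likelihood is within `tolerance` of the best,
    then retain at most `max_orders` of them, best first."""
    best = max(scored_orders.values())
    kept = [(o, s) for o, s in scored_orders.items() if best - s <= tolerance]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_orders]


# Four candidate orders arise when locus D is tried in the map "A B C".
candidates = insert_everywhere(("A", "B", "C"), "D")

# Placeholder scores, invented for illustration only (not CRI-MAP output).
toy_scores = dict(zip(candidates, [-50.0, -46.5, -52.3, -47.1]))

survivors = prune(toy_scores, tolerance=3.0, max_orders=2)
for order, score in survivors:
    print(" ".join(order), score)
```

In this toy run, two of the four candidate orders survive. Any order dropped here can never be recovered when locus E is added next, which is why a tighter tolerance or a smaller cap may lose the correct map.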
A typical build run, adding loci D and E to an existing map of three loci, "A B C", might proceed like this:
To illustrate refined use of the mapping and map testing features of CRI-MAP, we need a more complex data set; please retrieve chr2gen.zip to your Crimaptutl directory, "unzip" it, and run the prepare option on it for a subsequent build run. Respond to the prepare option prompts as before. (Don't know how to "unzip" a file? Check the "zip" command on the UNIX Commands Short List page.)
chr2.gen is a large file (~292 K), holding data for 65 families scored for up to 78 co-dominant (e.g., RFLP) loci*. The prepare option has determined the informativeness of these loci (chr2.loc) and ranked them as well (chr2.par). The two most informative (68 (D2S44) & 7 (CEB1/HINF)) show as the ordered loci, and the other 76 show as the loci to be inserted, least informative (8 (CEB11/HINF), 46 (D2S65), & 60 (D2S12)) last.
To save both our time and our eyes, we will build maps using only a subset of these 78 loci! In the following build exercise, there are four steps:
Edit the file chr2.par so the ordered_loci field specifies the index numbers of two informative loci (or three, in their correct order!), with some of the other p arm loci as inserted_loci. Save this file twice: as chr2.par and backed up as chr201.par. Also back up chr2.ord as chr2.ord.orig. (If you want to use sets of ordered and inserted loci that differ from the tutorial choices, please create extra chr2try#.par files matching them.)
Enter the command crimap 2 build > build201.out at the prompt. When finished, the prompt will return, and you may browse the programme's output with the command more build201.out. (To build maps from other sets of ordered and inserted loci, first rename the chr2try#.par file to chr2.par. Next back up chr2.ord to chr201.ord, and restore the original chr2.ord from chr2.ord.orig. Finally, re-issue the crimap 2 build > build2##.out command, naming the output file for the try#.)
A quick glance through build201.out shows the structure of the programme's output. After a re-statement of much of the chr2.par file, the loci used to build the map are listed and described as ordered or inserted. Then a series of interim reports reveals the status of the currently ordered loci ("current orders") plus which other locus the programme is presently trying to insert into those current orders (the extra locus shown in "orders_temp"). Finally, and unfortunately for this build run, the programme exits with no loci added to the "current orders", and prints the possible maps. None of the loci to be inserted were unambiguously placed on the map of the two original ordered loci; we see the Sex_averaged map having only the two original loci, and mapping them at 100 centimorgans apart. This is followed by the most likely placements for each of the inserted loci.
The three other trial sets of ordered and inserted loci also fail to produce maps of the p arm. How, then, does one choose the ordered and inserted loci? As I mentioned, these naïve choices assumed the dataset has mapping information for each locus relative to all others. (They also assumed we would be lucky enough to choose loci less than 100 cM apart!) To find which loci ARE mappable with which other loci, we must use CRI-MAP's twopoint option. At the same time, and to maximise the available information, we will use "locus haplotyping" for loci with multiple polymorphisms.
When the prepare option asks
Do you wish to change any of these values? (y/n)
enter "y" if use_haps = 0, and set it to 1! Then, when asked
Do you wish to enter any new haplotyped systems? (y/n)
enter "y". There are nine genes each scored two or more times, for two or more independent polymorphisms (unique probe-enzyme combinations). For each of these genes, enter its polymorphisms in a haplotyped system, having "inter-locus" distances of zero.
hap_sys0 0 2 4 *
hap_sys0 1 3 *
hap_sys0 23 24 *
hap_sys0 39 40 *
hap_sys0 41 42 *
hap_sys0 44 45 *
hap_sys0 53 54 *
hap_sys0 63 64 *
hap_sys0 72 73 75 76 *
done
Choose the twopoint option, respond with "n" when asked if you want LOD tables for ALL pairs of loci, and enter the index numbers of the p arm loci as ordered loci.
1 5 9 33 39 41 44 50 53 55 56 59 60 62 63 65 66 67 68 69 70 71 72 *
Enter no inserted loci. This will force the calculation of pairwise lodscores for all pairs of loci on the p arm. After accepting the new parameter file, type crimap 2 twopoint > 2p2ptlod.out. Browse the 2p2ptlod.out file. After a re-statement of the parameters, and a listing of the haplotyped systems, this file lists pairs of loci that are genetically linked, plus their likely recombination fractions and the confidence placed in these estimates (lods). While any pair of loci in this list could be chosen as ordered_loci for the next build run, choose a pair with both a large inter-locus distance (rec. fracs.) and high confidence (lods). By these criteria, good choices are locus pairs D2S43 - D2S44 or D2S6 - D2S48.
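The selection rule just described (well supported linkage first, then the widest spacing) can be sketched as a small filter and sort. This sketch does not parse 2p2ptlod.out, whose layout is not reproduced here. It assumes the pairs have already been copied by hand into tuples, the 3.0 threshold is an assumed cut-off, and the loci and numbers below are invented placeholders, not tutorial results.

```python
def rank_pairs(pairs, min_lod=3.0):
    """Rank candidate ordered_loci pairs.

    pairs   : iterable of (locus_a, locus_b, rec_frac, lod) tuples
    min_lod : assumed threshold below which a pair is ignored
    Returns the surviving pairs, largest recombination fraction first,
    with ties broken by the larger lod.
    """
    linked = [p for p in pairs if p[3] >= min_lod]
    return sorted(linked, key=lambda p: (p[2], p[3]), reverse=True)


# Placeholder rows, invented for illustration only.
toy_pairs = [
    ("A", "B", 0.05, 12.0),  # tightly linked: well supported but a short interval
    ("A", "C", 0.30, 4.1),
    ("B", "C", 0.25, 2.0),   # falls below the assumed lod threshold
    ("C", "D", 0.30, 6.5),
]

for locus_a, locus_b, rec_frac, lod in rank_pairs(toy_pairs):
    print(locus_a, locus_b, rec_frac, lod)
```

The top of the ranked list is a natural first choice for ordered_loci; lower entries make reasonable alternatives for further build tries.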
Create another copy of chr2.par and place the loci of the biggest map - in their reported order - in the ordered_loci field, leaving the inserted_loci blank. Finally, put this map through a local rearrangement analysis with the flips4 option. Re-direct the programme output to flip42##.out.
Notice that the map from Try 11 is contained by the map from
Try 13. Notice further that both Try 12 and
Try 13 manage to insert locus D2S47, yet have no
other loci in common. Although all build runs had access to the
same mapping data, and were attempting to map the same set of 23 p
arm loci, strikingly different maps were produced. The "take-home
message" is try MANY build runs!
A difference in the order in which loci were added to the current_orders
gave the difference in map resolving power between Try 11 and
Try 13 build runs. But why did Try 13
fail to include the ordered loci of Try 12? This seems
especially odd because D2S47 - the common locus - was added to the
Try 13 map BEFORE any of
ACP1 D2S1 & TPO were tried. One problem could be that we have
specified the least informative locus for two of the three haplotyped sets of
loci in this critical region. In the
programme authors' language,
we are using weak loci as "primary loci", which may use the extant
data less efficiently. Using the most informative haplotyped loci as
"primaries" yields chr214.par with the following changes:
...
ordered_loci 56 54 50 *
inserted_loci 68 69 65 44 64 5 67 1 66 33 59 62 70 75 55 9 42 40 71 60 *
hap_sys0 75 73 72 76 *
hap_sys0 64 63 *
hap_sys0 54 53 *
hap_sys0 44 45 *
hap_sys0 42 41 *
hap_sys0 40 39 *
hap_sys0 24 23 *
hap_sys0 1 3 *
hap_sys0 0 2 4 *
END
However, these changes don't alter the resultant map. build214.out shows the most likely placements of ACP1, D2S1_2 & TPO on its map, followed by their LOD scores.
ACP1 55 68 1 69 33 59 40 66 56 54 50 65 X X -627.11 -626.89
D2S1_2 55 68 1 69 33 59 40 66 56 54 50 65 X X -614.28 -614.06
TPO 55 68 1 69 33 59 40 66 56 54 50 65 X X -645.62 -645.82
Accurate placement of these loci might require waiting for this end of the map to be "filled in" with at least one other locus showing linkage to them.
Using the hap_sys0 revisions of chr214.par, flip4215.out reveals only one local rearrangement - a switch of two neighbouring loci - with a likelihood close to the best current map. With a LOD score 1.46 worse, this rearrangement is ~28.8 fold LESS likely than the best map. All other rearrangements have LOD scores 3.000 or more worse, and are filtered out of the flip4215.out report. This gives us high confidence in most of the map, and an additional reason to focus on adding new loci to its right end; not only might we place ACP1, TPO, & D2S1, but also we could settle on the true order for "D2S48 (APOB+APOB_2) D2S70 D2S47". So, what new loci might be insertable, and what is their best sequence in the inserted_loci parameter for efficient and effective use of the next build run?
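The "fold" figures above follow from reading a LOD difference as a base-10 logarithm of a likelihood ratio. The short sketch below reproduces the arithmetic; the function name is made up for illustration, and it assumes the two figures at the end of each placement row are log10 likelihoods of the two alternative placements.

```python
def lod_to_odds(lod_difference):
    """Convert a difference in log10 likelihood into a likelihood ratio."""
    return 10 ** lod_difference


# The flips4 rearrangement that is 1.46 LOD units worse than the best map.
print(round(lod_to_odds(1.46), 1))

# The report cut-off of 3.000 LOD units.
print(lod_to_odds(3.0))

# The two alternative placements listed for ACP1 (assumed log10 likelihoods).
acp1_gap = -626.89 - (-627.11)
print(round(lod_to_odds(acp1_gap), 2))
```

On this reading, the two placements of ACP1 differ by less than a factor of two in likelihood, which is why none of the three loci could be placed unambiguously, whereas the 3.000 cut-off corresponds to a thousand-fold difference.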
There are three sources of new loci: 1.) those known to be on the p arm but not yet located on the map (i.e., 11 loci already tried), 2.) those possibly on the p arm (i.e., 34 untried loci having no cytological location data), & 3.) those known to be on the q arm (i.e., 14 untried loci). While several of these new loci will show no linkage to the current map, when extending a map, it is acceptable to reduce the stringency used to reject other locus orders and reports of recombination fraction. We have used a stringency of LOD >=3.000 (the parameters PK_LIKE_TOL & PUK_LIKE_TOL); reducing these to 2.000 could help "filling-in" the extant map in another build run. However, to run build with a set of 12 ordered_loci against a set of 11 + 34 + 14 inserted_loci would take a VERY long time, and the result could be disappointing or brilliant, depending on the order in which the loci were tried. A better idea - to screen out loci distant from the current map and to omit loci with no information - is to first try the twopoint option again, looking for significant linkage between loci on the current map and loci we hope to insert. Then, having chosen the subset of loci to be inserted, determine the best sequence in which to insert them, using the information stored in chr214.ord and the instant option.
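To get a feel for why such a run would be slow, the following back-of-envelope count may help. It is a simplified model, not a description of CRI-MAP's internals: it assumes each new locus is tried in every interval (both ends included) of each retained order, that every locus ends up placed, and that an order and its reverse count as the same map. The function names and the `retained_orders` argument are hypothetical.

```python
from math import factorial


def exhaustive_orders(n_loci):
    """Distinct orders of n loci, counting an order and its reverse once."""
    return factorial(n_loci) // 2 if n_loci > 1 else 1


def sequential_placements(n_ordered, n_inserted, retained_orders=1):
    """Candidate orders scored when loci are added one at a time.

    A map of k loci offers k + 1 positions for the next locus; the count
    is multiplied by the number of orders retained at each stage.
    """
    total = 0
    for map_size in range(n_ordered, n_ordered + n_inserted):
        total += (map_size + 1) * retained_orders
    return total


n_inserted = 11 + 34 + 14  # the three sources of new loci listed above

print(sequential_placements(12, n_inserted))                      # one order retained
print(sequential_placements(12, n_inserted, retained_orders=50))  # an assumed cap of 50
print(exhaustive_orders(12 + n_inserted))                         # every possible order
```

Even under these generous assumptions the sequential count grows with every retained order, and each candidate needs a full likelihood evaluation, while the exhaustive count is far beyond reach. Screening the candidates with twopoint first shrinks `n_inserted` before any of this work begins.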
Of the several new loci screened for linkage to those in the current map of the p arm, 32 show a significant score at LOD 2.0 or better. Oddly, since we are working with the p arm, two loci cytologically placed on p and with high informativeness have been excluded from these 32: D2S61_2 (64) & CPSI (9). Perhaps more oddly, four of these 32 loci have been cytologically located to the q arm: LCT (49), GYPC (58), D2S17 (35), & D2S16 (34). If one or more of these can be inserted in the current map, we could have a starting point for extending our map across all of chromosome 2. (Don't worry - not part of the tutorial!)
Now for the second round of build runs.
(NB: These build runs are LONG, requiring lots of memory, and the build*.out files are LARGE; be sure you want to wait for them to come!)
Run flips2 on the biggest map.
These final attempts at map building yield significantly different results.
Try 218 produces a map of 26 loci,
Try 219 builds a longer map of only 24 loci, losing six loci
from the 218 map but adding four different ones, and
Try 220 links 27 loci, combining pieces of the two previous
maps & showing two locus pair reversals. The details of the differences are
summarised in the tables below,
but the important conclusion for CRI-MAP users is to process
the data in as many ways as possible.
Try 219 represents the standard approach, with
inserted_loci added in the sequence of their informativeness.
Try 218 was an attempt to insert the problematic p
arm loci (those still unmapped after Try 214) only after first
inserting any other new loci. The attempt failed, perhaps because the cumulative
multipoint linkage evidence was conflicting. However,
inserting loci in a sequence that follows more closely their likely places
in the current map (Try 218 & Try 220)
introduces changes to the default map. The new maps both have more loci, and
one is slightly longer.
Finally, note that the simple trick of reversing locus order in both
inserted_loci and current_loci (Try 220)
does lead to insertion of the problematic p arm loci, and
also gives the "best" map of the three build runs.
build ID | # loci mapped | ID of loci added | map length (cM) | LOD score | comments |
---|---|---|---|---|---|
Try 218 | 26 | 22 21 13 77 58 49 20 10 24 26 28 31 17 70 | 220.8 | -929.87 | LOD=2.000; inserted by "most likely" loc'n & by inf. rank |
Try 219 | 24 | 22 61 49 20 10 24 26 28 31 44 42 5 | 221.7 | -902.96 | LOD=2.000; inserted by inf. rank |
Try 220 | 27 | 21 61 77 49 20 10 24 26 28 31 17 70 44 42 5 | 232.9 | -1033.71 | LOD=2.000; inserted by "most likely" loc'n (Try 218 sequence reversed) |
build ID | locus order |
---|---|
Try 214 | 55 68 1 69 33 59 40 66 56 54 50 65 |
Try 218 | 22 21 55 13 68 1 77 58 49 20 69 10 33 24 59 26 28 40 31 66 17 70 56 54 50 65 |
Try 219 | 22 55 61 68 1 49 20 69 10 33 24 59 26 28 40 31 66 56 54 50 44 65 42 5 |
Try 220 | 21 55 61 68 77 1 49 20 69 10 33 24 59 26 28 40 31 66 17 56 70 54 50 44 65 42 5 |
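The differences between these maps can be checked mechanically. The sketch below is one possible way to do so: the locus orders are copied from the listing above, the function names are made up for illustration, and the reversal check assumes both maps are written in the same orientation.

```python
TRY_218 = [22, 21, 55, 13, 68, 1, 77, 58, 49, 20, 69, 10, 33, 24, 59, 26,
           28, 40, 31, 66, 17, 70, 56, 54, 50, 65]
TRY_219 = [22, 55, 61, 68, 1, 49, 20, 69, 10, 33, 24, 59, 26, 28, 40, 31,
           66, 56, 54, 50, 44, 65, 42, 5]
TRY_220 = [21, 55, 61, 68, 77, 1, 49, 20, 69, 10, 33, 24, 59, 26, 28, 40,
           31, 66, 17, 56, 70, 54, 50, 44, 65, 42, 5]


def lost_and_gained(old_map, new_map):
    """Loci present only in old_map, and loci present only in new_map."""
    lost = [locus for locus in old_map if locus not in new_map]
    gained = [locus for locus in new_map if locus not in old_map]
    return lost, gained


def reversed_pairs(map_a, map_b):
    """Pairs of shared loci whose relative order differs between the maps."""
    position_b = {locus: i for i, locus in enumerate(map_b)}
    shared = [locus for locus in map_a if locus in position_b]
    return [(x, y)
            for i, x in enumerate(shared)
            for y in shared[i + 1:]
            if position_b[x] > position_b[y]]


lost, gained = lost_and_gained(TRY_218, TRY_219)
print("lost from 218:", lost)
print("new in 219:", gained)
print("reversed between 218 and 220:", reversed_pairs(TRY_218, TRY_220))
print("reversed between 219 and 220:", reversed_pairs(TRY_219, TRY_220))
```

Run on these three orders, the comparison reports six loci lost and four gained between Try 218 and Try 219, and two reversed pairs between Try 218 and Try 220, matching the summary given in the text.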
* I use the term "locus" here to mean a polymorphism at a locus, detectable via a unique restriction endonuclease plus probe combination. Thus, a gene with two (or more) polymorphisms detected by different methods will count as two (or more) loci.