Pedigree
Introduction
In an 'animal model' or 'sire model' genetic analysis we have data on a set of animals that are genetically linked via a pedigree. The genetic effects are therefore correlated and, assuming normal modes of inheritance, the correlation expected from additive genetic effects can be derived from the pedigree, provided all the genetic links are in the pedigree. The additive genetic relationship matrix (sometimes called the numerator relationship matrix) can be calculated from the pedigree. It is actually the inverse relationship matrix that ASReml forms for the analysis.
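To make the link between pedigree and relationship matrix concrete, here is a minimal Python sketch of the standard tabular method for A. It is an illustration only, not ASReml's code (ASReml works with the inverse). It assumes integer identities, 0 for an unknown parent, parents listed before progeny, and that every known parent has its own line; the function name relationship_matrix is hypothetical.

```python
from fractions import Fraction


def relationship_matrix(pedigree):
    """Additive (numerator) relationship matrix A by the tabular method.

    pedigree: list of (individual, sire, dam) tuples of integer identities,
    ordered so that parents precede their progeny; 0 marks an unknown parent.
    Assumes every known parent has its own line. Returns (ids, A), where A
    is a list of lists of exact Fractions in the order of ids.
    """
    ids = [ind for ind, _, _ in pedigree]
    pos = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s, d = pos.get(sire), pos.get(dam)  # None when the parent is unknown
        # Diagonal: 1 + F_i, with F_i = a_sd / 2 when both parents are known.
        A[i][i] = Fraction(1)
        if s is not None and d is not None:
            A[i][i] += A[s][d] / 2
        # Off-diagonals: half the sum of the known parents' relationships with j.
        for j in range(i):
            a = Fraction(0)
            if s is not None:
                a += A[j][s] / 2
            if d is not None:
                a += A[j][d] / 2
            A[i][j] = A[j][i] = a
    return ids, A


# Hypothetical five-animal pedigree: animal 5 is from a half-sib mating (3 x 4).
ped = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 3, 4)]
ids, A = relationship_matrix(ped)
print([float(A[k][k] - 1) for k in range(len(ids))])  # inbreeding: [0.0, 0.0, 0.0, 0.0, 0.125]
```

The inbreeding coefficient of each individual is its diagonal element of A minus 1, which is the quantity the !DIAG qualifier reports.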
Users new to this subject might find notes by Julius van der Werf helpful:
Mixed_Models_for_Genetic_analysis.pdf.
For the more general situation where the pedigree-based inverse relationship matrix is not the appropriate or required matrix, the user can provide a particular general inverse variance (GIV) matrix explicitly in a .giv file.
In this chapter we consider data presented in Harvey (1977) using the command file harvey.as.
Pedigree file example
animal !P
sire !A
dam
lines 2
damage
adailygain
harvey.ped !ALPHA
harvey.dat
adailygain ~ mu lines, !r animal 0.25
Pedigree factor type
In ASReml the !P data field qualifier indicates that the corresponding data field has an associated pedigree. The file containing the pedigree (harvey.ped in the example) for animal is specified after all field definitions and before the data file definition. See below for the first 20 lines of harvey.ped together with the corresponding lines of the data file harvey.dat. All individuals appearing in the data file must appear in the pedigree file.
When all the pedigree information (individual, male_parent, female_parent) appears as the first three fields of the data file, the data file can double as the pedigree file. In this example the line harvey.ped !ALPHA could be replaced with harvey.dat !ALPHA. Typically, however, the pedigree file also contains additional individuals (for example, ancestors without records) that provide additional genetic links.
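The two rules in this section (data individuals must be in the pedigree; a free-format data file can supply the pedigree from its first three fields) can be checked before running ASReml. The following is a minimal Python sketch under stated assumptions: identities are whitespace-separated strings, 0 or * marks an unknown parent, and an identity counts as "in the pedigree" if it appears in any of the three fields, since ASReml merges them into one list. The function names are hypothetical, not part of ASReml.

```python
UNKNOWN = {"0", "*"}  # identities treated as unknown parents


def pedigree_from_data(lines):
    """Take (individual, male_parent, female_parent) from the first three
    fields of each free-format data line, keeping only the first line seen
    for each individual (similar in effect to the !REPEAT qualifier)."""
    seen, pedigree = set(), []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: skip blank or short lines
        ind, sire, dam = fields[:3]
        if ind not in seen:
            seen.add(ind)
            pedigree.append((ind, sire, dam))
    return pedigree


def missing_from_pedigree(data_ids, pedigree):
    """Return data identities that appear nowhere in the pedigree."""
    known = set()
    for ind, sire, dam in pedigree:
        known.update(x for x in (ind, sire, dam) if x not in UNKNOWN)
    return [i for i in data_ids if i not in known]


data = ["101 SIRE_1 0 1 3 192 390 2241",
        "102 SIRE_1 0 1 3 154 403 2651",
        "109 SIRE_2 0 1 4 188 439 2292"]
ped = pedigree_from_data(data)
print(missing_from_pedigree(["101", "109", "117"], ped))  # ['117']
```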
The pedigree file
The pedigree file is used to define the genetic relationships for fitting a genetic animal model and is required if the !P qualifier is associated with a data field. The pedigree file
- has three fields: the identities of an individual, its sire and its dam (or maternal grandsire if the !MGS qualifier is specified), in that order; use identity 0 or * for unknown parents,
- is sorted so that the line giving the pedigree of an individual appears before any line where that individual appears as a parent,
- is read free format; it may be the same file as the data file if the data file is free format and has the necessary identities in the first three fields, see below,
- is specified on the line immediately preceding the data file line in the command file.
harvey.ped      harvey.dat
101 SIRE_1 0    101 SIRE_1 0 1 3 192 390 2241
102 SIRE_1 0    102 SIRE_1 0 1 3 154 403 2651
103 SIRE_1 0    103 SIRE_1 0 1 4 185 432 2411
104 SIRE_1 0    104 SIRE_1 0 1 4 183 457 2251
105 SIRE_1 0    105 SIRE_1 0 1 5 186 483 2581
106 SIRE_1 0    106 SIRE_1 0 1 5 177 469 2671
107 SIRE_1 0    107 SIRE_1 0 1 5 177 428 2711
108 SIRE_1 0    108 SIRE_1 0 1 5 163 439 2471
109 SIRE_2 0    109 SIRE_2 0 1 4 188 439 2292
110 SIRE_2 0    110 SIRE_2 0 1 4 178 407 2262
111 SIRE_2 0    111 SIRE_2 0 1 5 198 498 1972
112 SIRE_2 0    112 SIRE_2 0 1 5 193 459 2142
113 SIRE_2 0    113 SIRE_2 0 1 5 186 459 2442
114 SIRE_2 0    114 SIRE_2 0 1 5 175 375 2522
115 SIRE_2 0    115 SIRE_2 0 1 5 171 382 1722
116 SIRE_2 0    116 SIRE_2 0 1 5 168 417 2752
117 SIRE_3 0    117 SIRE_3 0 1 3 154 389 2383
118 SIRE_3 0    118 SIRE_3 0 1 4 184 414 2463
119 SIRE_3 0    119 SIRE_3 0 1 5 174 483 2293
120 SIRE_3 0    120 SIRE_3 0 1 5 170 430 2303
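The ordering rule (an individual's line before any line where it appears as a parent) can be checked, and a pedigree put into such an order in the spirit of the !SORT qualifier, with a topological sort. This is a minimal Python sketch using Kahn's algorithm, assuming string identities, 0 or * for unknown parents, and that parents without their own line are base individuals. is_sorted and sort_pedigree are hypothetical names; ASReml's own sort may produce a different, equally valid order.

```python
from collections import deque

UNKNOWN = {"0", "*"}  # identities treated as unknown parents


def is_sorted(pedigree):
    """True if each parent that has its own line is listed before its progeny."""
    listed = {ind for ind, _, _ in pedigree}
    seen = set()
    for ind, sire, dam in pedigree:
        for parent in (sire, dam):
            if parent not in UNKNOWN and parent in listed and parent not in seen:
                return False
        seen.add(ind)
    return True


def sort_pedigree(pedigree):
    """Reorder pedigree lines so parents precede progeny (Kahn's topological sort).

    Parents without their own line are treated as base individuals, and only
    the first line for each individual is kept. Raises ValueError on a loop.
    """
    lines = {}
    for ind, sire, dam in pedigree:
        lines.setdefault(ind, (ind, sire, dam))
    waiting = {}                          # number of parents not yet placed
    children = {ind: [] for ind in lines}
    for ind, sire, dam in lines.values():
        known = {p for p in (sire, dam) if p not in UNKNOWN and p in lines}
        waiting[ind] = len(known)
        for p in known:
            children[p].append(ind)
    queue = deque(ind for ind in lines if waiting[ind] == 0)
    ordered = []
    while queue:
        ind = queue.popleft()
        ordered.append(lines[ind])
        for child in children[ind]:
            waiting[child] -= 1
            if waiting[child] == 0:
                queue.append(child)
    if len(ordered) != len(lines):
        raise ValueError("pedigree loop: some individual is its own ancestor")
    return ordered


# Hypothetical out-of-order pedigree; SIRE_X has no line of its own.
ped = [("C", "A", "B"), ("A", "0", "0"), ("B", "A", "*"), ("D", "SIRE_X", "0")]
print(is_sorted(ped))  # False: A and B are listed after their progeny C
print(sort_pedigree(ped))
```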
Reading in the pedigree file
The syntax for specifying a pedigree file in the ASReml command file is

pedigree_file [qualifiers]

where
- the qualifiers are listed below,
- the identities (individual, male_parent, female_parent) are merged into a single list and the inverse relationship matrix is formed before the data file is read,
- when the data file is read, data fields with the !P qualifier are recoded according to the combined identity list,
- the inverse relationship matrix is automatically associated with factors coded from the pedigree file unless some other covariance structure is specified; the inverse relationship matrix is specified with the variance model name AINV,
- the inverse relationship matrix is written to ainverse.bin,
- if ainverse.bin already exists, ASReml assumes it was formed in a previous run and holds the correct inverse, so ainverse.bin is read rather than the inverse being re-formed (unless !MAKE is specified); this saves time when performing repeated analyses based on a particular pedigree; delete ainverse.bin or specify !MAKE if the pedigree changes between runs,
- identities are printed in the .sln file,
- identities should be whole numbers less than 200,000,000 unless !ALPHA is specified,
- pedigree lines for parents must precede those of their progeny,
- unknown parents should be given the identity number 0,
- an individual that appears as a parent but not in the first column is assumed to have unknown parents; that is, parents with unknown parentage do not need their own line in the file,
- an identity may appear as both a male and a female parent, as happens, for example, in forestry.
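The merging and recoding steps described above can be sketched as follows. This is a hypothetical Python illustration, not ASReml's algorithm: it assumes string identities, 0 or * for unknown parents, and a coding order in which each line's parents are added before the individual itself, so parents without their own line enter as base individuals. The level numbers ASReml actually assigns may differ.

```python
UNKNOWN = {"0", "*"}  # identities treated as unknown parents


def combined_identity_list(pedigree):
    """Merge individual, male_parent and female_parent identities into one
    coded list (1-based), adding parents before the individual on each line
    so that every parent is coded before its progeny."""
    codes = {}
    for ind, sire, dam in pedigree:
        for ident in (sire, dam, ind):
            if ident not in UNKNOWN and ident not in codes:
                codes[ident] = len(codes) + 1
    return codes


def recode(field_values, codes):
    """Recode the values of a data field declared with !P to pedigree codes.
    Raises KeyError for an identity missing from the pedigree."""
    return [codes[v] for v in field_values]


ped = [("101", "SIRE_1", "0"), ("102", "SIRE_1", "0"), ("109", "SIRE_2", "0")]
codes = combined_identity_list(ped)
print(codes)  # {'SIRE_1': 1, '101': 2, '102': 3, 'SIRE_2': 4, '109': 5}
print(recode(["109", "101", "109"], codes))  # [5, 2, 5]
```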
Pedigree file qualifiers
!ALPHA
indicates that the identities are alphanumeric with up to 20 characters; by default they are numeric whole numbers less than 200,000,000.
!DIAG
causes the pedigree identifiers, the diagonal elements of the inverse relationship matrix and the inbreeding coefficients of the individuals (calculated as the diagonal of A - I, where I is the identity matrix) to be written to AINVERSE.DIA.
!GIV
instructs ASReml to write out the A-inverse in the format of .giv files.
!GROUPS g
includes genetic groups in the pedigree. The first g lines of the pedigree identify genetic groups (with zero in both the sire and dam fields). All other lines must specify one of the genetic groups in place of any parent that is unknown.
!INBRED
generates the pedigree for inbred lines. Each cross is assumed to be selfed several times to stabilize as an inbred line, as is usual for cereals, before being evaluated or crossed with another line. Since inbreeding is usually associated with strong selection, it is not obvious that the pedigree assumption of a covariance of 0.5 between parent and offspring actually holds. Do not use the !INBRED qualifier with the !MGS or !SELF qualifiers.
!MAKE
tells ASReml to make the A-inverse (rather than trying to retrieve it from the ainverse.bin file).
!MGS
indicates that the third identity is the sire of the dam rather than the dam.
!REPEAT
tells ASReml to ignore repeat occurrences of lines in the pedigree file. This option also turns off the check that animals occur in chronological order, but chronological order is still required.
!SELF s
allows partial selfing when the third field is unknown. It indicates that progeny from a cross where the second parent (male_parent) is unknown are assumed to come from selfing with probability s and from outcrossing with probability (1-s). This is appropriate in some forestry tree breeding studies where seed collected from a tree may have been pollinated by the mother tree itself or by some other tree. Do not use the !SELF qualifier with the !INBRED or !MGS qualifiers.
!SKIP n
allows you to skip n header lines at the top of the file.
!SORT
causes ASReml to sort the pedigree into an acceptable order, that is, parents before offspring, before forming the A-inverse. The sorted pedigree is written to a file whose name is the pedigree file name with .srt appended.
A PDF file, pedigree.pdf, contains details of these options.
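The quantities reported by !DIAG can be illustrated with Henderson's rules, which build A-inverse directly from the pedigree, extended to allow for inbreeding of the parents. This Python sketch is not ASReml's implementation and does not reproduce the layout of AINVERSE.DIA; it assumes identities already coded 1..n in parents-before-progeny order (as after recoding), 0 for an unknown parent, and uses exact fractions. ainverse is a hypothetical name.

```python
from fractions import Fraction
from functools import lru_cache


def ainverse(pedigree):
    """Inverse relationship matrix by Henderson's rules, allowing for inbreeding.

    pedigree: list of (individual, sire, dam) with identities already coded
    1..n in parents-before-progeny order; 0 marks an unknown parent.
    Returns (Ainv, diag, F): Ainv is an (n+1) x (n+1) list of Fractions with
    row/column 0 unused, diag its diagonal for 1..n, F the inbreeding by id.
    """
    n = len(pedigree)
    parents = {}
    for k, (i, s, d) in enumerate(pedigree, start=1):
        if i != k or s >= i or d >= i:
            raise ValueError(f"line {k}: identities must be coded 1..n, parents first")
        parents[i] = (s, d)

    @lru_cache(maxsize=None)
    def rel(i, j):
        """Additive relationship a_ij; zero if either individual is unknown."""
        if i == 0 or j == 0:
            return Fraction(0)
        if i < j:
            i, j = j, i  # make i the younger individual
        s, d = parents[i]
        if i == j:
            return 1 + rel(s, d) / 2
        return (rel(s, j) + rel(d, j)) / 2

    F = {i: rel(i, i) - 1 for i in range(1, n + 1)}
    Ainv = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i, s, d in pedigree:
        known = [p for p in (s, d) if p != 0]
        # Mendelian sampling variance b_i, in units of the additive variance.
        if len(known) == 2:
            b = Fraction(1, 2) - (F[s] + F[d]) / 4
        elif len(known) == 1:
            b = Fraction(3, 4) - F[known[0]] / 4
        else:
            b = Fraction(1)
        alpha = 1 / b
        Ainv[i][i] += alpha
        for p in known:
            Ainv[i][p] -= alpha / 2
            Ainv[p][i] -= alpha / 2
            for q in known:
                Ainv[p][q] += alpha / 4
    diag = [Ainv[i][i] for i in range(1, n + 1)]
    return Ainv, diag, F


ped = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 3, 4), (6, 5, 2)]
Ainv, diag, F = ainverse(ped)
for i in range(1, len(ped) + 1):
    print(i, float(diag[i - 1]), float(F[i]))  # id, A-inverse diagonal, inbreeding
```

Animal 5 is inbred (F = 1/8), so its contribution to the inverse through its progeny 6 uses a reduced Mendelian sampling variance (15/32 rather than 1/2).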
Genetic groups
If all individuals belong to one genetic group, then use 0 as the identity of the parents of base individuals. However, if base individuals belong to various genetic groups, this is indicated by the !GROUP qualifier and the pedigree file must begin by identifying these groups. All base individuals should then have group identifiers as parents. In this case the identity 0 appears only on the group identity lines, as in the following example, where three sire lines are fitted as genetic groups.
Genetic group example
animal !P
sire 9 !A
dam
lines 2
damage
adailygain
harveyg.ped !ALPHA !MAKE !GROUP 3
harvey.dat
adailygain ~ mu !r animal 0.25 !GU
G1 0 0
G2 0 0
G3 0 0
SIRE_1 G1 G1
SIRE_2 G1 G1
SIRE_3 G1 G1
SIRE_4 G2 G2
SIRE_5 G2 G2
SIRE_6 G3 G3
SIRE_7 G3 G3
SIRE_8 G3 G3
SIRE_9 G3 G3
101 SIRE_1 G1
102 SIRE_1 G1
103 SIRE_1 G1
...
163 SIRE_9 G3
164 SIRE_9 G3
165 SIRE_9 G3
It is usually appropriate to allocate a genetic group identifier wherever a parent is unknown.
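The layout required for genetic groups (the first g lines are groups with 0 for both parents, and 0 appears nowhere else) can be checked before the run. This is a minimal Python sketch, assuming whitespace-separated string identities and treating 0 or * as unknown; check_group_pedigree is a hypothetical helper, not an ASReml feature.

```python
UNKNOWN = {"0", "*"}  # identities treated as unknown parents


def check_group_pedigree(pedigree, g):
    """Check a pedigree laid out for g genetic groups: the first g lines are
    groups with unknown sire and dam, and every later line uses group
    identifiers instead of unknown parents. Returns a list of problems."""
    problems = []
    for k, (ind, sire, dam) in enumerate(pedigree):
        if k < g:
            if sire not in UNKNOWN or dam not in UNKNOWN:
                problems.append(f"line {k + 1}: group {ind} should have 0 0 parents")
        elif sire in UNKNOWN or dam in UNKNOWN:
            problems.append(f"line {k + 1}: {ind} has an unknown parent; use a group identifier")
    return problems


text = """G1 0 0
G2 0 0
G3 0 0
SIRE_1 G1 G1
101 SIRE_1 G1"""
ped = [tuple(line.split()[:3]) for line in text.splitlines()]
print(check_group_pedigree(ped, 3))  # []
```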