Multivariate animal genetics data - Sheep

Introduction

The analysis of incomplete or unbalanced multivariate data often presents computational difficulties. These difficulties are exacerbated by either the number of random effects in the linear mixed model, the number of traits, the complexity of the variance models being fitted to the random effects or the size of the problem. In this section we illustrate two approaches to the analysis of a complex set of incomplete multivariate data.

Much of the difficulty in conducting such analyses in ASReml centres on obtaining good starting values. Derivative based algorithms such as the AI algorithm can be unreliable when fitting complex variance structures unless good starting values are available. Poor starting values may result in divergence of the algorithm or slow convergence. A particular problem with fitting unstructured variance models is keeping the estimated variance matrix positive definite. These are not simple issues and in the following we present a pragmatic approach to them.

The data are taken from a large genetic study on Coopworth lambs. A total of 5 traits, namely weaning weight ( wwt), yearling weight ( ywt), greasy fleece weight ( gfw), fibre diameter ( fdm) and ultrasound fat depth at the C site ( fat) were measured on 7043 lambs. The lambs were the progeny of 92 sires and 3561 dams, produced from 4871 litters over 49 flock-year combinations. Not all traits were measured on each group. No pedigree data was available for either sires or dams.

The aim of the analysis is to estimate heritability (h2) of each trait and to estimate the genetic correlations between the five traits. We will present two approaches, a half-sib analysis and an analysis based on the use of an animal model, which directly defines the genetic covariance between the progeny and sires and dams.

The data fields included factors defining sire, dam and lamb ( tag), covariates such as age, the age of the lamb at a set time, brr the birth rearing rank (1 = born single raised single, 2 = born twin raised single, 3 = born twin raised twin and 4 = other), sex (M, F) and grp a factor indicating the flock-year combination.

Half-sib analysis

In the half-sib analysis we include terms for the random effects of sires, dams and litters. In univariate analyses the variance component for sires is denoted by σ2s= 0.25σ2A where σ2A is the additive genetic variance, the variance component for dams is denoted by σ2d= 0.25σ2A+ σ2m where σ2m is the maternal variance component and the variance component for litters is denoted by σ2l and represents variation attributable to the particular mating. For a multivariate analysis these variance components for sires, dams and litters are, in theory replaced by unstructured matrices, one for each term. Additionally we assume the residuals for each trait may be correlated. Thus for this example we would like to fit a total of 4 unstructured variance models. For such a situation, it is sensible to commence the modelling process with a series of univariate analyses. These give starting values for the diagonals of the variance matrices, but also indicate what variance components are estimable. The ASReml job for the univariate analyses is
 Multivariate Sire  Dam model
   tag
   sire   92 !I
   dam  3561 !I
   grp    49
   sex
   brr     4
   litter 4871
   age       wwt   !M0 ywt   !M0    # !M0 recodes zeros as missing values
   gfw   !M0 fdm   !M0 fat   !M0
 coop.fmt
 wwt ~ mu age brr sex age.sex !r sire dam lit age.grp sex.grp !f grp
Table 1. REML estimates of a subset of the variance parameters for each trait for the genetic example, expressed as a ratio to their asymptotic s.e. term wwt ywt gfw fdm fat sire 3.68 3.57 3.95 1.92 1.92 dam 6.25 4.93 2.78 0.37 0.05 litter 8.79 0.99 2.23 1.91 0.00 age.grp 2.29 1.39 0.31 1.15 1.74 sex.grp 2.90 3.43 3.70 - 1.83

Wald tests of the fixed effects for each trait for the genetic example,
term wwt ywt gfw fdm fat
age 331.3 67.1 52.4 2.6 7.5
brr 554.6 73.4 14.9 0.3 13.9
sex 196.1 123.3 0.2 2.9 0.6
age.sex 10.3 1.7 1.9 - 5.0

Tables 1 and 2 present the summary of these analyses. Fibre diameter was measured on only 2 female lambs and so interactions with sex were not fitted. The dam variance component was quite small for both fibre diameter and fat. The REML estimate of the variance component associated with litters was effectively zero for fat.

Thus in the multivariate analysis we consider fitting the following models to the sire, dam and litter effects,

Var(us) = Σs cross I92
Var(ud) = Σd cross I3561
Var(ul) = Σl cross I4891
where Σs (5 by 5), Σd (3 by 3) and Σl (4 by 4) are positive definite symmetric matrices corresponding to the between traits variance matrices for sires, dams and litters respectively. The variance matrix for dams does not involve fibre diameter and fat depth, while the variance matrix for litters does not involve fat depth. The effects in each of the above vectors are ordered levels within traits. Lastly we assume that the residual variance matrix is given by
Σe cross I7043

Table 3 presents the sequence variance models fitted to each of the four random terms sire, dam, litter and error in the ASReml job
 Multivariate Sire  Dam model
   tag
   sire   92 !I
   dam  3561 !I
   grp    49
   sex
   brr     4
   litter 4871
   age       wwt   !m0 ywt   !m0    # !M0 identifies missing values
   gfw   !m0 fdm   !m0 fat   !m0
 coop.fmt   !DOPATH $1 !CONTINUE !MAXIT 20
 !PATH 3
  !EXTRA 4
 !PATH
 wwt ywt gfw fdm fat ~ Trait Tr.age Tr.brr Tr.sex Tr.age.sex,
       !r Tr.sire,
       !{ at(Tr,1).dam at(Tr,2).dam at(Tr,3).dam !},
       !{ at(Tr,1).lit at(Tr,2).lit at(Tr,3).lit at(Tr,4).lit !},
          at(Trait,1).age.grp .0024,
          at(Trait,2).age.grp .0019,
          at(Trait,4).age.grp .0020,
          at(Trait,5).age.grp .00026,
          at(Trait,1).sex.grp .93,
          at(Trait,2).sex.grp 16.0,
          at(Trait,3).sex.grp .28,
          at(Trait,5).sex.grp 1.18,
  !f Tr.grp

 1 2 3                    #1 R structure with 2 dimensions and 3 G structures

 0 0 0                    #Independent across animals

 Tr 0 US                  #General structure across traits
 15*0.                    #Asreml will estimate some starting values

 Tr.sire 2                #Sire effects.

 !PATH 1                  #Initial analysis ignoring genetic correlations
 Tr 0 DIAG                #Specified diagonal variance structure
 0.608 1.298 0.015 0.197 0.035      #Initial sire variances

 !PATH 2                  #Factor Analytic model
 Tr 0 FA1 !GP
    0.5 0.5 -.01 -.01 0.1           #Correlation factors
 0.608 1.298 0.015 0.197 0.035      #Variances

 !PATH 3                  #Unstructured variance model
 Tr 0 US
 0.6199                   #Lower triangle row-wise
  0.6939      1.602
  0.003219 0.007424 0.01509
 -0.02532 -0.05840 -0.0002709 0.1807
  0.06013 0.1387     0.0006433 -0.005061 0.03487
 !PATH
 sire

 #Maternal structure covers the 3 model terms
 #               at(Tr,1).dam at(Tr,2).dam at(Tr,3).dam

 at(Tr,1).dam 2                  # Maternal effects.
 !PATH 1
 3 0 CORGH !GU                   # Equivalent to Unstructured
 .9
 .1 .1
 2.2 4.14 0.018
 !PATH 2
 3 0 CORGH !GU
 .9
 .1 .1
 2.2 4.14 0.018
 !PATH 3
 3 0 US !GU
 .9
 .1 .1
 2.2 4.14 0.018
 !PATH
 dam

 #Litter structure covers the 4 model terms  at(Tr,1).lit at(Tr,2).lit
 #at(Tr,3).lit at(Tr,4).lit

 at(Tr,1).lit 2                  # Litter effects.
 !PATH 1
 4 0 DIAG                        # Diagonal structure
 3.74 0.97 0.019 0.941
 !PATH 2
 4 0 FA1 !GP                     # Factor Analytic 1
 .5 .5 .01 .1
 4.95 4.63 0.037 0.941
 !PATH 3
 4 0 US                          # Unstructured
  5.073
   3.545      3.914
  0.1274     0.08909 0.02865
  0.07277 0.05090 0.001829  1.019

 !PATH
 lit
Table 3. Variance models fitted for each part of the ASReml job in the analysis of the genetic example
term matrix !PATH 1 !PATH 2 !PATH 3
sire Σs DIAG FA1 US
dam Σd CORGH CORGH US
litter Σl DIAG FA1 US
error Σe US US US

In !PATH 1, the error variance model is taken to be unstructured, but the starting values are set to zero. This instructs ASReml to obtain starting values from the sample covariance matrix of the data. For incomplete data the matrix so obtained may not, in general be positive definite. Care should be taken when using this option for incomplete multivariate data. The command to run !PATH 1 is
 asreml -nrw64 mt 1
The Loglikelihood from this run is -20000-1444.93. When the job runs, the message
Non positive definite G matrix: 0 singularities 1 negative pivots; order 3
appears to the screen. This refers to the 3 by 3 dam matrix which is estimated as
  Covariance/Variance/Correlation Matrix CORRelation
   2.573      1.025     0.6568
   3.024      3.382     0.7830
  0.1526     0.2086     0.2098E-01
Note the correlation between wwt and ywt is estimated at 1.025.
The results from this analysis can be automatically used by ASReml for the next part, if the .rsv is copied prior to running the next part. That is, we add the !PATH 2 coding to the job, copy mt1.rsv to mt2.rsv so that when we run !PATH 2 it starts from where !PATH 1 finished, and run the job using
 asreml -cnrw64 mt 2
The Loglikelihood from this run is -20000-1427.37.

Finally, we use the !PATH 3 coding to obtain the final analysis, copy mt2.rsv to mt3.rsv and run the final stage starting from the stage 2 results. Note that we are using the automatic updating associated with !CONTINUE. A portion of the final output file is
  Notice: LogL values are reported relative to a base of      -20000.00
  NOTICE:     76 singularities detected in design matrix.
    1 LogL=-1427.37     S2=  1.0000      35006 df : 2 components constrained
    2 LogL=-1424.58     S2=  1.0000      35006 df
    3 LogL=-1421.07     S2=  1.0000      35006 df : 1 components constrained
    4 LogL=-1420.11     S2=  1.0000      35006 df
    5 LogL=-1419.93     S2=  1.0000      35006 df
    6 LogL=-1419.92     S2=  1.0000      35006 df
    7 LogL=-1419.92     S2=  1.0000      35006 df
    8 LogL=-1419.92     S2=  1.0000      35006 df
    9 LogL=-1419.92     S2=  1.0000      35006 df
   10 LogL=-1419.92     S2=  1.0000      35006 df
   11 LogL=-1419.92     S2=  1.0000      35006 df

  Source                Model  terms     Gamma     Component    Comp/SE   % C
  at(Trait,1).age.grp      49     49  0.135360E-02  0.135360E-02   2.03   0 P
  at(Trait,2).age.grp      49     49  0.101561E-02  0.101561E-02   1.24   0 P
  at(Trait,4).age.grp      49     49  0.176505E-02  0.176505E-02   1.13   0 P
  at(Trait,5).age.grp      49     49  0.209279E-03  0.209279E-03   1.68   0 P
  at(Trait,1).sex.grp      49     49  0.919610      0.919610       2.89   0 P
  at(Trait,2).sex.grp      49     49   15.3912       15.3912       3.50   0 P
  at(Trait,3).sex.grp      49     49  0.279496      0.279496       3.71   0 P
  at(Trait,5).sex.grp      49     49   1.44032       1.44032       1.80   0 P
  Residual            UnStru   1   1   9.46220       9.46220      33.30   0 U
   :                     :                :             :           :
  Covariance/Variance/Correlation Matrix UnStructured Residual
   9.462     0.5691     0.2356     0.1640     0.2183
   7.332      17.54     0.4241     0.2494     0.4639
  0.2728     0.6686     0.1417     0.3994     0.1679
  0.9625      1.994     0.2870      3.642     0.4875E-01
  0.8336      2.412     0.7846E-01 0.1155      1.541
  Covariance/Variance/Correlation Matrix UnStructured Tr.sire
  0.5941     0.7044     0.2966     0.2032     0.2703
  0.6745      1.544     0.1364E-01-0.1224     0.5726
  0.2800E-01 0.2076E-02 0.1500E-01 0.1121    -0.4818E-02
  0.6238E-01-0.6056E-01 0.5469E-02 0.1586    -0.6331
  0.3789E-01 0.1294    -0.1073E-03-0.4586E-01 0.3308E-01
  Covariance/Variance/Correlation Matrix UnStructured at(Tr,1).dam
   2.161      1.010     0.7663
   2.196      2.186     0.8301
  0.1577     0.1718     0.1959E-01
  Covariance/Variance/Correlation Matrix UnStructured at(Tr,1).lit
   3.547     0.5065    -0.1099    -0.4096E-01
   1.555      2.657     0.1740    -0.5150
 -0.2787E-01 0.3821E-01 0.1815E-01-0.3282
 -0.7312E-01-0.7957    -0.4191E-01 0.8984

                                    Wald F statistics
      Source of Variation           NumDF              F-inc
   15 Tr.age                            5              98.95
   16 Tr.brr                           15             116.72
   17 Tr.sex                            5              59.78
   19 Tr.age.sex                        4               4.90
In the .res file is reported an eigen analysis of these four variance structures.
  Eigen Analysis of UnStructured matrix for Residual
  Eigen values    22.458     5.210     3.395     1.160     0.103
    Percentage    69.474    16.118    10.502     3.588     0.318
       1          0.4970   -0.8663    0.0141    0.0470    0.0027
       2          0.8509    0.4765   -0.1316   -0.1746   -0.0327
       3          0.0335    0.0230    0.0585   -0.0048    0.9974
       4          0.1168    0.0871    0.9843    0.0769   -0.0633
       5          0.1187    0.1196   -0.1010    0.9805    0.0039

  Eigen Analysis of UnStructured matrix for Tr.sire
  Eigen values     1.904     0.304     0.114     0.013     0.010
    Percentage    81.199    12.963     4.859     0.535     0.444
       1          0.4578    0.7476    0.4695   -0.1052    0.0087
       2          0.8860   -0.3646   -0.2766    0.0248   -0.0700
       3          0.0077    0.0798    0.0826    0.9438   -0.3098
       4         -0.0163    0.5260   -0.8015    0.1116    0.2612
       5          0.0710   -0.1587    0.2320    0.2918    0.9115

  Eigen Analysis of UnStructured matrix for at(Tr,1).dam
  Eigen values     4.382     0.010    -0.025
    Percentage   100.352     0.225    -0.577
       1          0.7041   -0.2321    0.6711
       2          0.7081    0.1585   -0.6881
       3          0.0533    0.9597    0.2760

  Eigen Analysis of UnStructured matrix for at(Tr,1).lit
  Eigen values     4.795     1.827     0.482     0.016
    Percentage    67.345    25.664     6.769     0.221
       1          0.7752    0.5928    0.2178    0.0133
       2          0.6159   -0.6328   -0.4691   -0.0106
       3          0.0016   -0.0340    0.0255    0.9991
       4         -0.1403    0.4969   -0.8555    0.0390
The REML estimates of all the variance matrices except for the dam components are positive definite. Heritabilities for each trait can be calculated using the .pin file facility of ASReml. The heritability is given by
h2 = σ2A2P
where σ2P is the phenotypic variance and is given by
σ2P= σ2s+ σ2d+ σ2l+ σ2e
recalling that
σ2s = 0.25 σ2A
σ2d = 0.25 σ2A + σ2m

In the half-sib analysis we only use the estimate of additive genetic variance from the sire variance component. The ASReml .pin file is presented below along with the output from the following command
 asreml -p mt3

 F phenWYG 9:14 + 24:29 + 39:44 + 45:50    # defines 55:60
 F phenD  15:18 + 30:33 +         51:54    # defines 61:64
 F phenF  19:23 + 34:38                    # defines 65:69
 F Direct 24:38 * 4.                       # defines 70:84
 F Maternal 39:44 - 24:29                  # defines 85:90
 H WWTh2  70 55
 H YWTh2  72 57
 H GFWh2  75 60
 H FDMh2  79 64
 H FATh2  84 69
 R GenCor 24:38
 R MatCor 85:90

   55 phenWYG  9     15.76      0.3130
   56 phenWYG 10     11.76      0.3749
   57 phenWYG 11     23.92      0.6313
    .    .     .      .          .
   70 Direct 24      2.376      0.6458
   71 Direct 25      2.698      0.8487
   72 Direct 26      6.174       1.585
   73 Direct 27     0.1120      0.7330E-01
    .   .     .        .          .
   85 Maternal 39    1.567      0.3788
   86 Maternal 40    1.521      0.4368
   87 Maternal 41   0.6419      0.7797
   .    .       .     .            .
  WWTh2      = Direct 2  70/phenWYG   55=        0.1507  0.0396
  YWTh2      = Direct 2  72/phenWYG   57=        0.2581  0.0624
  GFWh2      = Direct 2  75/phenWYG   60=        0.3084  0.0716
  FDMh2      = Direct 3  79/phenD 18  64=        0.1350  0.0717
  FATh2      = Direct 3  84/phenF 23  69=        0.0841  0.0402
  GenCor 2 1 = Tr.si 25/SQR[Tr.si 24*Tr.si 26]=  0.7044  0.1025
  GenCor 3 1 = Tr.si 27/SQR[Tr.si 24*Tr.si 29]=  0.2966  0.1720
  GenCor 3 2 = Tr.si 28/SQR[Tr.si 26*Tr.si 29]=  0.0136  0.1810
  GenCor 4 1 = Tr.si 30/SQR[Tr.si 24*Tr.si 33]=  0.2028  0.3513
  GenCor 4 2 = Tr.si 31/SQR[Tr.si 26*Tr.si 33]= -0.1227  0.3247
  GenCor 4 3 = Tr.si 32/SQR[Tr.si 29*Tr.si 33]=  0.1115  0.3868
  GenCor 5 1 = Tr.si 34/SQR[Tr.si 24*Tr.si 38]=  0.2703  0.2724
  GenCor 5 2 = Tr.si 35/SQR[Tr.si 26*Tr.si 38]=  0.5726  0.2022
  GenCor 5 3 = Tr.si 36/SQR[Tr.si 29*Tr.si 38]= -0.0048  0.2653
  GenCor 5 4 = Tr.si 37/SQR[Tr.si 33*Tr.si 38]= -0.6333  0.3775
  MatCor 2 1 = Mater 86/SQR[Mater 85*Mater 87]=  1.5168  0.7131
  MatCor 3 1 = Mater 88/SQR[Mater 85*Mater 90]=  1.5285  1.1561
  MatCor 3 2 = Mater 89/SQR[Mater 87*Mater 90]=  3.1251  2.7985
  • Back

    Return to start