Multivariate Analysis

Introduction

Multivariate analysis is used here in the narrow sense of a multivariate mixed model. There are many other multivariate analysis techniques which are not covered by ASReml. Multivariate analysis is used when we are interested in estimating the correlations between distinct traits (for example, fleece weight and fibre diameter in sheep) and for repeated measures of a single trait.

Repeated measures (rats)

There are two basic forms of analysis of repeated measures data: random regression type models and multivariate models. The latter, described here, apply when there are a limited number of repeat measures and they are taken on each subject at the same times, so that the data have a multivariate structure.

Wolfinger (1996) summarises a range of variance structures that can be fitted to repeated measures data and demonstrates the models using five weights taken weekly on 27 rats subjected to 3 treatments.

Multiple traits: Wether trial data

Three key traits for the Australian wool industry are the weight of wool grown per year, the cleanness and the diameter of that wool. Much of the wool is produced from wethers and most major producers have traditionally used a particular strain or 'bloodline'. The file wether.as specifies a bivariate analysis.

Model specification

The syntax for specifying a multivariate linear model in ASReml is
Y-variates ~ fixed [ !r random ] [ !f sparse_fixed ]
where

Y-variates is a list of traits,

fixed, random and sparse_fixed are as in the univariate case but involve the special term Trait and interactions with Trait.

The design matrix for Trait has a level (column) for each trait.

Trait by itself fits the mean for each variate.

In an interaction,

Trait.Fac fits the factor Fac for each variate and
Trait.Cov fits the covariate Cov for each variate.
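
As a rough illustration of how these interaction terms grow, the following Python sketch (not ASReml code; the function name interaction_labels is hypothetical) lists one design column per trait-by-level combination. With two traits, a factor with 3 levels gives 6 columns and a factor with 35 levels gives 70. The column order used here is an assumption made for the illustration.

```python
def interaction_labels(traits, levels):
    """Return one label per (trait, level) pair, i.e. one design column each.

    Levels vary fastest within trait; this ordering is assumed for the sketch.
    """
    return [f"{trait}.{level}" for trait in traits for level in levels]


traits = ["GFW", "FDIAM"]

# Trait.YEAR with a 3-level YEAR factor
labels = interaction_labels(traits, [1, 2, 3])
print(len(labels), labels)

# Trait.TEAM with a 35-level TEAM factor
team_labels = interaction_labels(traits, range(1, 36))
print(len(team_labels))
```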

ASReml internally rearranges the data so that n data records containing t traits each become n sets of t analysis records indexed by the internal factor Trait, i.e. nt analysis records ordered Trait within data record. If the data are already in this long form, use the !ASMV t qualifier to indicate that a multivariate analysis is required.
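
The rearrangement can be pictured with a small Python sketch. This is only an illustration of the record ordering described above, not what ASReml does internally; the function name expand_records and the data values are invented for the example.

```python
def expand_records(records, traits):
    """Turn n wide records into n*t analysis records, ordered Trait within record."""
    long_rows = []
    for rec_no, rec in enumerate(records, start=1):
        for trait_no, trait in enumerate(traits, start=1):
            long_rows.append({"record": rec_no, "Trait": trait_no, "y": rec[trait]})
    return long_rows


# Two hypothetical data records; None stands for a missing observation.
wide = [
    {"GFW": 7.1, "FDIAM": 21.3},
    {"GFW": 6.8, "FDIAM": None},
]

long_rows = expand_records(wide, ["GFW", "FDIAM"])
for row in long_rows:
    print(row)
```

Each data record contributes t consecutive analysis records, so a missing trait still occupies its slot in the sequence.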

Variance structures

A more sophisticated error structure is required for multivariate analysis. Consider a multivariate analysis with t traits and n units in which the data are ordered traits within units. A typical variance structure assumes that units are independent and traits are correlated. This is described as the direct product of an IDENTITY matrix and an unstructured (US) variance matrix.
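
To make the direct product concrete, here is a minimal Python sketch that forms the product of an identity matrix for n = 3 units and a 2 x 2 trait matrix. The trait matrix values are hypothetical, and kron and identity are helper functions written for this illustration. With data ordered traits within units, the result is block diagonal: the same trait matrix repeated once per unit, with zeros between units.

```python
def kron(a, b):
    """Kronecker (direct) product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical 2 x 2 unstructured variance matrix across traits.
sigma = [[0.20, 0.13],
         [0.13, 0.44]]

# Residual variance matrix for 3 independent units, traits within units.
r = kron(identity(3), sigma)
for row in r:
    print(row)
```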

We discuss the syntax with reference to the following bivariate example

 Orange Wether Trial 1984-8
  SheepID !I
  TRIAL
  BloodLine !I
  TEAM *
  YEAR *
  GFW YLD FDIAM
 wether.dat !skip 1

 GFW FDIAM ~ Trait Trait.YEAR,        # Fixed model
          !r Trait.TEAM Trait.SheepID # Random model

 predict YEAR Trait

 1 2 2                                # Variance header
 1485 0 ID                            # units structure
 Trait 0 US                           # traits structure
  3*0

 Trait.TEAM 2                         # First G header
 Trait 0 US !GP
  3*0
 TEAM 0 ID

 Trait.SheepID 2                      # Second G header
 Trait 0 US !GP
  3*0
 SheepID 0 ID

R-structure

For a standard multivariate analysis:

the error (R) structure for the residual must be specified as two-dimensional, with independent records and an unstructured variance matrix across traits;

records may have observations missing in different patterns, and these are handled internally during analysis;

the R structure must be ordered traits within units, that is, the R structure definition line for units must be specified before the line for Trait;

variance parameters are variances, not variance ratios;

the R structure definition line for units, that is,
     1485 0 ID
could be replaced by
     0
or
     0 0 ID
which tells ASReml to fill in the number of units; this is a useful option when the exact number of units in the data is not known to the user;

the error variance matrix for traits is specified by the model

     Trait 0 US
     3*0

Three initial values are required for the matrix, being the lower triangle of the (symmetric) matrix specified row-wise. Finding reasonable initial values can be a problem. If the initial values are written on the next line in the form q*0, where q is t(t + 1)/2 and t is the number of traits, as in the example, ASReml will take half of the phenotypic variance matrix of the data as an initial value.
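
The count q and the row-wise ordering of the lower triangle can be checked with a short Python sketch. It also mimics the "half the phenotypic variance matrix" default on a tiny invented data set. The sample covariance with divisor n - 1 and complete records are assumptions of this sketch; how ASReml itself computes the phenotypic matrix is not stated here.

```python
def n_us_parameters(t):
    """Number of parameters in an unstructured t x t variance matrix."""
    return t * (t + 1) // 2


def lower_triangle_rowwise(m):
    """Lower triangle of a square matrix, row by row: (1,1), (2,1), (2,2), ..."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1)]


def covariance_matrix(columns):
    """Sample covariance matrix of equal-length columns (divisor n - 1, assumed)."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
         for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]


# Hypothetical complete records for two traits.
gfw = [6.0, 7.0, 8.0, 9.0]
fdiam = [20.0, 22.0, 21.0, 25.0]

q = n_us_parameters(2)
phenotypic = covariance_matrix([gfw, fdiam])
start = [0.5 * v for v in lower_triangle_rowwise(phenotypic)]
print(q, start)
```

For t = 2 the three values are, in order, the variance of trait 1, the covariance, and the variance of trait 2.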

!ASUV and !ASMV

These special qualifiers relating to multivariate analysis allow for the following situations:

!ASUV: the data are in a multivariate layout but some residual variance structure other than IDENTITY cross US is required.

!ASMV t: the data (file) is already in an expanded form (n sets of t records) and the multivariate residual variance structure IDENTITY cross US is required.

To use an error structure other than US for the residual stratum you must (also) specify !ASUV on the datafile line and include mv in the model if there are missing values.

To perform a multivariate analysis (including the automatic handling of missing values) when the data have already been expanded, use !ASMV t on the datafile line. Here t is the number of traits that ASReml should expect; the data file must have t records for each multivariate record, although some may be coded missing.

G-structure

For a standard multivariate analysis, a US structure is also used for the between-trait variance matrix of the random terms (as in the example). However, other structured models may be used, and may be necessary when there are more traits, as it is not unusual for US matrices to have no positive definite solution. Note the use of !GP to request that the estimated matrix be constrained to be positive definite, and the use of 3*0 in lieu of initial values; ASReml again substitutes a proportion of the observed variance-covariance matrix of the data.

Example

Below is the output returned in the .asr file for this analysis, except that the !GP qualifiers were omitted. Note that the listing reports the animal identifier as TAG rather than SheepID.

  ASReml 1.63o [01 Jun 2005]  Orange Wether Trial  1984-88
      Build: j [01 Jul 2005]  32 bit
  13 Jul 2005 09:38:00.928   32.00 Mbyte Windows   wether
  Licensed to: Arthur Gilmour

  Folder: C:\data\asr\UG2\manex
   TAG  !I
   BloodLine !I
  QUALIFIERS: !SKIP 1
  Reading wether.dat  FREE FORMAT skipping     1 lines

  Bivariate analysis of GFW and FDIAM
  Using     1485 records of    1485 read
   Model term                  Size #miss #zero   MinNon0    Mean      MaxNon0
    1 TAG                       521     0     0      1   261.0956        521
    2 TRIAL                             0     0  3.000      3.000      3.000
    3 BloodLine                  27     0     0      1    13.4323         27
    4 TEAM                       35     0     0      1    18.0067         35
    5 YEAR                        3     0     0      1     2.0391          3
    6 GFW                  Variate      0     0  4.100      7.478      11.20
    7 YLD                               0     0  60.30      75.11      88.60
    8 FDIAM                Variate      0     0  15.90      22.29      30.60
    9 Trait                       2
   10 Trait.YEAR                  6  9 Trait     :   2   5 YEAR           :    3
   11 Trait.TEAM                 70  9 Trait     :   2   4 TEAM           :   35
   12 Trait.TAG                1042  9 Trait     :   2   1 TAG            :  521
    1485  identity
       2  UnStructure    0.2000    0.2000    0.4000
     2970 records assumed sorted    2 within    1485
       2  UnStructure    0.4000    0.3000    1.3000
      35  identity
  Structure for Trait.TEAM         has      70 levels defined
       2  UnStructure    0.2000    0.2000    2.0000
     521  identity
  Structure for Trait.TAG          has    1042 levels defined
  Forming    1120 equations:   8 dense.
  Initial updates will be shrunk by factor    0.316
  Notice: Algebraic ANOVA Denominator DF calculation is not available
          Empirical derivatives will be used.
  NOTICE:      2 singularities detected in design matrix.
    1 LogL=-886.521     S2=  1.0000       2964 df
    2 LogL=-818.508     S2=  1.0000       2964 df
    3 LogL=-755.911     S2=  1.0000       2964 df
    4 LogL=-725.374     S2=  1.0000       2964 df
    5 LogL=-723.475     S2=  1.0000       2964 df
    6 LogL=-723.462     S2=  1.0000       2964 df
    7 LogL=-723.462     S2=  1.0000       2964 df
    8 LogL=-723.462     S2=  1.0000       2964 df

  Source                Model  terms     Gamma     Component    Comp/SE
  Residual            UnStru   2   1  0.128890      0.128890      12.40   0 U
  Residual            UnStru   2   2  0.440601      0.440601      21.93   0 U
  Trait.TEAM          UnStru   1   1  0.374493      0.374493       3.89   0 U
  Trait.TEAM          UnStru   2   1  0.388740      0.388740       2.60   0 U
  Trait.TEAM          UnStru   2   2   1.36533       1.36533       3.74   0 U
  Trait.TAG           UnStru   1   1  0.257159      0.257159      12.09   0 U
  Trait.TAG           UnStru   2   1  0.219557      0.219557       5.55   0 U
  Trait.TAG           UnStru   2   2   1.92082       1.92082      14.35   0 U
  Covariance/Variance/Correlation Matrix UnStructured      <- Residual
  0.1984     0.4360                                        <- 0.4360 is the correlation
  0.1289     0.4406
  Covariance/Variance/Correlation Matrix UnStructured      <- Trait.TEAM
  0.3745     0.5436
  0.3887      1.365
  Covariance/Variance/Correlation Matrix UnStructured      <- Trait.TAG
  0.2572     0.3124
  0.2196      1.921

                                    Wald F statistics
      Source of Variation           NumDF     DenDF    F-inc             Prob
    9 Trait                             2      33.0  5761.58            <.001
   10 Trait.YEAR                        4    1162.2  1094.90            <.001
  Notice: The DenDF values are calculated ignoring fixed/boundary/singular
              variance parameters using empirical derivatives.

                      Solution       Standard Error    T-value     T-prev
   10 Trait.YEAR
                     2  -0.102262       0.290190E-01     -3.52
                     3    1.06636       0.290831E-01     36.67     42.07
                     5    1.17407       0.433905E-01     27.06
                     6    2.53439       0.434880E-01     58.28     32.85
    9 Trait
                     1    7.13717       0.107933         66.13
                     2    21.0569       0.209095        100.71     78.16
   11 Trait.TEAM                           70 effects fitted
   12 Trait.TAG                          1042 effects fitted
  SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section   1
    1.00   1.54
           10  possible outliers: see .res file
  Finished: 13 Jul 2005 09:38:05.725   LogL Converged
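
In each Covariance/Variance/Correlation matrix in the listing above, the diagonal holds the variances, the lower triangle the covariance and the upper triangle the correlation. The following Python sketch recomputes the correlations from the reported components as covariance divided by the square root of the product of the two variances. The function name is hypothetical; the residual variance for the first trait is taken from the printed matrix (0.1984), because its component row does not appear in the listing.

```python
import math


def correlation(cov, var_a, var_b):
    """Correlation implied by a covariance and the two variances."""
    return cov / math.sqrt(var_a * var_b)


# Components as printed in the listing (covariance, variance 1, variance 2).
residual = correlation(0.128890, 0.1984, 0.440601)
team = correlation(0.388740, 0.374493, 1.36533)
tag = correlation(0.219557, 0.257159, 1.92082)

for name, value in [("Residual", residual), ("Trait.TEAM", team), ("Trait.TAG", tag)]:
    print(f"{name:12s} {value:.4f}")
```

The values agree with the printed correlations 0.4360, 0.5436 and 0.3124 to within rounding of the inputs.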
