Multivariate Analysis
Context: Linear Model specification
Introduction
Multivariate analysis is used here in
the narrow sense of a multivariate mixed model. There are many
other multivariate analysis techniques which are not covered by
ASReml. Multivariate analysis is used when we are interested in
estimating the correlations
between distinct traits (for example, fleece weight
and fibre diameter in sheep) and for repeated measures of a single
trait.
Repeated measures (rats)
There are two basic forms of analysis of repeated measures data:
random regression type models and multivariate models. The latter, described
here, apply when there are a limited number of repeated measures and they are
taken on each subject at the same times, so that the data have a multivariate
structure.
Wolfinger (1996)
summarises a range of variance structures that can be
fitted to repeated measures data and demonstrates the models using
five weights taken weekly on 27 rats subjected to 3 treatments.
Multiple traits: Wether trial data
Three key traits for the Australian wool industry
are the weight of wool grown per year, the cleanness and the diameter of
that wool. Much of the wool is produced from wethers and most
major producers have traditionally used a particular strain or
'bloodline'. The file
wether.as
specifies a bivariate analysis.
Model specification
The syntax for specifying a multivariate linear model in ASReml is
Y-variates ~ fixed [ !r random ] [ !f sparse_fixed ]
where
Y-variates is a list of traits, and
fixed, random and sparse_fixed are as in the univariate case
but involve the special term Trait and interactions with Trait.
The design matrix for Trait has a level (column) for each trait.
Trait by itself fits the mean for each variate.
In an interaction, Trait.Fac fits the factor Fac for each variate and
Trait.Cov fits the covariate Cov for each variate.
ASReml internally rearranges the data
so that n data records containing t traits each
become n sets of t analysis records
indexed by the internal factor Trait,
i.e. nt analysis records ordered Trait within data record.
If the data are already in this long form, use the !ASMV t
qualifier to indicate that a multivariate analysis is required.
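The rearrangement described above can be sketched outside ASReml. The Python below is only an illustration of the ordering (Trait within data record), not ASReml's internal code; the function name expand_to_long, the field names unit and y, and the data values are all hypothetical.

```python
# Hypothetical helper: expand n wide records into n*t long records,
# ordered Trait within data record.
def expand_to_long(records, traits):
    long_rows = []
    for unit, record in enumerate(records, start=1):
        for trait_level, trait in enumerate(traits, start=1):
            long_rows.append({
                "unit": unit,            # index of the original data record
                "Trait": trait_level,    # level of the internal factor Trait
                "y": record.get(trait),  # None marks a missing observation
            })
    return long_rows


# Two made-up wide records with t = 2 traits
wide = [
    {"GFW": 5.6, "FDIAM": 21.3},
    {"GFW": 6.1, "FDIAM": None},
]
long_rows = expand_to_long(wide, ["GFW", "FDIAM"])
for row in long_rows:
    print(row)
```

Each wide record contributes exactly t rows, so a missing trait still occupies its slot in the long form.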
Variance structures
A more sophisticated error structure is required for multivariate
analysis.
Consider a multivariate analysis with t traits and n units in which the
data are ordered traits within units.
A typical variance structure assumes that units are
independent and traits are correlated. This is
described as the direct product of an IDENTITY
matrix and an unstructured (US) variance matrix.
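As a small numerical illustration of this direct product, the sketch below builds the residual variance matrix for n = 3 units and t = 2 traits with NumPy. The values in sigma are illustrative only and are not taken from any analysis.

```python
import numpy as np

n_units = 3
# Illustrative 2 x 2 between-trait (US) variance matrix
sigma = np.array([[0.20, 0.13],
                  [0.13, 0.44]])

# IDENTITY (units) direct product US (traits): with data ordered traits
# within units this is block diagonal, one copy of sigma per unit.
R = np.kron(np.eye(n_units), sigma)
print(R.shape)
```

The off-diagonal blocks are zero, which is the statement that units are independent, while each diagonal block carries the trait variances and covariance.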
We discuss the syntax with reference to the following bivariate example
Orange Wether Trial 1984-8
SheepID !I
TRIAL
BloodLine !I
TEAM *
YEAR *
GFW YLD FDIAM
wether.dat !skip 1
GFW FDIAM ~ Trait Trait.YEAR, # Fixed model
!r Trait.TEAM Trait.SheepID # Random model
predict YEAR Trait
1 2 2 # Variance header
1485 0 ID # units structure
Trait 0 US # traits structure
3*0
Trait.TEAM 2 # First G header
Trait 0 US !GP
3*0
TEAM 0 ID
Trait.SheepID 2 # Second G header
Trait 0 US !GP
3*0
SheepID 0 ID
R-structure
For a standard multivariate analysis:
The error (R) structure for the residual must be specified as
two-dimensional, with independent records and an unstructured variance
matrix across traits; records may have observations missing in different
patterns and these are handled internally during analysis.
The R structure must be ordered traits within units, that is, the R
structure definition line for units must be specified before the line for
Trait.
Variance parameters are variances, not variance ratios.
The R structure definition line for units, that is, 1485 0 ID, could be
replaced by 0 or 0 0 ID; this tells ASReml to fill in the number of units
and is a useful option when the exact number of units in the data is not
known to the user.
The error variance matrix for traits is specified by the model
Trait 0 US
3*0
Three initial values for the matrix are required, being
the lower triangle of the (symmetric) matrix specified row-wise.
Finding reasonable initial values can be a problem. If initial values are
written on the next line in the form q*0, where q is t(t + 1)/2 and
t is the number of traits, as in the example,
ASReml will take half of the phenotypic variance
matrix of the data as an initial value.
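To make the ordering and the count q = t(t + 1)/2 concrete, the sketch below computes starting values of this kind by hand. It assumes complete records and uses the ordinary sample covariance matrix as the "phenotypic variance matrix"; how ASReml computes its own default is not shown here, and the function name us_initial_values and the data are hypothetical.

```python
import numpy as np


def us_initial_values(y, fraction=0.5):
    """Return q = t(t+1)/2 starting values: the lower triangle, taken
    row-wise, of `fraction` times the sample covariance of the columns of y."""
    cov = np.atleast_2d(np.cov(y, rowvar=False))
    t = cov.shape[0]
    rows, cols = np.tril_indices(t)  # row-wise order: (0,0), (1,0), (1,1), ...
    return fraction * cov[rows, cols]


# Made-up complete records for t = 2 traits
y = np.array([[5.6, 21.3],
              [6.1, 22.8],
              [7.4, 22.0],
              [8.0, 24.1]])
init = us_initial_values(y)
print(len(init), init)
```

For two traits the three values are, in order, the variance of trait 1, the covariance, and the variance of trait 2.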
!ASUV and !ASMV
These special qualifiers relating to
multivariate analysis allow for the situation when
!ASUV: the data are in a multivariate layout but some residual variance
structure other than IDENTITY cross US is required.
!ASMV t: the data (file) are already in an expanded form (n sets of
t records) and the multivariate residual variance structure
IDENTITY cross US is required.
To use an error structure other than US for
the residual stratum you must (also) specify !ASUV on the
datafile line and include mv in the model if there are missing values.
To perform a multivariate analysis (including the automatic
handling of missing values) when the data
have already been expanded, use !ASMV t on the datafile line.
t is the number of traits that ASReml should expect;
the data file must have t records for each multivariate record, although
some may be coded missing.
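A data file that is already in long form can be checked, and padded where necessary, before it is passed to ASReml with !ASMV t. The following is a minimal sketch under the assumption that each row is a (unit, trait level, value) triple; the helper name pad_to_t_records is hypothetical, and None stands in for whatever missing-value code is written to the file.

```python
def pad_to_t_records(rows, t):
    """rows: iterable of (unit, trait, value) with trait in 1..t.
    Returns exactly t records per unit, ordered trait within unit,
    with None where a trait was not recorded."""
    by_unit = {}
    for unit, trait, value in rows:
        if not 1 <= trait <= t:
            raise ValueError(f"trait level {trait} outside 1..{t}")
        by_unit.setdefault(unit, {})[trait] = value
    padded = []
    for unit in sorted(by_unit):
        for trait in range(1, t + 1):
            padded.append((unit, trait, by_unit[unit].get(trait)))
    return padded


# Unit 2 has no record for trait 2, so one is added as missing
rows = [(1, 1, 5.6), (1, 2, 21.3), (2, 1, 6.1)]
padded = pad_to_t_records(rows, t=2)
for record in padded:
    print(record)
```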
G-structure
For a standard multivariate analysis, a US
structure is also used for the between-trait variance matrix
of the random terms (as in the example). However,
other structured models may be used and may be necessary when there
are more traits, as it is not unusual for there to be no positive
definite solution for US matrices.
Note the use of !GP
to request that the estimated matrix be constrained to be positive definite,
and the use of 3*0
in lieu of initial values; ASReml again substitutes
a proportion of the observed variance-covariance matrix of the data.
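Whether a given set of US parameters forms a positive definite matrix can be checked directly from the lower triangle. The sketch below is an independent check using NumPy eigenvalues, not part of ASReml; the function name is hypothetical. The first call uses the Trait.TEAM components reported in the example output, and the second uses made-up values that imply a correlation greater than 1.

```python
import numpy as np


def is_positive_definite(lower_triangle_rowwise, t):
    """Rebuild the symmetric t x t matrix from its row-wise lower triangle
    and test that every eigenvalue is strictly positive."""
    m = np.zeros((t, t))
    m[np.tril_indices(t)] = lower_triangle_rowwise
    m = m + np.tril(m, -1).T  # mirror the strict lower triangle
    return bool(np.all(np.linalg.eigvalsh(m) > 0))


# Trait.TEAM components from the example output: 1 1, 2 1, 2 2
print(is_positive_definite([0.374493, 0.388740, 1.36533], 2))

# Made-up values: covariance 1.2 with unit variances is not admissible
print(is_positive_definite([1.0, 1.2, 1.0], 2))
```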
Example
Below is the output returned in the .asr file for this
analysis, except that the !GP qualifiers were omitted.
ASReml 1.63o [01 Jun 2005] Orange Wether Trial 1984-88
Build: j [01 Jul 2005] 32 bit
13 Jul 2005 09:38:00.928 32.00 Mbyte Windows wether
Licensed to: Arthur Gilmour
Folder: C:\data\asr\UG2\manex
TAG !I
BloodLine !I
QUALIFIERS: !SKIP 1
Reading wether.dat FREE FORMAT skipping 1 lines
Bivariate analysis of GFW and FDIAM
Using 1485 records of 1485 read
Model term Size #miss #zero MinNon0 Mean MaxNon0
1 TAG 521 0 0 1 261.0956 521
2 TRIAL 0 0 3.000 3.000 3.000
3 BloodLine 27 0 0 1 13.4323 27
4 TEAM 35 0 0 1 18.0067 35
5 YEAR 3 0 0 1 2.0391 3
6 GFW Variate 0 0 4.100 7.478 11.20
7 YLD 0 0 60.30 75.11 88.60
8 FDIAM Variate 0 0 15.90 22.29 30.60
9 Trait 2
10 Trait.YEAR 6 9 Trait : 2 5 YEAR : 3
11 Trait.TEAM 70 9 Trait : 2 4 TEAM : 35
12 Trait.TAG 1042 9 Trait : 2 1 TAG : 521
1485 identity
2 UnStructure 0.2000 0.2000 0.4000
2970 records assumed sorted 2 within 1485
2 UnStructure 0.4000 0.3000 1.3000
35 identity
Structure for Trait.TEAM has 70 levels defined
2 UnStructure 0.2000 0.2000 2.0000
521 identity
Structure for Trait.TAG has 1042 levels defined
Forming 1120 equations: 8 dense.
Initial updates will be shrunk by factor 0.316
Notice: Algebraic ANOVA Denominator DF calculation is not available
Empirical derivatives will be used.
NOTICE: 2 singularities detected in design matrix.
1 LogL=-886.521 S2= 1.0000 2964 df
2 LogL=-818.508 S2= 1.0000 2964 df
3 LogL=-755.911 S2= 1.0000 2964 df
4 LogL=-725.374 S2= 1.0000 2964 df
5 LogL=-723.475 S2= 1.0000 2964 df
6 LogL=-723.462 S2= 1.0000 2964 df
7 LogL=-723.462 S2= 1.0000 2964 df
8 LogL=-723.462 S2= 1.0000 2964 df
Source Model terms Gamma Component Comp/SE
Residual UnStru 2 1 0.128890 0.128890 12.40 0 U
Residual UnStru 2 2 0.440601 0.440601 21.93 0 U
Trait.TEAM UnStru 1 1 0.374493 0.374493 3.89 0 U
Trait.TEAM UnStru 2 1 0.388740 0.388740 2.60 0 U
Trait.TEAM UnStru 2 2 1.36533 1.36533 3.74 0 U
Trait.TAG UnStru 1 1 0.257159 0.257159 12.09 0 U
Trait.TAG UnStru 2 1 0.219557 0.219557 5.55 0 U
Trait.TAG UnStru 2 2 1.92082 1.92082 14.35 0 U
Covariance/Variance/Correlation Matrix UnStructured # Residual
0.1984 0.4360 # 0.4360 is the correlation
0.1289 0.4406
Covariance/Variance/Correlation Matrix UnStructured # Trait.TEAM
0.3745 0.5436
0.3887 1.365
Covariance/Variance/Correlation Matrix UnStructured # Trait.TAG
0.2572 0.3124
0.2196 1.921
Wald F statistics
Source of Variation NumDF DenDF F-inc Prob
9 Trait 2 33.0 5761.58 <.001
10 Trait.YEAR 4 1162.2 1094.90 <.001
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using empirical derivatives.
Solution Standard Error T-value T-prev
10 Trait.YEAR
2 -0.102262 0.290190E-01 -3.52
3 1.06636 0.290831E-01 36.67 42.07
5 1.17407 0.433905E-01 27.06
6 2.53439 0.434880E-01 58.28 32.85
9 Trait
1 7.13717 0.107933 66.13
2 21.0569 0.209095 100.71 78.16
11 Trait.TEAM 70 effects fitted
12 Trait.TAG 1042 effects fitted
SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section 1
1.00 1.54
10 possible outliers: see .res file
Finished: 13 Jul 2005 09:38:05.725 LogL Converged
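In each Covariance/Variance/Correlation matrix of the output above, the variances are on the diagonal, the covariance is below the diagonal and the correlation is above it. The correlation can be reproduced from the reported components as covariance divided by the square root of the product of the two variances. The sketch below does this for the Trait.TEAM and Trait.TAG components; the function name is hypothetical.

```python
import math


def correlation(var1, cov12, var2):
    """Correlation implied by two variances and their covariance."""
    return cov12 / math.sqrt(var1 * var2)


# Components as reported in the output: (1 1), (2 1), (2 2)
team = correlation(0.374493, 0.388740, 1.36533)
tag = correlation(0.257159, 0.219557, 1.92082)
print(round(team, 4), round(tag, 4))
```

The results agree with the upper-triangle entries 0.5436 and 0.3124 to the four figures printed.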