Paired Case-Control study - Rice worms

Introduction

This data is concerned with an experiment conducted to investigate the tolerance of rice varieties to attack by the larvae of bloodworms. The data have been kindly provided by Dr. Mark Stevens, Yanco Agricultural Institute. A full description of the experiment is given by Stevens et al. (1999). Bloodworms are a significant pest of rice in the Murray and Murrumbidgee irrigation areas where they can cause poor establishment and substantial yield loss.

The experiment commenced with the transplanting of rice seedlings into trays. Each tray contained 32 seedlings and the trays were paired so that a control tray (no bloodworms) and a treated tray (bloodworms added) were grown in a controlled environment room for the duration of the experiment. At the end of this time rice plants were carefully extracted, the root system washed and root area determined for the tray using an image analysis system described by Stevens et al. (1999). Two pairs of trays, each pair corresponding to a different variety, were included in each run. A new batch of bloodworm larvae was used for each run. A total of 44 varieties was investigated with three replicates of each. Unfortunately the variety concurrence within runs was less than optimal. Eight varieties occurred with only one other variety, 22 with two other varieties and the remaining 14 with three different varieties.

In the next three sections we present an exhaustive analysis of these data using equivalent univariate and multivariate techniques. It is convenient to use two data files one for each approach. The univariate data file consists of factors pair, run, variety, tmt, unit and variate rootwt. The factor unit labels the individual trays, pair labels pairs of trays (to which varieties are allocated) and tmt is the two level bloodworm treatment factor (control/treated). The multivariate data file consists of factors variety and run and variates for root weight of both the control and exposed treatments (labelled yc and ye respectively).

Preliminary analyses indicated variance heterogeneity so that subsequent analyses were conducted on the square root scale. Figure 1 presents a plot of the treated and the control root area (on the square root scale) for each variety. There is a strong dependence between the treated and control root area, which is not surprising. The aim of the experiment was to determine the tolerance of varieties to bloodworms and thence identify the most tolerant varieties. The definition of tolerance should allow for the fact that varieties differ in their inherent seedling vigour (Figure 1). The original approach of the scientist was to regress the treated root area against the control root area and define the index of vigour as the residual from this regression. This approach is clearly inefficient since there is error in both variables. We seek to determine an index of tolerance from the joint analysis of treated and control root area.
Figure 1. Rice bloodworm data: Plot of square root of root weight for treated versus control

Standard analysis

The allocation of bloodworm treatments within varieties and varieties within runs defines a nested block structure of the form
 run/variety/tmt = run + run.variety + run.variety.tmt
               ( = run + pair + pair.tmt )
               ( = run + run.variety + units )
There is an additional blocking term, however, due to the fact that the bloodworms within a run are derived from the same batch of larvae whereas between runs the bloodworms come from different sources. This defines a block structure of the form
 run/tmt/variety = run + run.tmt + run.tmt.variety
               ( = run + run.tmt + pair.tmt )
Combining the two provides the full block structure for the design, namely
   run + run.variety + run.tmt + run.tmt.variety
 = run + run.variety + run.tmt + units
 = run + pair + run.tmt + pair.tmt
In line with the aims of the experiment the treatment structure comprises variety and treatment main effects and treatment by variety interactions. In the traditional approach the terms in the block structure are regarded as random and the treatment terms as fixed. The choice of treatment terms as fixed or random depends largely on the aims of the experiment. The aim of this example is to select the "best" varieties. The definition of best is somewhat more complex since it does not involve the single trait sqrt(rootwt) but rather two traits, namely sqrt(rootwt) in the presence/absence of bloodworms. Thus to minimise selection bias the variety main effects and thence the tmt.variety interactions are taken as random. The main effect of treatment is fitted as fixed to allow for the likely scenario that rather than a single population of treatment by variety effects there are in fact two populations (control and treated) with a different mean for each. There is evidence of this prior to analysis with the large difference in mean sqrt(rootwt) for the two groups (14.93 and 8.23 for control and treated respectively). The inclusion of tmt as a fixed effect ensures that BLUPs of tmt.variety effects are shrunk to the correct mean (treatment means rather than an overall mean).

The model for the data is given by
y = X τ + Z1u1+ Z2u2+ Z3u3 + Z4u4+ Z5u5+ e Eqn (1)
where y is a vector of length n=264 containing the sqrt(rootwt) values, τ corresponds to a constant term and the fixed treatment contrast and u1... u5 correspond to random variety, by variety, run, treatment by run and variety by run effects. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures Var(ui)2iIbi (where bi is the length of ui, i=1 ... 5) and Var(e) = σ2In.

The ASReml code for this analysis is
 Bloodworm data Dr M Stevens
  pair 132
  rootwt
  run 66
  tmt 2 !A
  id
  variety 44 !A
 rice.asd !skip 1 !DOPATH 1
 !PATH 1
 sqrt(rootwt) ~ mu tmt !r variety variety.tmt run pair run.tmt
 0 0 0
 !PATH 2
 sqrt(rootwt) ~ mu tmt !r variety tmt.variety run pair tmt.run,
 uni(tmt,2)
 0 0 2
 tmt.variety 2
 2 0 DIAG .1 .1 !GU
 44 0 0
 tmt.run 2
 2 0 DIAG .1 .1 !GU
 66 0 0
The two paths in the input file define the two univariate analyses we will conduct. We consider the results from the analysis defined in PATH 1 first. A portion of the output file is
    5 LogL=-345.306     S2=  1.3216        262 df
    6 LogL=-345.267     S2=  1.3155        262 df
    7 LogL=-345.264     S2=  1.3149        262 df
    8 LogL=-345.263     S2=  1.3149        262 df

  Source                Model  terms     Gamma     Component    Comp/SE   % C
  variety                  44     44   1.80947       2.37920       3.01   0 P
  run                      66     66  0.244243      0.321144       0.59   0 P
  variety.tmt              88     88  0.374220      0.492047       1.78   0 P
  pair                    132    132  0.742328      0.976057       2.51   0 P
  run.tmt                 132    132   1.32973       1.74841       3.65   0 P
  Variance                264    262   1.00000       1.31486       4.42   0 P

                                    Wald F statistics
      Source of Variation           NumDF     DenDF    F-inc             Prob
    7 mu                                1      53.5  1484.27            <.001
    4 tmt                               1      60.4   469.36            <.001
The estimated variance components from this analysis are given in column (a) of table 1. The variance component for the variety main effects is large. There is evidence of tmt.variety interactions so we may expect some discrimination between varieties in terms of tolerance to bloodworms.

Table 1 Estimated variance components from univariate analyses of bloodworm data. (a) Model with homogeneous variance for all terms and (b) Model with heterogeneous variance for interactions involving tmt
(a) --- (b) ---
source control treated
variety 2.378 2.334
tmt.variety 0.492 1.505 -0.372
run 0.321 0.319
tmt.run 1.748 1.388 2.223
variety.run (pair) 0.976 0.987
tmt.pair 1.315 1.156 1.359
REML LogL -345.256 -343.22

Given the large difference (p<0.001) between tmt means we may wish to allow for heterogeneity of variance associated with tmt. Thus we fit a separate variety variance for each level of tmt so that instead of assuming Var(u2) = σ22I88 we assume
Var(u2) = | σ22c 0 | cross I44
| 0 σ22t |
where σ22c and σ22t are the tmt.variety interaction variances for control and treated respectively. This model can be achieved using a diagonal variance structure for the treatment part of the interaction. We also fit a separate run variance for each level of tmt and heterogeneity at the residual level, by including the uni(tmt,2) term. We have chosen level 2 of tmt as we expect more variation for the exposed treatment and thus the extra variance component for this term should be positive. Had we mistakenly specified level 1 then ASReml would have estimated a negative component by setting the !GU option for this term. The portion of the ASReml output for this analysis is
    6 LogL=-343.428     S2=  1.1498        262 df    :   1 components constrained
    7 LogL=-343.234     S2=  1.1531        262 df
    8 LogL=-343.228     S2=  1.1572        262 df
    9 LogL=-343.228     S2=  1.1563        262 df

  Source                Model  terms     Gamma     Component    Comp/SE   % C
  variety                  44     44   2.01903       2.33451       3.01   0 P
  run                      66     66  0.276045      0.319178       0.59   0 P
  pair                    132    132  0.853941      0.987372       2.59   0 P
  uni(tmt,2)              264    264  0.176158      0.203684       0.32   0 P
  Variance                264    262   1.00000       1.15625       2.77   0 P
  tmt.variety         DIAGonal     1   1.30142       1.50477       2.26   0 U
  tmt.variety         DIAGonal     2 -0.321901     -0.372199      -0.82   0 U
  tmt.run             DIAGonal     1   1.20098       1.38864       2.18   0 U
  tmt.run             DIAGonal     2   1.92457       2.22530       3.07   0 U

                                    Wald F statistics
      Source of Variation           NumDF     DenDF    F-inc             Prob
    7 mu                                1      56.5  1276.73            <.001
    4 tmt                               1      60.6   448.83            <.001
The estimated variance components from this analysis are given in column (b) of table 1. There is no significant variance heterogeneity at the residual or tmt.run level. This indicates that the square root transformation of the data has successfully stabilised the error variance. There is, however, significant variance heterogeneity for tmt.variety interactions with the variance being much greater for the control group. This reflects the fact that in the absence of bloodworms the potential maximum root area is greater. Note that the tmt.variety interaction variance for the treated group is negative. The negative component is meaningful (and in fact necessary and obtained by use of the !GU option) in this context since it should be considered as part of the variance structure for the combined variety main effects and treatment by variety interactions. That is,

Var(12 cross u1+ u2) = | σ21+ σ22c σ21 | cross I44 Eqn (2)
| σ21 σ21+ σ22t |

Using the estimates from table 1 this structure is estimated as
| 3.84 2.33| cross I44
| 2.33 1.96 |

Thus the variance of the variety effects in the control group (also known as the genetic variance for this group) is 3.84. The genetic variance for the treated group is much lower (1.96). The genetic correlation is 2.33/√(3.84*1.96) = 0.85 which is strong, supporting earlier indications of the dependence between the treated and control root area (Figure 1).

A multivariate approach

In this simple case in which the variance heterogeneity is associated with the two level factor tmt, the analysis is equivalent to a bivariate analysis in which the two traits correspond to the two levels of tmt, namely sqrt(rootwt) for control and treated. The model for each trait is given by

yj= Xτj+ Zvuvj + Zrurj + ej; (j = c,t) Eqn (3)

where yj is a vector of length n=132 containing the sqrtroot values for variate j (j=c for control and j=t for treated), τj corresponds to a constant term and uvj and urj correspond to random variety and run effects. The design matrices are the same for both traits. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures Var(uvj)=σvj2I44, Var(urj)=σrj2I66 and Var(ej) = σ2jI132. The bivariate model can be written as a direct extension of (3), namely

y = (I2 cross X )τ + (I2 cross Zv)uv+ (I2 cross Zr)ur+ e* Eqn (4)

where y = (yc',yt')', uv= (uvc',uvt' )', ur= (urc',urt' )' and e* = (ec,et )'.

There is an equivalence between the effects in this bivariate model and the univariate model of (1). The variety effects for each trait (uv in the bivariate model) are partitioned in () into variety main effects and tmt.variety interactions so that uv= 12 cross u1+ u2. There is a similar partitioning for the run effects and the errors (see table 2).

Table 2. Equivalence of random effects in bivariate and univariate analyses
bivariate univariate
effects (model 4) (model 1)
trait.variety uv 12 cross u1 + u2
trait.run ur 12 cross u3 + u4
trait.pair ue* 12 cross u5 + e

In addition to the assumptions in the models for individual traits (3) the bivariate analysis involves the assumptions Cov(uvc,uvt') = σvctI44, Cov(urc,urt') = σrctI66 and Cov(ecet) = σctI132.

Thus random effects and errors are correlated between traits. So, for example, the variance matrix for the variety effects for each trait is given by

Var(uv =| σvc2 σvct | cross I44
| σvct σvt2 |

This unstructured form for trait.variety in the bivariate analysis is equivalent to the variety main effect plus heterogeneous tmt.variety interaction variance structure (2) in the univariate analysis. Similarly the unstructured form for trait.run is equivalent to the run main effect plus heterogeneous tmt.run interaction variance structure. The unstructured form for the errors ( trait.pair) in the bivariate analysis is equivalent to the pair plus heterogeneous error ( tmt.pair) variance in the univariate analysis. This bivariate analysis is achieved in ASReml as follows, noting that the tmt factor here is equivalent to traits.
 this is for the paired data
  id
  pair 132
  run 66
  variety 44 !A
  yc ye
 ricem.asd !skip 1 !X syc !Y sye
 sqrt(yc) sqrt(ye) ~ Trait !r Tr.variety Tr.run
 1 2 2
 132 !S2==1
 Tr 0 US 2.21 1.1 2.427
 Tr.variety 2
 2 0 US 1.401 1 1.477
 44 0 0
 Tr.run 2
 2 0 US .79 .5 2.887
 66 0 0
 predict variety
A portion of the output from this analysis is
    7 LogL=-343.220     S2=  1.0000        262 df
    8 LogL=-343.220     S2=  1.0000        262 df

  Source                Model  terms     Gamma     Component    Comp/SE   % C
  Residual            UnStruct     1   2.14373       2.14373       4.44   0 U
  Residual            UnStruct     1  0.987401      0.987401       2.59   0 U
  Residual            UnStruct     2   2.34751       2.34751       4.62   0 U
  Tr.variety          UnStruct     1   3.83959       3.83959       3.47   0 U
  Tr.variety          UnStruct     1   2.33394       2.33394       3.01   0 U
  Tr.variety          UnStruct     2   1.96173       1.96173       2.69   0 U
  Tr.run              UnStruct     1   1.70788       1.70788       2.62   0 U
  Tr.run              UnStruct     1  0.319145      0.319145       0.59   0 U
  Tr.run              UnStruct     2   2.54326       2.54326       3.20   0 U
  Covariance/Variance/Correlation Matrix UnStructured
   2.144     0.4402
  0.9874      2.348
  Covariance/Variance/Correlation Matrix UnStructured
   3.840     0.8504
   2.334      1.962
  Covariance/Variance/Correlation Matrix UnStructured
   1.708     0.1531
  0.3191      2.543
Table 3. Estimated variance parameters from bivariate analysis of bloodworm data
control treated
source variance variance covariance
us(trait).variety 3.84 1.96 2.33
us(trait).run 1.71 2.54 0.32
us(trait).pair 2.14 2.35 0.99
The resultant REML LogL is identical to that of the heterogeneous univariate analysis (column (b) of table 2). The estimated variance parameters are given in Table 3. The predicted variety means in the .pvs file are used in the following section on interpretation of results. A portion of the file is presented below. There is a wide range in SED reflecting the imbalance of the variety concurrence within runs.
           Assuming Power transformation was (Y+   0.000)^ 0.500
  run            is ignored in the prediction (except where specifically included

  Trait      variety       Powervlue  StandEror Ecode Retransformed approxS
  sqrt(yc)   AliCombo        14.9532         0.9181 E       223.5982     27.4571
  sqrt(ye)   AliCombo         7.9941         0.7993 E        63.9054     12.7790
  sqrt(yc)   Bluebelle       13.1033         0.9310 E       171.6969     24.3980
  sqrt(ye)   Bluebelle        6.6299         0.8062 E        43.9559     10.6901
  sqrt(yc)   C22             16.6679         0.9181 E       277.8192     30.6057
  sqrt(ye)   C22              8.9543         0.7993 E        80.1798     14.3140
   .              .               .             .   .            .           .
  sqrt(yc)   YRK1            15.1859         0.9549 E       230.6103     29.0012
  sqrt(ye)   YRK1             8.3356         0.8190 E        69.4817     13.6534
  sqrt(yc)   YRK3            13.3057         0.9549 E       177.0428     25.4106
  sqrt(ye)   YRK3             8.1133         0.8190 E        65.8264     13.2894
  SED: Overall Standard Error of Difference   1.215

Figure 2. BLUPs for treated for each variety plotted against BLUPs for control

Interpretation of results

Recall that the researcher is interested in varietal tolerance to bloodworms. This could be defined in various ways. One option is to consider the regression implicit in the variance structure for the trait by variety effects. The variance structure can arise from a regression of treated variety effects on control effects, namely uvt = β uvc + ε where the slope β = σvct / σvc2. Tolerance can be defined in terms of the deviations from regression, ε. Varieties with large positive deviations have greatest tolerance to bloodworms. Note that this is similar to the researcher's original intentions except that the regression has been conducted at the genotypic rather than the phenotypic level. In Figure 2 the BLUPs for treated have been plotted against the BLUPs for control for each variety and the fitted regression line (slope = 0.61) has been drawn. Varieties with large positive deviations from the regression line include YRK3, Calrose, HR19 and WC1403.


Figure 3. Estimated deviations from regression of treated on control for each variety plotted against estimate for control

An alternative definition of tolerance is the simple difference between treated and control BLUPs for each variety, namely δ = uvc - uvt. Unless β=1 the two measures ε and δ have very different interpretations. The key difference is that ε is a measure which is independent of inherent vigour whereas δ is not. To see this consider Cov(ε){uvc'} = Cov(uvtuvc){uvc'} = ( σvct - [σvctvc2] σvc2 ) I44 = 0 whereas Cov(δ){uvc'} = Cov(uvc-uvt){uvc'} = (σvc2 - σvct ) I44


Figure 4. Estimated difference between control and treated for each variety plotted against estimate for control

The independence of ε and uvc and dependence between δ and uvc is clearly illustrated in Figures 3 and 4 In this example the two measures have provided very different rankings of the varieties. The choice of tolerance measure depends on the aim of the experiment. In this experiment the aim was to identify tolerance which is independent of inherent vigour so the deviations from regression measure is preferred.
  • Back

    Return to start