Variability in unbalanced data - Volts

In this example we illustrate an analysis of unbalanced data in which the main aim is to determine the sources of variation rather than assess the significance of imposed treatments. The data are taken from Cox and Snell (1981) and involve an experiment to examine the variability in the production of car voltage regulators. Standard production of regulators involves two steps. Regulators are taken from the production line to a setting station and adjusted to operate within a specified voltage range. From the setting station the regulator is then passed to a testing station where it is tested and returned if outside the required range.

The voltage of 64 regulators was set at 10 setting stations (setstat); between 4 and 8 regulators were set at each station. Each regulator was then tested at four testing stations (teststat). The ASReml input file is presented below.
 Voltage data
  teststat 4   # 4 testing stations tested each regulator
  setstat  !A  # 10 setting stations each set 4-8 regulators
  regulatr 8   # regulators numbered within setting stations
  voltage
 voltage.asd !skip 1
 voltage ~ mu !r setstat setstat.regulatr teststat setstat.teststat
 0 0 0
The factor regulatr numbers the regulators within each setting station. Thus the term setstat.regulatr allows for differential effects of each regulator, while the other terms examine the effects of the setting and testing stations and their possible interaction. The abbreviated output is given below.
  LogL= 188.604     S2= 0.67074E-01    255 df
  LogL= 199.530     S2= 0.59303E-01    255 df
  LogL= 203.007     S2= 0.52814E-01    255 df
  LogL= 203.240     S2= 0.51278E-01    255 df
  LogL= 203.242     S2= 0.51141E-01    255 df
  LogL= 203.242     S2= 0.51140E-01    255 df

  Source           Model  terms     Gamma     Component    Comp/SE   % C
  setstat             10     10  0.233418      0.119371E-01   1.35   0 P
  setstat.regulatr    80     64  0.601817      0.307771E-01   3.64   0 P
  teststat             4      4  0.642752E-01  0.328706E-02   0.98   0 P
  setstat.teststat    40     40  0.100000E-08  0.511404E-10   0.00   0 B
  Variance           256    255   1.00000      0.511404E-01   9.72   0 P
  Warning: Code B - fixed at a boundary (!GP)       F - fixed by user
                ? - liable to change from P to B    P - positive definite
                C - Constrained by user (!VCC)      U - unbounded
                S - Singular Information matrix
The convergence criterion has been satisfied after six iterations. A warning message is printed below the summary of the variance components because the variance component for the setstat.teststat term has been fixed near the boundary. The default constraint for variance components (!GP) is to ensure that the REML estimate remains positive. Under this constraint, if an update for any variance component results in a negative value then ASReml sets that variance component to a small positive value. If this occurs in subsequent iterations, the parameter is fixed at a small positive value and the code B replaces P in the C column of the summary table. The default constraint can be overridden using the !GU qualifier, but this is not generally recommended for standard analyses.
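As a quick check on the summary table above, the Component column is the Gamma column multiplied by the residual variance (the Variance row), and the components can be expressed as shares of the total variance. The following Python sketch uses only values printed in the output; the variable names are illustrative, and the percentage breakdown is an added summary, not part of the ASReml output.

```python
# Values copied from the ASReml summary table above.
sigma2 = 0.511404e-01  # residual variance (Variance row, Component column)
gammas = {
    "setstat": 0.233418,
    "setstat.regulatr": 0.601817,
    "teststat": 0.642752e-01,
    "setstat.teststat": 0.100000e-08,  # fixed at the boundary (code B)
}

# Component = Gamma * residual variance.
components = {term: g * sigma2 for term, g in gammas.items()}
components["Variance"] = sigma2

# Share of the total variance attributed to each source (illustrative summary).
total = sum(components.values())
shares = {term: 100.0 * c / total for term, c in components.items()}

for term, c in components.items():
    print(f"{term:18s} {c:12.6e} {shares[term]:6.1f}%")
```

The products agree with the printed Component column to the precision shown, and the setstat.teststat share is effectively zero.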

Figure 1 presents the residual plot, which indicates two unusual data values. These values are successive observations, namely observations 210 and 211, being testing stations 2 and 3 for setting station 9 (J), regulator 2. These observations will not be dropped from the following analyses, for consistency with other analyses conducted by Cox and Snell (1981) and in the GenStat manual.


Figure 1. Residual plot for the voltage data

The REML log-likelihood from the model without the setstat.teststat term was 203.242, the same as the REML log-likelihood for the previous model. Table 1 summarises the REML log-likelihood obtained when each of the remaining terms is dropped from the model. The summary of the ASReml output for the current model is given below. The column labelled Comp/SE is printed by ASReml to give a guide as to the significance of the variance component for each term in the model. The statistic is simply the REML estimate of the variance component divided by the square root of the diagonal element (for each component) of the inverse of the average information matrix. The diagonal elements of the inverse of the expected (not the average) information matrix are the asymptotic variances of the REML estimates of the variance parameters. These Comp/SE statistics cannot be used to test the null hypothesis that the variance component is zero. If we had used this crude measure then the conclusions would have been inconsistent with the conclusions obtained from the REML log-likelihood (see Table 1).
  Source          Model  terms     Gamma     Component    Comp/SE   % C
  setstat            10     10  0.233417      0.119370E-01   1.35   0 P
  setstat.regulatr   80     64  0.601817      0.307771E-01   3.64   0 P
  teststat            4      4  0.642752E-01  0.328705E-02   0.98   0 P
  Variance          256    255   1.00000      0.511402E-01   9.72   0 P
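To see why the Comp/SE column is only a rough guide, the sketch below recovers the approximate standard error implied by each ratio and the one-sided normal tail probability that a naive reading of the ratio as a z-statistic would give. Treating Comp/SE as a z-statistic is the crude approach the text warns against; it is shown here only for contrast with the likelihood-based P-values in Table 1. The dictionary and function names are illustrative.

```python
import math

# (Component, Comp/SE) pairs copied from the summary table above.
rows = {
    "setstat": (0.119370e-01, 1.35),
    "setstat.regulatr": (0.307771e-01, 3.64),
    "teststat": (0.328705e-02, 0.98),
}

def upper_tail_normal(z):
    """One-sided tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

naive = {}
for term, (component, ratio) in rows.items():
    se = component / ratio  # approximate standard error implied by Comp/SE
    naive[term] = upper_tail_normal(ratio)  # naive z-test, for contrast only
    print(f"{term:18s} SE ~ {se:.5f}  naive one-sided p ~ {naive[term]:.4f}")
```

Under this naive reading, setstat and teststat would both appear non-significant at the 5% level, whereas Table 1 reports P-values of .0077 and .0039 for them.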
Table 1. REML LogL for the variance components in the voltage data
  terms                 REML log-likelihood   -twice difference   P-value
  - setstat                   200.31                5.864           .0077
  - setstat.regulatr          184.15               38.19            .0000
  - teststat                  199.71                7.064           .0039
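The entries in Table 1 can be reproduced approximately from the printed log-likelihoods. The sketch below computes twice the drop in REML log-likelihood for each omitted term, together with a P-value. It assumes the reference distribution is an equal mixture of chi-square distributions on 0 and 1 degrees of freedom (that is, half the 1 df tail probability). This is a common choice when testing a variance component on the boundary and it matches the tabulated P-values, although the text does not state which reference distribution was used. Small discrepancies in the last digit (for example 38.18 against 38.19) reflect rounding of the printed log-likelihoods.

```python
import math

full_logl = 203.242  # REML log-likelihood of the model without setstat.teststat
reduced_logl = {     # REML log-likelihood with the named term also dropped
    "setstat": 200.31,
    "setstat.regulatr": 184.15,
    "teststat": 199.71,
}

def chi2_1_upper_tail(x):
    """P(X > x) for X distributed as chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

results = {}
for term, logl in reduced_logl.items():
    stat = 2.0 * (full_logl - logl)          # twice the log-likelihood difference
    p_value = 0.5 * chi2_1_upper_tail(stat)  # assumed 50:50 chi2_0 / chi2_1 mixture
    results[term] = (stat, p_value)
    print(f"- {term:18s} {stat:7.3f}  {p_value:.4f}")
```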