The data file
Introduction
The first step in an ASReml analysis is to prepare the data file.
The standard format of an ASReml data file is to have the data
arranged in columns/fields with a single line for each sampling
unit. The columns contain variates and covariates (numeric), factors
(alphanumeric), traits (response variables) and weight variables in any
order that is convenient to
the user. The data file may be free format, fixed format or a
binary file.
Free format data files
The data are read free format (SPACE, COMMA or TAB separated) unless
the file name has extension .bin for real binary, or .dbl for double
precision binary (see below).
Important points to note are as follows:
blank lines are ignored,
column headings, field labels or comments may be present at the top of the file provided that the !skip qualifier is used to skip over them,
NA, * and . are treated as coding for missing values in free format data files;
if missing values are coded with a unique data value (for example, 0 or -9), use !M to flag them as missing or !D to drop the data record containing them.
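The free-format rules above can be sketched in a few lines of Python. This is an illustration of the rules as listed, not ASReml's own reader; the function name parse_free_format and its skip argument are hypothetical, with skip standing in for the effect of the !skip qualifier.

```python
import re

# Missing-value codes named in the text for free format data files.
MISSING_CODES = {"NA", "*", "."}

def parse_free_format(text, skip=0):
    """Split free-format text into records (lists of fields).

    Illustrative only: mimics the rules listed above.
    Missing values become None; all other fields stay as strings.
    """
    records = []
    for line in text.splitlines()[skip:]:      # skip heading lines at the top
        if not line.strip():                   # blank lines are ignored
            continue
        # SPACE, TAB and COMMA all act as separators in free format.
        fields = [f for f in re.split(r"[ \t,]+", line.strip()) if f]
        records.append([None if f in MISSING_CODES else f for f in fields])
    return records

sample = "Source SeedZn LeafZn\n1 61 24.1\n\n2 NA 16.0\n3\t*\t.\n"
print(parse_free_format(sample, skip=1))
```

The sketch leaves every non-missing field as a string; deciding which columns are numeric and which are factors is a separate step.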
Comma separated values
You may use Excel to prepare your data file as a comma delimited file,
comma delimited files whose file name ends in .csv or for which the !CSV qualifier is set recognise empty fields as missing values,
a line beginning with a comma implies a preceding missing value,
consecutive commas imply a missing value,
a line ending with a comma implies a trailing missing value,
if the filename does not end in .csv and the !CSV qualifier is not set, commas are treated as white space.
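A minimal Python sketch of the comma-delimited rules above, assuming a simple line with no quoted labels. The function name split_csv_line is hypothetical and the code is not ASReml's reader; it only shows how leading, consecutive and trailing commas each produce a missing value.

```python
def split_csv_line(line):
    """Split one comma-delimited line; empty fields become None (missing).

    Illustrative only: quoted labels containing commas are not handled.
    """
    fields = line.rstrip("\n").split(",")
    return [None if f.strip() == "" else f.strip() for f in fields]

print(split_csv_line(",61,24.1"))   # leading comma: first field missing
print(split_csv_line("1,,24.1"))    # consecutive commas: middle field missing
print(split_csv_line("1,61,"))      # trailing comma: last field missing
```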
General comments
characters following # on a line are ignored, so this character may not be used in alphanumeric fields,
blank spaces, tabs and commas must not be used (embedded) in alphanumeric fields unless the label is enclosed in quotes, for example, the name Willow Creek would need to appear in the data file as 'Willow Creek' to avoid error,
the $ symbol must not be used in the data file,
alphanumeric fields have a default size of 16 characters; use the !LL qualifier to extend the size of factor labels stored,
extra data fields on a line are ignored,
if there are fewer data items on a line than ASReml expects, the remainder are taken from the following line(s), except in .csv files where they are taken as missing. If you end up with half the number of records you expected, this is probably the reason,
all lines beginning with ! followed by a blank are copied to the .asr file as comments for the output; their contents are ignored,
a data file line may not exceed 2000 characters; if the data fields will not fit in 2000 characters, put some on the next line.
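The interaction between short lines and continuation can be sketched in Python as follows. The function read_records and its n_fields argument are hypothetical, and the code only imitates the rules listed above for non-.csv free-format files. It does not handle quoted labels, and it discards an incomplete final record, which is an assumption because the text does not say what ASReml does in that case.

```python
def read_records(lines, n_fields):
    """Group whitespace-separated items into records of n_fields items.

    Illustrative only: a short line borrows items from the following
    line(s); extra items on a line are ignored.
    """
    records, pending = [], []
    for line in lines:
        if line.startswith("! "):             # comment line for the output
            continue
        items = line.split("#", 1)[0].split() # text following # is ignored
        if not items:                         # blank lines are ignored
            continue
        pending.extend(items)
        if len(pending) >= n_fields:
            records.append(pending[:n_fields])  # extra fields are dropped
            pending = []
    return records                            # incomplete tail is discarded

lines = ["1 61 24.1 # note", "! a comment", "2 51", "3 64 19.0", "4 5 6 7"]
print(read_records(lines, 3))
```

In this input the line "2 51" is one item short, so it absorbs the first item of the next line and the rest of that line is dropped. Five lines yield three records, which is the kind of silent record loss the list above warns about.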
Fixed format files
The format must be supplied with the !FORMAT qualifier.
However, if all fields are present and are separated, the file can be
read free format.
Multiple data files
Sometimes data is split over several files.
When the separate files relate to, say, separate experiments in a
series of similar experiments and a combined analysis is required,
the data files can be combined through !INCLUDE statements.
Binary format data files
Conventions for binary files are as follows:
binary files are read as unformatted Fortran binary in single precision if the filename has a .bin or .BIN extension,
Fortran binary data files are read in double precision if the filename has a .dbl or .DBL extension,
ASReml recognises the value -1e37 as a missing value in binary files,
Fortran binary in the above means all real (.bin) or all double precision (.dbl) variables; mixed types, that is, integer and alphabetic binary representations of variables, are not allowed in binary files,
binary files can only be used in conjunction with a pedigree file if the pedigree fields are coded in the binary file so that they correspond with the pedigree file (this can be done using the !SAVE qualifier to form the binary file), or the identifiers are whole numbers less than 9,999,999 and the !RECODE qualifier is specified.
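The all-real layout and the -1e37 missing-value code can be illustrated with Python's struct module. This sketch only shows how one record of values might be packed and how the missing code survives a round trip through single precision. The little-endian byte order and the absence of any Fortran record markers are assumptions, since the text does not specify the record framing; the function names are hypothetical.

```python
import struct

MISSING = -1e37  # missing-value code for binary files, as stated in the text

def encode_record(values, double=False):
    """Pack one record as all single ('f') or all double ('d') reals.

    Assumption: little-endian, no record markers. None marks a missing value.
    """
    code = "d" if double else "f"
    filled = [MISSING if v is None else float(v) for v in values]
    return struct.pack(f"<{len(filled)}{code}", *filled)

def decode_record(raw, n_fields, double=False):
    """Unpack one record, mapping the missing code back to None."""
    code = "d" if double else "f"
    # Round the sentinel to the storage precision before comparing,
    # because -1e37 is not exactly representable in single precision.
    sentinel = struct.unpack(f"<{code}", struct.pack(f"<{code}", MISSING))[0]
    values = struct.unpack(f"<{n_fields}{code}", raw)
    return [None if v == sentinel else v for v in values]

raw = encode_record([1.0, None, 24.5])
print(len(raw), decode_record(raw, 3))
```

Factor labels cannot be stored this way, which is consistent with the restriction above that every field in a binary file is a real number.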
Example
This data file has three fields and a header line
identifying them. The heading line is not used by
ASReml and must be skipped when the file is read.
Source SeedZn LeafZn
1 61 24.1
1 63 23.8
2 51 16.0
2 64 19.0
6 69 22.6
6 75 27.9
6 93 24.6
5 85 31.3
5 86 35.4
5 80 20.9
7 47 13.9
7 49 14.0
7 57 17.3
8 50 10.8
8 48 12.3
8 46 13.9
11 69 26.8
11 79 31.7
12 64 22.5
12 68 24.2
13 48 13.4
13 66 15.1
13 53 14.1
14 39 11.7
14 40 11.5
14 45 12.3
17 63 24.8
17 64 25.0
17 70 21.4
18 63 28.2
18 61 23.0
19 36 11.0
19 29 10.2
19 29 10.9
21 57 18.6
21 68 21.2
21 61 18.2
24 84 25.2
24 64 25.1
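As a quick check outside ASReml, the first few lines of this file can be read in Python with the heading line skipped. This is a sketch using only the first seven data rows shown above; treating Source as a label and the other two fields as numbers is an assumption made here for illustration, and the function name read_example is hypothetical.

```python
from collections import Counter

# Heading line plus the first seven data rows of the example file.
EXAMPLE = """Source SeedZn LeafZn
1 61 24.1
1 63 23.8
2 51 16.0
2 64 19.0
6 69 22.6
6 75 27.9
6 93 24.6
"""

def read_example(text, skip=1):
    """Return (source_label, seed_zn, leaf_zn) tuples, skipping `skip` lines."""
    rows = []
    for line in text.splitlines()[skip:]:
        if not line.strip():                 # blank lines are ignored
            continue
        source, seed_zn, leaf_zn = line.split()
        rows.append((source, int(seed_zn), float(leaf_zn)))
    return rows

rows = read_example(EXAMPLE)
print(len(rows), Counter(source for source, _, _ in rows))
```

Counting records per source in this way is a simple guard against the heading line being read as data or a record being lost.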