Qualifiers affecting data input

Datafile line

The purpose of the datafile line is to nominate the data file and to specify qualifiers that modify the reading of the data, the output produced, and the operation of ASReml.

The datafile line appears in the ASReml command file in the form
datafile [ qualifiers ]

datafile is the path name of the file that contains the variates, factors, covariates, traits (response variates) and weight variables represented as data fields; enclose the path name in quotes if it contains embedded blanks,

the qualifiers tell ASReml to modify one or more of

the reading of the data,
the output produced (see below),
the operation of ASReml (see Common job qualifiers),

the data file related qualifiers must appear on the datafile line,

the job control qualifiers may appear on the datafile line or on following lines,

the arguments to qualifiers are represented by the following symbols:

f --- a filename,
n --- an integer number, typically a count,
p --- a vector of real numbers, typically in increasing order,
r --- a real number,
s --- a character string,
t --- a model term label,
v --- the number or label of a data variable,
vlist --- a list of variable labels.

Data input qualifiers

Frequently used data file qualifiers

!SKIP n causes the first n records of the (non-binary) data file to be ignored. Typically these lines contain column headings for the data fields.
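To make the !SKIP rule concrete, here is a minimal Python sketch of what ignoring the first n records amounts to. It is not ASReml's reader: the function name skip_records, the whitespace-separated layout and the sample records are hypothetical.

```python
def skip_records(lines, n):
    """Sketch of !SKIP n: ignore the first n records, then split fields on whitespace."""
    # Assumption: free-format data with whitespace-separated fields.
    return [line.split() for line in lines[n:]]


data_lines = [
    "plot variety yield",  # column headings, ignored with !SKIP 1
    "1 A 4.2",
    "2 B 3.9",
]
print(skip_records(data_lines, 1))  # [['1', 'A', '4.2'], ['2', 'B', '3.9']]
```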

Other data file qualifiers

!CSV makes consecutive commas imply a missing value. It is set automatically if the file name ends with .csv or .CSV.
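A minimal Python sketch of the !CSV rule, assuming a simple comma-separated line with no quoted fields. None stands in for ASReml's missing value, and split_csv_record and csv_mode_default are hypothetical names used only for illustration.

```python
def split_csv_record(line):
    """Sketch of !CSV: an empty field between consecutive commas is treated as missing."""
    # Assumption: no quoted fields containing commas; None represents a missing value.
    return [None if f.strip() == "" else f.strip() for f in line.split(",")]


def csv_mode_default(filename):
    """!CSV is set automatically when the file name ends with .csv or .CSV."""
    return filename.endswith((".csv", ".CSV"))


print(split_csv_record("1,A,,4.2"))   # ['1', 'A', None, '4.2']
print(csv_mode_default("trial.CSV"))  # True
print(csv_mode_default("trial.dat"))  # False
```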

!DATAFILE s specifies a data file name replacing the one obtained from the datafile line. It is required when different !PARTS of a job must read different files. The !SKIP qualifier, if specified, will be applied when reading the file.

!FILTER v [ !SELECT n ] enables a subset of the data to be analysed; v is the number or name of a data field. When reading data, the value in field v is checked after any transformations are performed. If !SELECT is omitted, records with zero in field v are omitted from the analysis. Otherwise, records with n in field v are retained and all other records are omitted.
Warning: if the filter column contains a missing value, the value from the previous non-missing record is assumed in that position.
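The selection logic of !FILTER and !SELECT, including the carry-forward behaviour described in the warning, can be sketched in Python as follows. This is a hypothetical illustration, not ASReml code: records are lists that have already been transformed, v is a 0-based Python index, None marks a missing value, and dropping a record whose filter value is missing with no earlier non-missing value is an assumption the text does not cover.

```python
def apply_filter(records, v, select=None):
    """Sketch of !FILTER v [ !SELECT n ] applied to already-transformed records."""
    kept = []
    previous = None  # last non-missing value seen in field v
    for rec in records:
        value = rec[v]
        if value is None:
            value = previous  # missing: assume the previous non-missing value
        else:
            previous = value
        if value is None:
            continue  # assumption: no earlier non-missing value, so drop the record
        if select is None:
            keep = value != 0       # no !SELECT: omit records with zero in field v
        else:
            keep = value == select  # !SELECT n: retain only records with n in field v
        if keep:
            kept.append(rec)
    return kept


records = [
    [1, 4.2],
    [0, 3.9],
    [None, 5.1],  # missing filter value: inherits 0 from the record above
    [2, 4.8],
]
print(apply_filter(records, 0))            # [[1, 4.2], [2, 4.8]]
print(apply_filter(records, 0, select=2))  # [[2, 4.8]]
print(apply_filter(records, 0, select=0))  # [[0, 3.9], [None, 5.1]]
```

The third record shows why the warning matters: although its filter value is missing, it is treated as zero because the preceding record had zero in that field.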

!FORMAT s supplies a Fortran-like FORMAT statement for reading fixed-format files.

!MERGE c f [ !SKIP n !MATCH a b ]
may be specified on a line following the datafile line. The purpose is to combine data fields from the (primary) data file with data fields from a secondary file (f).

!READ n formally instructs ASReml to read n data fields from the data file. It is needed when there are extra columns in the data file that must be read but are only required for combination into earlier fields in transformations, or when ASReml attempts to read more fields than it needs to.

!RECODE is required when reading a binary data file with pedigree identifiers that have not been recoded according to the pedigree file. It is not needed when the file was formed using the !SAVE qualifier but will be needed if formed in some other way.

!RREC [ n ] causes ASReml to read n data records or, if n is omitted, to read until a data reading error occurs, and then to process the records it has read. This allows data to be extracted from a file which contains trailing non-data records (for example, extracting the predicted values from a .pvs file). Without this qualifier, ASReml aborts the job when it encounters a data error. See !RSKIP.
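A minimal Python sketch of the !RREC behaviour, assuming that a "data reading error" means a line that does not parse as the expected number of numeric fields. The function read_with_rrec and its parameters are hypothetical, and ASReml's own notion of a reading error may be broader.

```python
def read_with_rrec(lines, nfields, rrec=False, n=None):
    """Sketch of !RREC [ n ]: stop after n records, or at the first reading error."""
    records = []
    for line in lines:
        if rrec and n is not None and len(records) == n:
            break  # !RREC n: n records have been read
        fields = line.split()
        try:
            # Assumption: a reading error is a wrong field count or a non-numeric field.
            if len(fields) != nfields:
                raise ValueError(f"expected {nfields} fields in {line!r}")
            record = [float(f) for f in fields]
        except ValueError:
            if not rrec:
                raise  # without !RREC, a reading error aborts the job
            break      # with !RREC, process the records read so far
        records.append(record)
    return records


lines = ["1 4.2", "2 3.9", "3 5.1", "Notes follow"]
print(read_with_rrec(lines, nfields=2, rrec=True))       # [[1.0, 4.2], [2.0, 3.9], [3.0, 5.1]]
print(read_with_rrec(lines, nfields=2, rrec=True, n=2))  # [[1.0, 4.2], [2.0, 3.9]]
```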

!RSKIP n [ s ] allows ASReml to skip lines at the head of a file down to (and including) the line containing the nth instance of string s. In a .pvs file the keyword Ecode occurs once at the beginning and then immediately before each block of predicted values, so it can be used to count the sections. For example, to read back the third set of predicted values in a .pvs file, you would specify
!RREC !RSKIP 4 ' Ecode'
since the line containing the 4th instance of Ecode immediately precedes the third set of predicted values. Used with the !RREC qualifier, ASReml then reads until the end of that table of predicted values.
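The !RSKIP and !RREC combination can be sketched in Python as two steps: drop lines down to and including the nth instance of s, then read records until the first line that fails to parse. The pvs_lines list is a schematic stand-in that only mimics the Ecode pattern described above; it is not the real layout of a .pvs file, and rskip and read_numeric are hypothetical names.

```python
def rskip(lines, n, s):
    """Sketch of !RSKIP n s: drop lines down to and including the nth instance of s."""
    count = 0
    for i, line in enumerate(lines):
        count += line.count(s)
        if count >= n:
            return lines[i + 1:]
    return []  # fewer than n instances: nothing left to read in this sketch


def read_numeric(lines):
    """Read numeric records until the first line that fails to parse (as with !RREC)."""
    records = []
    for line in lines:
        try:
            records.append([float(f) for f in line.split()])
        except ValueError:
            break
    return records


# Schematic file: Ecode once at the start, then before each block (hypothetical content).
pvs_lines = [
    " Ecode  header",  # 1st instance: once at the beginning
    " Ecode  set 1",   # 2nd instance: precedes the first block
    "1 10.0",
    " Ecode  set 2",   # 3rd instance: precedes the second block
    "2 20.0",
    " Ecode  set 3",   # 4th instance: precedes the third block
    "3 30.0",
    "4 31.0",
    "trailing summary text",
]
print(read_numeric(rskip(pvs_lines, 4, " Ecode")))  # [[3.0, 30.0], [4.0, 31.0]]
```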
