
AnGenMap Archived Post

From listmaster@animalgenome.org Thu Feb 22 09:13:26 2018
Return-Path: <listmaster@animalgenome.org>
From: Romdhane Rekaya <rrekaya@uga.edu>
Subject: RE: Deep learning in animal breeding?
Postmaster: submission approved by list moderator
To: Members of AnGenMap <angenmap@animalgenome.org>
Date: Thu, 22 Feb 2018 09:13:26 -0600
Hi all:

The power of deep learning (DL) in solving very complex problems is
unquestionable, and some niche applications in our field are likely.
However, its potential benefit could be limited in genomic selection for
several reasons: 1) the majority of genetic variation is additive for many
traits, so non-linear relationships between genotypes and phenotypes are
less prevalent than in other applications; 2) the convolution step (a kind
of feature selection step) is very useful in the presence of complex hidden
structures, as indicated by Patrik, which is not likely to be the case with
genotype data; 3) although large, genotype data lack the heterogeneity
(variety of data types) that characterizes big data.
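To make point 1) concrete, here is a minimal, purely illustrative numpy sketch (all sizes, allele frequencies, and effect distributions are assumptions, not taken from any real dataset): when the genotype-phenotype map is additive, a simple ridge (RR-BLUP-style) shrinkage estimator already captures the signal, leaving little non-linearity for a deep network to exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2000                                      # individuals, SNPs (toy sizes)
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # 0/1/2 genotype codes
beta = rng.normal(0, 0.05, p)                         # small additive SNP effects
y = X @ beta + rng.normal(0, 1.0, n)                  # purely additive phenotype + noise

# Ridge (RR-BLUP-style) estimator: all SNP effects shrunk equally toward zero
lam = 100.0
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Correlation between estimated and true genetic values (training set)
acc = np.corrcoef(X @ beta_hat, X @ beta)[0, 1]
print(acc)
```

Under an additive architecture like this one, the linear shrinkage estimator leaves essentially no systematic residual structure for a non-linear learner to pick up.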

From an implementation point of view, and on top of the well-known
overfitting issues, conventional or DL neural network analyses rely largely
on good heuristics regarding the network architecture and the associated
hyperparameters, which are data dependent. With the existing high-density or
sequence data, testing these parameters empirically could be very
challenging computationally, even with the use of GPU servers and parallel
computing.
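The combinatorial cost of that empirical tuning is easy to quantify. The sketch below enumerates a hypothetical hyperparameter grid (every value in it is an arbitrary assumption, not a recommendation) to show how quickly the number of full training runs multiplies:

```python
from itertools import product

# Illustrative hyperparameter grid; the values are arbitrary assumptions
grid = {
    "n_layers":      [2, 4, 8],
    "units":         [64, 256, 1024],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "dropout":       [0.0, 0.2, 0.5],
    "batch_size":    [32, 128],
}

# Cartesian product of all settings = number of networks to train
configs = list(product(*grid.values()))
print(len(configs))       # 3*3*3*3*2 = 162 networks per cross-validation fold
print(len(configs) * 5)   # 810 full training runs with 5-fold CV
```

And this modest grid ignores architecture search entirely; with millions of sequence variants as inputs, each of those runs is itself expensive.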

From a breeding perspective, genetic evaluations are carried out jointly
for multiple traits, where the vast majority of individuals are not
genotyped. I am not aware of any NN configuration that can deal with these
two issues. If that is true, the potential benefit of DL compared to
classical GS could be reduced or even eliminated by resorting to
single-trait analyses.

The paper mentioned by Bruce seems to be a good first study of the use of
DL in GS. However, the comparisons carried out in the study are a bit
unfortunate. For the comparison with FNN, Patrik already discussed the
limitations in his email. For the comparison with classical GS
implementations, RR-BLUP is not the reference method. It is very clear from
the paper that the superiority of the DL approach is due to feature
selection (the superiority of DL was reduced when the number of SNPs was
decreased, as indicated in Figure 5, and magnified when alpha was reduced,
as presented in Figure 3). Thus, fair comparisons should have included
methods that prioritize SNPs based on statistical criteria (BayesB, BayesC
and BayesCπ), on prior information (BayesR and BayesRC), or on population
genetics parameters (for example Fst scores; Chang et al. (2018)). A
comparison against a multivariate implementation using ssGBLUP could be
highly informative.
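As a toy illustration of what "prioritizing SNPs based on statistical criteria" means (this is a crude single-marker screen, not BayesB/BayesC themselves; all sizes and effect distributions are invented for the example), one can rank SNPs by their marginal squared correlation with the phenotype and keep only the top candidates before fitting the prediction model:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, p_causal = 400, 5000, 50
X = rng.binomial(2, 0.4, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:p_causal] = rng.normal(0, 0.5, p_causal)    # only a few SNPs carry signal
y = X @ beta + rng.normal(0, 1.0, n)

# Crude single-marker screen: squared correlation of each SNP with y
Xc = X - X.mean(0)
yc = y - y.mean()
r2 = (Xc.T @ yc) ** 2 / (np.einsum('ij,ij->j', Xc, Xc) * (yc @ yc))

keep = np.argsort(r2)[-200:]                      # retain the 200 top-ranked SNPs
recovered = np.isin(keep, np.arange(p_causal)).sum()
print(recovered)                                  # causal SNPs among the 200 kept
```

The Bayesian variable-selection methods above do this jointly and probabilistically rather than one marker at a time, which is exactly why they are the fair benchmark when a DL method's edge comes from feature selection.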

Finally, on the use of ensemble predictors or aggregation of experts, Dan
has discussed their potential limitations in one of his emails.


Romdhane Rekaya
Professor
Dept. of Animal and Dairy Science
Dept. of Statistics
Institute of Bioinformatics
106 Animal and Dairy Science Complex
University of Georgia, Athens, GA 30602, USA
Phone: +1-706-542-0949
Fax: +1-706-583-0274

On Tue, Feb 20, 2018 at 12:16 PM, DANIEL GIANOLA <gianola@ansci.wisc.edu>
wrote:

> We recently completed another successful World Congress of Genetics
> Applied to Livestock Production, this time (February 2018) in beautiful
> Auckland, New Zealand.
>
>     The conference started with a plenary lecture on deep learning in
> animal breeding by Jeremy Howard (Enlitic Corporation), a successful
> start-up entrepreneur and scientist who, according to the program
> description, is a regular guest at Australian highest-rated breakfast news
> programs.
>
>     Howard gave a spectacular account of the successes of deep learning in
> several domains. Of course, he could not give examples in animal breeding
> because there are none, at least published in a peer-reviewed journal (as
> far as I know). When reacting to a question I posed, the speaker reacted by
> stating that "animal breeding methods are obsolete". He may well be right,
> but where is the evidence for the statement?
>
>     It is highly unlikely that you, colleagues, are not familiar with deep
> learning, as exciting news appear every day in the popular press (as well
> as in breakfast news). Successes range from how to use deep learning to
> buy a $25 iron, to its use by Winston, the central character (I mean
> a computer, not Churchill) in Dan Brown's recent book "Origin".
>
>     Now, our obsolete breeding (animal and plant) methods are best
> represented by BLUP and, even when confronted with powerful Bayesian
> machines, neural networks, support vector ensembles, random forests, and
> kernel methods, BLUP does well, most often, and is simple to implement.
> Actually, the evidence seems to suggest that kernel methods (BLUP is a
> special case) are possibly the most stable and flexible prediction
> machines. Most of the evidence supporting the assertion comes from
> extensive studies by Crossa and co-workers from the CIMMYT (International
> Center for Maize and Wheat Improvement) in multiple trials all over the
> world.
>
> In the same WCGALP2018, Francesco Tiezzi (North Carolina State University)
> gave empirical evidence that reproducing kernel Hilbert spaces regression
> accommodates genotype x environment interactions in a formidably effective
> and elegant manner. By the way, we have been saying so for many years and
> Jarquin et al. (2014, TAG) gave the first proof-of-concept in plants.
>
>     Deep learning methods are, essentially, extensions of neural networks
> with many layers (notably convolutional networks), but with a special
> configuration, such that neurons handle information in three dimensions
> (please Google for better descriptions). Jeremy Howard pointed out that
> deep learners are "universal approximators", a well-known property of
> neural networks that can be explained using a famous theorem by Kolmogorov
> on multidimensional function approximation. By the way, BLUP (being a
> linear neural network with an identity activation function) is a linear
> universal approximator as well.
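The remark that BLUP is a linear neural network can be checked numerically: a one-layer network with identity activation, trained by gradient descent on squared loss with L2 weight decay, converges to the closed-form ridge (RR-BLUP-style) solution. A small sketch, with all dimensions and the penalty chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

lam = 5.0
# Closed-form ridge solution: (X'X + lam I)^{-1} X'y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The same model viewed as a "neural network": one linear layer, identity
# activation, trained by gradient descent on squared loss + L2 weight decay
w = np.zeros(p)
lr = 1e-3
for _ in range(20000):
    grad = X.T @ (X @ w - y) + lam * w
    w -= lr * grad

print(np.max(np.abs(w - w_ridge)))  # the two estimates agree to high precision
```

The loss is strongly convex, so the gradient-descent "network" has a unique optimum, and that optimum is exactly the ridge estimator; adding hidden layers with non-identity activations is what breaks this equivalence.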
>
>     Even with vanilla regularized neural networks, a severe problem is
> that of overfitting, one that WILL occur when a network
> is fed, say, millions of sequence variants as predictors while sample size n
> is (at best) in the thousands. I suspect that deep learners will probably
> collapse as well, but I cannot say it with certainty because the number of
> studies in animal breeding is... exactly zero (0).
>
>     Since scientists are trained to be skeptical, I tried to convey this
> warning by pointing out that animal (and plant) breeders are not refractory
> to new ideas, and that we have some history with machine learning (of which
> I think Howard was unaware). As I was trying to cite the only two studies
> (in plant breeding) that have evaluated deep learners so far, the
> Chairman interrupted me (without justification, I believe, as a scientific
> congress is nurtured by strong and honest discussion).
>
>     In case that you are interested, the first study is an MS Thesis by
> McDowell ("Genomic selection with deep neural networks", Department of
> Agronomy, Iowa State University, 2016). There was no evidence of the
> superiority of the deep belief networks relative to other methods.
>
>     The second study, published in the Journal of Computational Physics
> (Rachmatia et al. 2017) is from an Indonesian group using CIMMYT maize
> data. They compared the deep learners with BLUP, the Bayesian Lasso and
> RKHS for 8 maize traits. In at least 5 of the 8 comparisons, the deep
> belief networks were a total disaster.
>
>     In a most recent study, from a Conference in Bioinformatics and
> Biomedicine (IEEE, 2017), Liu and Wang claimed success in two studies with
> soybeans (#markersC13, nQ39) and Pinus (#markersH53, n†1) and
> reported that the deep learners extracted "meaningful" (to whom?)
> information on SNP effects. Clearly, this study cannot
> be taken as a canonical representation of the problems confronted by animal
> breeders where, in dairy breeding, for example, the number of variants
> today is at least 1 million.
>
>     If you have read this message this far, this is what I tried to say
> but could not at WCGALP2018. How is it possible to state that methods are
> "obsolete" when there is no evidence to that effect? Further, the only
> evidence comes from three plant breeding studies: it is not
> encouraging and tentatively supports a view that is contrarian
> to the tone of the talk by Jeremy Howard.
>
>     I hope that quantitative geneticists will study the problem and
> establish whether or not deep learning will take us to a more exalted state
> of knowledge. In the meantime, support seems in favor of the null
> hypothesis, at least in our modest agricultural fields of endeavor.
>
>  If one of you succeeds beyond a reasonable doubt, you may end up on
> "Morning Joe"! It would be exciting and, perhaps, may make you a
> millionaire.
>
>     Regards.
>
> *Daniel Gianola*
> *Sewall Wright Professor Emeritus of Animal Breeding and Genetics*
> *Department of Animal Sciences*
> *Department of Biostatistics and Medical Informatics*
> *Department of Dairy Science*
> *University of Wisconsin-Madison*
> 440 Animal Sciences Building
> 1675 Observatory Dr.
> Madison, WI 53706
> USA
> Tel: +1-608-265-2054
> Fax: +1-608-262-5157
> http://www.ansci.wisc.edu/...typages/gianola.html
> http://www.biostat.wisc.edu/...ntent/gianola-daniel
> http://qbi.wisc.edu/gianola-daniel.htm
>
> *"Nature is written in mathematical language (Galileo). If nothing
> else works, try Bayes theorem"*
>
> *"Bayesian methods with subjective priors may butcher the processes of
> science" (adapted from Oscar Kempthorne)*

