GP_flow

Figure 1. A model for how genetic data can be used to generate Genomic Prediction models that are useful both to crop breeders and to basic scientists studying the genetic basis of trait variation.

The ability to predict complex traits from genotypes is a grand challenge in biology and is changing crop and livestock breeding. Genomic Prediction (GP, also known as Genomic Selection) using genome-wide DNA markers was originally proposed by Meuwissen et al. in 2001 as a solution to the limitations of Marker-Assisted Selection (MAS). GP is particularly well suited to predicting quantitative traits, such as yield or drought resistance, that are controlled by many small-effect alleles. Since 2001, the number of available GP algorithms has grown rapidly. These algorithms differ in how they account for non-linear interactions, population structure, differences in effect size among markers, and epistatic interactions between markers.
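
To make the idea of predicting a trait from many small-effect alleles concrete, here is a minimal sketch in Python using only the standard library. It uses ridge regression as a familiar stand-in for a linear GP model, fitted to simulated data. The genotype coding (-1/0/1), the simulated effect sizes, the penalty value `lam`, and all function names are assumptions made for illustration; none of this describes a specific published algorithm or this project's code.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_marker_effects(X, y, lam):
    """Estimate one effect per marker: beta = (X'X + lam*I)^-1 X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(X, beta):
    """Predicted trait value = sum of marker genotypes times marker effects."""
    return [sum(g * b for g, b in zip(row, beta)) for row in X]

def pearson(a, b):
    """Pearson correlation, used here as the measure of predictive accuracy."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

# Simulated data (assumption): 60 lines, 20 markers, every marker has a small effect.
rng = random.Random(1)
n_lines, n_markers = 60, 20
true_effects = [rng.gauss(0.0, 0.3) for _ in range(n_markers)]
X = [[rng.choice([-1.0, 0.0, 1.0]) for _ in range(n_markers)] for _ in range(n_lines)]
y = [g + rng.gauss(0.0, 0.5) for g in predict(X, true_effects)]

# Train on the first 40 lines, then predict the 20 held-out lines.
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

beta = ridge_marker_effects(X_train, y_train, lam=1.0)
accuracy = pearson(predict(X_test, beta), y_test)
print(f"predictive accuracy (r) on held-out lines: {accuracy:.2f}")
```

The penalty `lam` shrinks all marker effects toward zero, which is one simple way to fit more markers than a model could otherwise estimate reliably. Other GP algorithms make different assumptions about how effect sizes are distributed across markers.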

Most recently, deep learning approaches have been used to predict traits from genomic data. Briefly, deep learning refers to machine learning approaches that apply successive layers of transformations to the input features, producing increasingly abstract features in what are known as hidden layers; these are then used to make the final prediction. An ongoing effort is to benchmark such deep learning models against more traditional GP algorithms across a broad range of species and traits.
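
The layered transformations described above can be sketched as a forward pass through a small fully connected network. This is an illustrative Python example using only the standard library, with randomly initialized (untrained) weights. The layer sizes, the ReLU activation, the 0/1/2 genotype coding, and all function names are assumptions chosen for the sketch, not a description of any particular GP model.

```python
import math
import random

def relu(values):
    """Non-linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
            for j in range(len(biases))]

def init_layer(rng, n_in, n_out):
    """Random weights (scaled by fan-in) and zero biases; no training is done here."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

def forward(genotype, layers):
    """Return the activations of every layer; the last entry is the trait prediction."""
    activations = [genotype]
    for k, (weights, biases) in enumerate(layers):
        z = dense(activations[-1], weights, biases)
        is_output = k == len(layers) - 1
        # Hidden layers apply a non-linearity; the output layer stays linear
        # because the predicted trait is a continuous value.
        activations.append(z if is_output else relu(z))
    return activations

rng = random.Random(0)
# 10 markers -> 8 hidden features -> 4 hidden features -> 1 predicted trait value
layers = [init_layer(rng, 10, 8), init_layer(rng, 8, 4), init_layer(rng, 4, 1)]
genotype = [rng.choice([0.0, 1.0, 2.0]) for _ in range(10)]

acts = forward(genotype, layers)
prediction = acts[-1][0]
print("layer sizes:", [len(a) for a in acts])
print("untrained prediction:", prediction)
```

In a real model the weights would be learned from training data, typically by gradient descent on a prediction loss. The sketch only shows how each hidden layer re-represents the markers before the final prediction is made.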