Recent advances in faba bean genetic and genomic tools for crop improvement

Abstract Faba bean (Vicia faba L.), a member of the Fabaceae family, is one of the important food legumes cultivated in cool temperate regions. It holds great importance for human consumption and livestock feed because of its high protein content, dietary fibre, and nutritional value. Major faba bean breeding challenges include its mixed breeding system, unknown wild progenitor, and genome size of ~13 Gb, which is the largest among diploid field crops. The key breeding objectives in faba bean include improved resistance to biotic and abiotic stress and enhanced seed quality traits. Regarding quality traits, major progress on reduction of vicine‐convicine and seed coat tannins, the main anti‐nutritional factors limiting faba bean seed usage, has recently been achieved through gene discovery. Genomic resources are less advanced than those of other grain legume species, but significant improvements are underway due to a recent increase in research activities. A number of bi‐parental populations have been constructed and mapped for targeted traits in the last decade. Faba bean now benefits from saturated synteny‐based genetic maps, along with next‐generation sequencing and high‐throughput genotyping technologies that are paving the way for marker‐assisted selection. Developing a reference genome, and ultimately a pan‐genome, will provide a foundational resource for molecular breeding. In this review, we cover the recent development and deployment of genomic tools for faba bean breeding.

cies with six pairs of remarkably large chromosomes. Its genome is one of the largest of any diploid field crop, about 13 Gbp in the haploid complement (Soltis et al., 2003), and contains more than 85% repetitive DNA (Novák et al., 2020). The faba bean genome is 2.9, 3.0 and 15.9 times larger than those of pea, lentil and chickpea, respectively. Assembly of the faba bean genome and map-based cloning were delayed both by its genome complexity (e.g. abundance of transposable elements) and by the lower investment in its study compared with, for example, soybean. In the absence of a reference genome assembly for this species, high-throughput approaches such as transcriptome analysis have been efficient tools for enrichment of genomic resources (e.g. Arun-Chinnappa & McCurdy, 2015; Braich et al., 2017; Khan et al., 2019; Ocaña et al., 2015; Ray et al., 2015). However, from these reported transcriptome datasets, only limited DNA sequence data are available in public databases (Mokhtar et al., 2020). Additionally, the development of high-density genetic maps derived from multiple populations and gene-based molecular markers, particularly those developed by Webb et al. (2016) and Carrillo-Perdomo et al. (2020), has paved the way for marker-assisted selection (MAS) and gene discovery. For example, the elucidation of the biosynthetic pathway for the pyrimidine glycosides vicine and convicine (v-c) (Björnsdotter et al., 2020), which have been the main factors limiting faba bean cultivation and usage in many warm regions, would not have been possible without the combination of transcriptome data (Ray et al., 2015) and gene-based comparative mapping approaches (Khazaei et al., 2015, 2017). Two recent review papers on this topic cover the coming of age of faba bean genetics and genomics in some detail (see Maalouf et al., 2019; O'Sullivan & Angra, 2016), but, since then, major progress on the key seed anti-nutrients v-c (Björnsdotter et al., 2020), seed coat tannins (e.g. Gutiérrez et al., 2020; Gutiérrez & Torres, 2019), as well as improved mapping approaches (Carrillo-Perdomo et al., 2020) and transcriptome data (see Section 3), has been made (e.g. Gao et al., 2020; Wu et al., 2020; Yang et al., 2020). We provide here a comprehensive review of the mapping populations and genomic resources in this species.
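The size ratios quoted above can be turned back into approximate genome sizes for the comparison species. The sketch below is only back-of-envelope arithmetic on figures stated in the text (13 Gbp, the three fold-differences and the 85% repeat fraction). The variable and function names are illustrative, and the derived sizes are implied by those ratios rather than taken from the cited sources.

```python
# Back-of-envelope arithmetic on figures quoted in the text.
FABA_GBP = 13.0  # approximate haploid genome size of faba bean (Gbp)
FOLD_LARGER = {"pea": 2.9, "lentil": 3.0, "chickpea": 15.9}
REPEAT_FRACTION = 0.85  # the text says "more than 85%", so this is a lower bound


def implied_sizes(faba_gbp, fold_larger):
    """Return the genome size (Gbp) that each ratio implies for the other species."""
    return {sp: round(faba_gbp / fold, 2) for sp, fold in fold_larger.items()}


sizes = implied_sizes(FABA_GBP, FOLD_LARGER)
repetitive_gbp = round(FABA_GBP * REPEAT_FRACTION, 2)
non_repetitive_gbp = round(FABA_GBP - repetitive_gbp, 2)

for species, gbp in sizes.items():
    print(f"{species}: ~{gbp} Gbp")
print(f"repetitive DNA: at least ~{repetitive_gbp} Gbp")
print(f"non-repetitive DNA: at most ~{non_repetitive_gbp} Gbp")
```

Read this way, the non-repetitive portion of the faba bean genome is of the same order as a complete genome in some smaller legumes, which fits the later observation that expressed gene content is similar across legume species.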

Genetic maps
Genetic linkage maps have been developed in faba bean using different types of populations and molecular markers (Table 1). Sirks (1931) was the first to report a faba bean genetic map, identifying 19 genetic factors that formed four linkage groups. His genetic resources were lost during World War II. Four decades later, Sjödin (1971) used translocation lines for the assignment of different loci (for morphological observations, flower and seed coat colour) to their respective chromosomes. Genetic mapping studies developed in the 1990s, at first with the aid of morphological markers, isozymes, seed protein genes and random amplified polymorphic DNA (RAPD) markers. Later, the development of expressed sequence tags (ESTs), microsatellites or simple sequence repeats (SSRs), EST-SSRs and single nucleotide polymorphism (SNP) markers helped to enrich faba bean genetic studies and breeding. The first DNA-based linkage map in faba bean was constructed with only 17 markers, of which 10 were restriction fragment length polymorphisms (RFLPs) (van de Ven et al., 1991). The first set of SSR markers was developed by Požárková et al. (2002) and then mapped by Román et al. (2004) Gaertn., was developed by Ellwood et al. (2008); synteny and genic collinearity among the legumes make the data applicable to V. faba and other legumes (Lee et al., 2017). Kaur, Kimber, et al. (2014) reported the first exclusively SNP-based genetic map of faba bean. Satovic et al. (2013) reported the first reference consensus genetic map, which covered 4062 cM (centiMorgan) in six main linkage groups, corresponding to the six chromosomes of faba bean. Table 1 shows that the growth of faba bean sequence and marker datasets was accompanied by an increase in the density and utility of gene-based genetic maps. In the last few years, significant advances in genotyping and sequencing technologies have led to two new SNP-based high-density consensus maps.
An international effort resulted in the first consensus map for six mapping populations, based on SNP markers derived from M. truncatula EST-SSRs, mtSSRs (mitochondrial-simple sequence repeats) and microRNA-target markers in faba bean has been launched (Mokhtar et al., 2020). Now that most pulse genomes are available, it is important to implement comparative genomic approaches, which will ultimately assist in the identification of candidate genes, quantitative trait loci (QTL) mapping, and assembly of the genome in faba bean.
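Map lengths in centiMorgans, such as the 4062 cM consensus map mentioned above, are obtained by converting observed recombination fractions into additive distances with a mapping function. The sketch below shows the two standard functions (Haldane and Kosambi) as a minimal illustration. The text does not state which function the cited maps used, so the choice here is an assumption, and the function names are illustrative.

```python
import math


def haldane_cm(r):
    """Haldane map distance (cM) for recombination fraction r; assumes no interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance (cM) for recombination fraction r; allows for interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.1f} cM, Kosambi {kosambi_cm(r):.1f} cM")

# Average linkage-group length implied by the consensus map figures in the text.
mean_group_cm = 4062 / 6
print(f"mean linkage group length: {mean_group_cm:.0f} cM")
```

For closely linked markers the two functions give nearly the same distance; they diverge as the recombination fraction approaches 0.5, which is one reason total map lengths from different studies are not always directly comparable.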

Mapping populations
Published studies in faba bean to date have mostly involved bi-parental populations, derived from crosses between two inbred lines. Several types of bi-parental mapping populations, such as F2, backcrosses and recombinant inbred lines (RILs), have been employed for genetic map construction and trait mapping. The relatively large set of interconnected bi-parental populations that segregate for diverse important traits in this species will help advance faba bean breeding (Table 1). These types of populations are easy to construct and represent a powerful tool for QTL detection. Their optimal allele frequency and low rate of linkage disequilibrium decay within chromosomes mean that only a few hundred RILs/markers are needed to map a QTL (Scott et al., 2020). Despite these advantages, the mapping precision of bi-parental populations is low because of the low total amount of genetic recombination, as only two alleles are present at any locus, and because of the low amount of genetic diversity that can be created by only two founders. These factors may limit the number of QTLs captured. Multi-parent populations have been developed to cope with the limitations of bi-parental populations (Scott et al., 2020). In faba bean, a multi-parent population derived from 11 European winter bean founders was created and employed to identify genomic regions controlling frost adaptation (Sallam & Martsch, 2015).
Despite the wealth of faba bean germplasm, characterization and preliminary evaluation remain a challenge. Faba bean is represented in the collections by only the cultivated forms, and a wide range of variation in plant and seed phenotypic characteristics has been reported (Khazaei, 2014; Maalouf et al., 2019). The development of a reference genome, gene functional analyses and genotype-phenotype association, together with the development of high-throughput genotyping platforms, will facilitate characterization of the genetic diversity within the germplasm collections as well as understanding of its potential. It will aid exploitation of the diversity as a key resource for breeding.
(Atienza et al., 2016).
In addition, some attention has been given to rust resistance (Uromyces viciae-fabae (Pers.) J. Schröt.) (Ijaz, 2018; Khazaei, Link, et al., 2018). Two mapping populations (Mélodie/2 × ILB 938/2 and Disco/2 × ILB 938/2) have been phenotyped at the University of Saskatchewan and QTL mapping is underway. In addition, a list of faba bean accessions with resistance to chocolate spot is available (Maalouf et al., 2016).

Trait mapping
Some progress has been made in identifying QTLs for abiotic stresses such as frost tolerance (Arbaoui et al., 2008; Sallam et al., 2016; Sallam & Martsch, 2015), traits related to drought adaptation (Ali et al., 2016; Khazaei et al., 2014a), and yield (Avila et al., 2017; Cruz-Izquierdo et al., 2012). The first two mapping studies on v-c content (Gutiérrez et al., 2006; Ramsay et al., 1995) revealed that it was controlled by one major locus. Khazaei et al. (2015) showed that the distribution of v-c concentration was bimodal, which was consistent with the detection of a single major QTL at the previously reported vc⁻ locus on faba bean chromosome 1. Later, a robust, breeder-friendly and high-throughput KASP marker was developed and validated from this region (Khazaei et al., 2017). This marker was found to reside within the gene encoding the bifunctional riboflavin biosynthesis protein RIBA1; this gene, now termed VC1, underlies the major v-c QTL and catalyses a key step in v-c biosynthesis (Björnsdotter et al., 2020). The identification of VC1, which relied on genetic mapping and gene-to-metabolite correlations, now paves the way for development of faba bean cultivars free from v-c based on new insight into the v-c biosynthetic pathway.
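A bimodal trait distribution that separates cleanly by genotype at one marker is the signature of a single major locus. The following sketch illustrates this with a simple single-marker analysis that computes the share of phenotypic variance explained by marker class. The ten lines, their allele labels and their v-c values are invented for illustration and are not data from the cited studies.

```python
from statistics import mean

# Hypothetical v-c concentrations (mg/g seed) for ten RILs, keyed by the allele
# each line carries at a marker linked to the vc locus. All values are invented.
lines = [
    ("low", 0.4), ("low", 0.6), ("low", 0.5), ("low", 0.7), ("low", 0.3),
    ("wild", 6.8), ("wild", 7.4), ("wild", 6.5), ("wild", 7.1), ("wild", 7.2),
]


def variance_explained(records):
    """Return (R^2, class means): variance share explained by marker class."""
    values = [v for _, v in records]
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    groups = {}
    for allele, v in records:
        groups.setdefault(allele, []).append(v)
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss, {a: mean(g) for a, g in groups.items()}


r2, class_means = variance_explained(lines)
print(class_means, round(r2, 3))
```

With a major gene, almost all of the variance lies between the two marker classes, so a single diagnostic marker is enough to select low v-c lines. A polygenic trait would show overlapping classes and a much smaller variance share per marker.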
Seed coat tannins limit faba bean use in food and feed; a low tannin phenotype, characterised by white flower colour, is controlled by two unlinked recessive genes, zt1 and zt2. A comparative mapping approach identified an ortholog of the M. truncatula WD40 transcription factor TTG1 (Transparent Testa Glabra 1), located on chromosome 2, as the zt1 gene (Webb et al., 2016). These results have recently been confirmed by Gutiérrez and Torres (2019), who characterized zt1 and showed the high similarity of the gene sequence to those of other legume species. An allele-specific diagnostic marker was also developed that differentiates zt1 from other genotypes. Gutiérrez et al. (2020) reported the bHLH transcription factor VfTT8 (Transparent Testa8), located on chromosome 3, as the zt2 gene. A robust KASP marker for the zt2 gene is now available (Zanotto et al., 2020).
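Because either recessive gene alone gives the white-flowered, low tannin phenotype, crossing a zt1 line with a zt2 line is expected to restore coloured flowers in the F1 and to give a 9:7 coloured-to-white ratio in the F2. The sketch below enumerates that F2. It assumes independent assortment and full complementation between the two loci, and the dominant allele labels Zt1 and Zt2 are used here only for illustration.

```python
from itertools import product

# Gametes from an F1 heterozygous at both loci (Zt1 zt1, Zt2 zt2). Because the
# loci are unlinked, the four allele combinations are equally frequent.
gametes = list(product(("Zt1", "zt1"), ("Zt2", "zt2")))


def is_white(locus_1, locus_2):
    """White flower (low tannin) if homozygous recessive at either locus."""
    return locus_1 == ("zt1", "zt1") or locus_2 == ("zt2", "zt2")


coloured = white = 0
# Each F2 zygote is the union of two gametes; all 16 unions are equally likely.
for (a1, b1), (a2, b2) in product(gametes, repeat=2):
    if is_white((a1, a2), (b1, b2)):
        white += 1
    else:
        coloured += 1

print(f"coloured : white = {coloured} : {white}")
```

The same phenotype therefore hides two different genotypes, which is why gene-specific diagnostic markers for zt1 and zt2 are useful when stacking or tracking the low tannin trait in breeding populations.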
Besides the successful gene discovery for quality traits mentioned above, progress was also made in gene discovery and development of a diagnostic molecular marker for the terminal inflorescence gene (ti) in faba bean (Avila et al., 2007). TFL1 (Terminal Flower 1), the main regulator of inflorescence development in legumes (Benlloch et al., 2015), is responsible for the determinate growth habit in faba bean (Vf_TFL1) and is located on chromosome 5.

TRANSCRIPTOMES
A number of transcriptomes have been reported for faba bean (Table 3). Since then, the transcriptome data coverage has been further enriched (Cooper et al., 2017). A high proportion of transcripts (about 96%) from Webb et al. (2016) was captured by the transcriptome data of Braich et al. (2017). Cooper et al. (2017) increased sequence length at 461 chromosomal loci and improved accuracy compared with the transcriptome data of Webb et al. (2016). The transcriptome data of Braich et al. (2017) revealed that faba bean, despite its large complex genome, is similar to other legume species in expressed gene content.
Next-generation sequencing (NGS) platforms, especially high-throughput RNA sequencing (RNA-seq) technology, one of the most powerful tools currently available for transcriptome profiling, have enhanced the efficiency and speed of gene discovery in faba bean (Table 3). For example, the identification and characterization of differential gene expression from tissues subjected to drought (Alghamdi et al., 2018; Wu et al., 2020), vernalization (Gao et al., 2020), and salinity stress have benefited greatly. These findings will help in understanding the stress tolerance mechanisms in the crop and will provide resources for functional genomics. Coupled with allelic data and trait mapping, the data will be invaluable in the development of more resilient faba bean varieties. A high-quality reference transcriptome has been completed (Björnsdotter et al., 2020) and is being expanded to a pan-transcriptome using data from four different genotypes (Hedin, Hiverna, 153b and 2378), including data from both shoot and root tissues. This effort has provided a comprehensive faba bean reference gene set that will be a valuable new resource for differential gene expression analyses and genome annotation.

CONCLUSIONS AND PERSPECTIVES
Uncovering genes associated with the biosynthesis of vicine-convicine
[Table 3 row fragment] Young and mature leaf, flower, pod and whole seed at early seed-filling stage, embryo and pod at mid maturation, and stem; 49,277 transcripts; Illumina HiSeq PE150
models for the reference assembly. The transcriptome work has also led to production of a high-density faba bean genotyping array, which is now available from the University of Reading, UK. The array (known as 'Vfaba_v2'), built on the Life Technologies Axiom platform, contains 24,929 polymorphic high-resolution SNP markers located in 15,846 different genes. Faba bean now benefits from saturated synteny-based genetic maps, NGS, and high-throughput genotyping technologies, which together will greatly aid genome assembly. Release of the reference genome will further advance the faba bean genomics and breeding revolution.