Pan‐European phylogeography of the European roe deer (Capreolus capreolus)

Abstract To provide the most comprehensive picture of species phylogeny and phylogeography of European roe deer (Capreolus capreolus), we analyzed mtDNA control region (610 bp) of 1469 samples of roe deer from Central and Eastern Europe and included into the analyses additional 1541 mtDNA sequences from GenBank from other regions of the continent. We detected two mtDNA lineages of the species: European and Siberian (an introgression of C. pygargus mtDNA into C. capreolus). The Siberian lineage was most frequent in the eastern part of the continent and declined toward Central Europe. The European lineage contained three clades (Central, Eastern, and Western) composed of several haplogroups, many of which were separated in space. The Western clade appeared to have a discontinuous range from Portugal to Russia. Most of the haplogroups in the Central and the Eastern clades were under expansion during the Weichselian glacial period before the Last Glacial Maximum (LGM), while the expansion time of the Western clade overlapped with the Eemian interglacial. The high genetic diversity of extant roe deer is the result of their survival during the LGM probably in a large, contiguous range spanning from the Iberian Peninsula to the Caucasus Mts and in two northern refugia.


| INTRODUC TI ON
Climate fluctuations during the Last Glacial Period between 115,000 and 11,500 years ago (Lokrantz & Sohlenius, 2006) played an important role in shaping the current species composition, distribution, and genetic diversity of mammals in Europe. During glaciations, the ranges of cold sensitive species were limited to refugial areas located in the Balkan, the Iberian and the Apennine Peninsulas (Hewitt, 2004;Taberlet et al., 1998) and south-eastern part of the continent (the Black Sea region and the Caucasus Mts.; Markova & Puzachenko, 2019). Contribution of a given refugium into postglacial recolonization processes varied a lot among species such as, for example, red deer Cervus elaphus (Niedziałkowska, Doan, et al., 2021), wild boar Sus scrofa (Niedziałkowska, Tarnowska, et al., 2021), or common vole Microtus arvalis (Stojak et al., 2015). Cold-adapted species such as, for example, reindeer Rangifer tarandus (Sommer et al., 2014), saiga antelope Saiga tatarica (Nadachowski et al., 2016), or arctic fox Alopex lagopus (Dalén et al., 2007), thrived in the glacial stages, expanded their ranges southward, and during the onset of interglacial they underwent massive, large-scale extinctions. Species of mammals that have very broad biogeographic niche (from the Mediterranean to the boreal zone) had more diverse response to geological time-scale pulsation of climate, most likely with several glacial refugia located at both lower and higher latitudes, where subpopulations diverged into different lineages and could have developed adaptations to different climate, habitat, and food-related conditions. Examples of such species include the bank vole Clethrionomys glareolus (Tarnowska et al., 2019), weasel Mustela nivalis (McDevitt et al., 2012), common vole M. arvalis, and field vole Microtus agrestis (Baca et al., 2020;Stojak et al., 2019).
Among European ungulates, the roe deer (Capreolus capreolus Linnaeus, 1758) ( Figure 1) is the most widely distributed species (Lovari et al., 2016). It occurs throughout the continent (Figure 2), except for the northernmost Scandinavia and some islands. In southern Europe, roe deer lives in the Mediterranean zone, and the eastern range of the species reaches the Caucasus Mountains, northern Iran, and Iraq (Danilkin, 1995). The eastern border of its range runs through western Russia. Roe deer is a habitat opportunist occupying both forests and open habitats (Apollonio et al., 2010;Geist, 1998).
European roe deer is distinctly smaller that the Siberian roe deer (Capreolus pygargus (Pallas, 1771) occurring in Asia and in the south of the European part of Russia. The present ranges of the two species overlap in the Volga-Don region (Kashinina et al., 2018).
In some areas, European roe deer is an important game species.
In historical times, when the populations were locally exterminated or their numbers were low, there were attempts to strengthen them with introduced animals (Apollonio et al., 2014;Baker & Hoelzel, 2013).
So far, phylogenetic studies on European roe deer covered a large part of their contemporary range and indicated three main mitochondrial DNA (mtDNA) clades-Central, Eastern, and Western (Baker & Hoelzel, 2014;Lorenzini et al., 2014;Randi et al., 2004;Tsaparis et al., 2019). The Central clade is the most common throughout Europe, the Eastern one is restricted mainly to the Balkans, and the Western clade occurs only in the Iberian Peninsula (Gentile et al., 2009;Lorenzini et al., 2014;Mucci et al., 2012;Randi et al., 2004).
In addition, on the basis of internal structuring within clades, some researchers proposed the separation of the roe deer occurring in Italy as a distinct subspecies C. c. italicus (Gentile et al., 2009;Mucci et al., 2012). In Spain, there were also attempts to identify the Celtic-Iberian mtDNA clade (Baker & Hoelzel, 2014;Royo et al., 2007) and a subspecies C. c. garganta (Meunier, 1983).
Recent studies showed the introgression of the Siberian roe deer mtDNA genes into the eastern populations of the European roe deer (Kashinina et al., 2018;Lorenzini et al., 2014;Markov et al., 2016;Matosiuk et al., 2014). This showed a new level of the genetic complexity in the species and opened a debate on the sources of that admixture (Kashinina et al., 2018;Lorenzini et al., 2014;Matosiuk et al., 2014;Świsłocka et al., 2019). However, the limited sampling (Lorenzini et al., 2014) did not allow to define whole spatial range at which the introgression has occurred.
Despite multiple studies on European roe deer phylogeography, there still remain substantial gaps in the sampling coverage in Europe, which make the picture incomplete and thus inconclusive.
There were no roe deer studies done in northern (Finland), eastern (Belarus, Estonia), central (Slovakia, Czech Republic), and southeastern (Croatia, Slovenia) parts of the continent. The most comprehensive phylogeographical analyses, done by Lorenzini et al. (2014) and Randi et al. (2004) focused mostly on identifying the origin of mtDNA clades and on disentangling the relationships among them. So far, there had been no attempts to determine the location of the contact zones between them. Moreover, the broad spatial spread of F I G U R E 1 Roe deer female with fawn. Photograph by Tibor Pataky F I G U R E 2 Roe deer (Capreolus capreolus) sampling locations in Europe: distribution of own and literature data in the species range the clades suggests that, apart from separation of the C. c. italicus subspecies, there is some internal heterogeneity within the three main clades.
In our study, we aimed at reconstructing the phylogeny of roe deer on the basis of intensive sampling in central and Eastern Europe, including many areas that have not been studied before.
We pooled our new data on mtDNA sequences with already available roe deer sequences in order to perform a more detailed and comprehensive analysis of roe deer phylogeography in the whole European range of the species (in 26 countries). The aims of our study, based on the analysis of a 610-bp fragment of mtDNA control region, were to: (1) describe the genetic diversity of roe deer over the entire species range; (2) identify spatial patterns of the clade distribution; and (3) compare the demographic processes at the level of lineages, clades, and haplogroups. Finally, we discuss the possible LGM refugia and routes of postglacial recolonization of the continent by roe deer.

| Sample and data collection
During 2008-2017, we collected 1469 roe deer samples from 20 countries in central and Eastern Europe (Table 1, Figure 2). The study area ranged from Germany to Russia (6°35′-43°23′E) and from Finland to Greece (67°42′-38°44′N). Samples, which consisted of fresh fragments of skin or muscle from legally hunted animals, were in majority taken by hunters. Prior to DNA extraction, all samples were stored in 96% ethanol at -20°C. We assigned the geographic coordinates of the samples on the basis of information on location provided by the hunters. Additionally, we included 1541 sequences of roe deer from western and southern Europe (see Table S1), based on data available in GenBank and their frequencies reported in publications (Baker & Hoelzel, 2013;Biosa et al., 2015;Gentile et al., 2009;Lorenzini et al., 2014;Randi et al., 2004;Royo et al., 2007).

| DNA extraction and sequencing
We extracted total genomic DNA using the Qiagen DNeasy Blood and Tissue Kit following the manufacturer's guidelines. A fragment of the mtDNA control region was amplified by PCR with the primers L-Pro and H-Phe (Randi et al., 1998). Cycling conditions were 95°C for 15 min; 35 cycles of 94°C for 15 s, 56°C for 15 s, and 72°C for 1 min; and a final step of 72°C for 10 min. PCR products were purified using Clean Up (A&A Biotechnology). Sequencing reactions were carried out in a 10 µl volume using the Big Dye sequencing kit v.3.1 (Applied Biosystems) with the forward primer. Products were purified with the Exterminator kit (A&A Biotechnology) and sequenced on an ABI 3130 xl Genetic Analyzer (Applied Biosystems). Sequencing results were analyzed with the ABI DNA Sequencing Analysis software and aligned in BioEdit v.7.0.5.3 (Hall, 1999). Additional roe deer mtDNA data from GenBank were pooled with the obtained own fragments and shortened to keep the same length for all sequences.
We assigned the obtained sequences to haplotypes using Arlequin 3.5.1.3 software (Excoffier & Lischer, 2010). Indels were considered as differences in haplotype definition. Each haplotype was assigned a code reflecting one of the two species: Cc-for the European roe deer (C. capreolus) or Cp-for the Siberian roe deer (C. pygargus). Internal structure of haplogroups was based on haplotype genealogy constructed in HapView (Salzburger et al., 2011).
Italian haplogroup C. c. italicus (C7 in this study) has already been defined as such in the literature (Gentile et al., 2009;Mucci et al., 2012).
To determine sequence evolution model, we used jModelTest2 (Darriba et al., 2012), using resources available on CIPRES Science Gateway (Miller et al., 2010). The best model chosen by jModelTest2 was the Hasegawa, Kishino, and Yano model (HKY; Hasegawa et al., 1985) with additional variation rate among sites and a proportion of invariable sites (+G, +I). Phylogenetic trees were constructed in Mega 7.0.14 (Kumar et al., 2016) and MrBayes 3 (Ronquist & Huelsenbeck, 2003) with the maximum likelihood method and 10,000 bootstrap replications, using the chosen model. Summary statistics were calculated in DnaSP 5.10.01 (Librado & Rozas, 2009) according to three classification levels: lineage, clade, and grouping based on HapView. The following statistics were computed: number of unique haplotypes (h), number of segregating (polymorphic) sites (S), haplotype diversity (H d ), nucleotide diversity (π), and average number of pairwise nucleotide differences (k). Additionally we included B index (Levins, 2020) to express the diversity of haplotypes, using the formula: where p i is the proportion of samples with haplotype i in a deme. The B index minimum value is 1 and its upper bound is equal to the number of haplotypes in the sample.
To evaluate possible models of expansion, we performed two neutrality tests in DnaSP: Tajima's D (Tajima, 1989) and Fu's Fs Estimates of genetic diversity of mtDNA control region (610 bp) in the roe deer (C. capreolus) lineages, clades, and haplogroups (see Figure 3). Number of samples is indicated by n (numbers in parentheses correspond to samples collected and analyzed in this study), h-number of haplotypes, S-number of polymorphic sites, H d -haplotype diversity, π-nucleotide diversity, k-average number of pairwise differences. Haplotypes CcH53 and CcH55 (European lineage) were not assigned to any clade   (Fu, 1997). We also used mismatch distribution with the sudden expansion model and goodness-of-fit tests (sum of squared deviation; Harpending's raggedness index (R). Expansion time (T) was estimated based on equation: T = τ /2µ, where τ is Tau estimated in Arlequin 3.5.1.3 and µ is the mutation rate described as units of substitutions per locus per generation (Rogers & Harpending, 1992). We applied a mutation rate of 0.04-0.08 substitutions per site per Myr (Lorenzini et al., 2014;Randi et al., 1998Randi et al., , 2004Royo et al., 2007;Vernesi et al., 2002) and used 3 years as a generation time (Randi et al., 1998(Randi et al., , 2004. Time since expansion was calculated using the Excel Spreadsheet provided by Schenekar and Weiss (2011).
The Last Glacial Period had impact on large areas of the northern hemisphere and its names differ regionally. Throughout this manuscript we used the name "Weichselian" as reference, which is equivalent to the other terms such as "Vistulian"
The numbers of samples representing each haplotype ranged from 1 to 177; 106 sequences were singletons (see Table S1). We iden-  Table 2). The Western clade, in turn, was least numerous, but it had the same level of nucleotide diversity as the Central one (π = 0.008). The Western clade had the highest average number of pairwise differences among all clades (Table 2), which suggests its nonmonophyletic origin and reflects the distribution of the clade and haplogroups on phylogenetic networks (Figure 3 and Figure S3). The Eastern clade had intermediate values in all estimates.
Further analysis of the haplotype genealogy of European roe deer lineage indicated the presence of an internal structuring in the described clades (Figure 3; Figure S3). Only haplotypes CcH55 (2 samples from Czech Republic) and CcH53 (1 sample from Slovakia and 1 from Serbia) were not assigned to any specific group. Inside the Eastern clade, we detected four haplogroups E1-E4 ( Figure 3). The most numerous was haplogroup E4 (n = 187), while E1 was the least numerous (n = 77). Out of all the haplotypes that formed the haplogroup E1, only one (CcH220) was previously described (see Table S1).

F I G U R E 3
Internal structure of European roe deer (C. capreolus) clades defined based on mtDNA haplotype genealogy constructed in HapView program. Circles represent unique haplotypes, while their sizes correspond to the number of individuals with a given haplotype. Small undescribed circles denote missing haplotypes. Names of the clades (see Figures S1 and S2) were assigned according to the naming proposed by Randi et al. (2004). Details on haplotypes in Table S1 The Central clade consisted of eight haplogroups (C1-C8), among which C1 had the highest number of individuals (n = 2030) and haplotypes (h = 206). The smallest recorded haplogroup was C5 (Table 2, Figure 3). Haplogroups C3 and C7 consisted of haplotypes found only in the published data. Those two haplogroups had the lowest average number of pairwise differences (0.127 and 1.341, respectively) and low nucleotide diversity (<0.001 and 0.002). All haplotypes grouped in haplogroup C7 were previously described in roe deer belonging to the subspecies C. c. italicus (comp. Randi et al., 2004).
The Western clade consisted of two haplogroups (W1 and W2) represented by similar numbers of individuals ( Table 2). The haplogroup W1 had the highest average number of pairwise differences among all detected clades and haplogroups (k = 4.846) and the highest value of nucleotide diversity among all haplogroups (π = 0.008).
The Western clade had many missing haplotypes (Figure 3 and Figure S3).

| Spatial distribution of lineages, clades, and haplogroups
The spatial distribution of roe deer lineages in Europe showed a strong geographical pattern. Haplotypes of the Siberian lineage were recorded in western Russia, Finland, Estonia, Belarus, Lithuania, Poland, Ukraine, Slovakia, Romania, and Hungary ( Figure   S4). Their range fully overlapped with that of the European lineage. The highest proportion of the Siberian haplotypes was found in south-eastern and eastern parts of roe deer geographical range (populations 14 and 13; 57 and 53% of samples, respectively), and it declined westwards (Figure 4a).
In the European lineage, we observed the spatial structuring of the three major clades ( Figure S4)  (Figure 4b and Figure S5). The ranges of haplogroups C6 and C8 were restricted to east-central Europe.
The Eastern clade was recorded in southern and east-central Europe between 8° and 39°E longitude and below 56°N latitude ( Figure S4). In Greece and Bulgaria, 99% of roe deer belonged to it, and the share of that clade declined northward (Figure 4a). The haplogroup E1 occurred only in Belarus, Lithuania, and the adjoining borderlands of Poland, Russia, and Ukraine (Figure 4b and Figure S6). Haplogroups E2-E4 considerably overlapped across the study area, yet some differences in their geographical ranges could still be observed (Figure 4b and Figure S6). For instance, individuals assigned to E3 were mainly located in the Balkan-Dinaric and the Carpathian regions.
The Western clade was least numerous, but it was distributed over large areas of continental Europe from the Iberian Peninsula to western Russia ( Figure S4 and Figure  and Figure S4). Groups of related haplotypes in each of those haplogroups were endemic to some regions of Europe (Figure 4b and  Figure 2). Numbers in italics are B indices indicating effective haplotype diversity central part of roe deer range (populations 6, 9, 10, and 12; B from 3.7 to 5.9). All peripheral populations (except for population 1 in the Iberian Peninsula) showed low diversity (B from 1.3 to 2.8; Figure 4b).

| Demographic history of roe deer
The goodness-of-fit tests, which compared expansion model with and E1 (see Table S2). The only haplogroups not showing expansion tendency were C8 and W2, while C4 had only a weak support for expansion (see Table S2). The Central and Eastern clades exhibited strong unimodal distributions of the pairwise differences, which indicated one main expansion event in each case ( Figure 5).
The Western clade ( Figure 5) and the Siberian lineage (see Figure   S8) had multimodal distributions.
Expansion times were calculated for the spatial and demographic models based on two assumed mutation rates (see Methods).
Therefore, the results-with very wide confidence intervals-can only give a broad view of the demographic history of European roe deer. Most of haplogroups of the Central and the Western clades showed expansion times predating the LGM (see Table S3). Possibly, only C2, C3, and C5 expanded during or after the LGM. The oldest expansion time, perhaps reaching as far as the Eemian interglacial (<100 ka BP), was found inhaplogroupW1. Three haplogroups in the Eastern clade (E1, E2, E3) showed the expansion time around or after the LGM (see Table S3).

| DISCUSS ION
We presented the comprehensive picture of the genetic diversity of roe deer-the most numerous, widespread, and ecologically flexible ungulate species of Europe. We identified the phylogeny within the species, mapped the spatial distribution of lineages, clades, and haplogroups, and attempted to describe their demographic history.
By pooling together a large set of new data from central and eastern Europe and available literature data, we showed a clear, continentwide pattern of phylogeography of roe deer populations spanning from the Iberian Peninsula to western and south-western Russia.
However, there are still two important regions where data are lacking. The first is nearly the whole France (except for its southern part), and the second are the easternmost south-eastern verges of roe deer range, including the Caucasus Mts., Turkey, and southeastern regions of the European part of Russia, where the range of C. capreolus overlaps with that of C. pygargus. Future research, filling those gaps in data coverage and analysis, cannot only broaden the pattern we presented in this work, but also verify some of our conclusions and interpretations.

| The Siberian lineage of roe deer mtDNA
The introgression of mtDNA genes of the Siberian roe deer into

| The European lineage of roe deer mtDNAchanges in time and space
The European roe deer was already present in Europe at least 600 ka BP and the large number of fossil and subfossil records evidenced that the species occurred in both glacial and interglacial phases since then (Sommer et al., 2009). Radiocarbon (C 14 ) dating of roe deer fossil bones (a method that reaches only as far as 50-60 ka BP) showed that between 60 and 21 ka BP, roe deer occurred not only in the Mediterranean peninsulas, but also it repeatedly reached central Europe during milder interstadials (Sommer et al., 2009). Rapid climatic changes from colder to warmer periods were conducive to recurrent expansion-retreat cycles of roe deer and thus their ge- where its single, southernmost records were found in Bulgaria.
Among eight haplogroups, C4 had the widest range (see Figure S5) and the oldest expansion time, however, with a wide confidence interval. Thus, we suggest that it might be a relic older than the Weichselian Glaciation. C4 might have then survived the LGM in southern France, northern Italy, and the north-eastern shores of the Black Sea (Figure 6a), the areas indicated as suitable for the species during the LGM (Markova & Puzachenko, 2019). Its expansion could have started from those regions after the glacial period.
Haplogroups C1 and C2 now are common in western and central Europe; we propose that the refugial areas of C1 were in southern France, the Dinaric Mts., and the eastern Carpathian region (Moldova), while those of C2-in the Iberian peninsula and southern France.
In addition to the earlier described Italian haplogroup, denoted as C7 in this paper, that occurred in the Apennine Peninsula only (Lorenzini et al., 2002;Mucci et al., 2012;Randi et al., 2004), we found three other haplogroups, each with a very restricted range: C5 recorded over the northern side of the Black Sea, C6 found mainly in Interestingly, the pan-European study on molecular biogeography of wild boar S. scrofa (Niedziałkowska, Tarnowska, et al., 2021) discovered a rare, ancient clade of the species in the Russian part of the range of roe deer haplogroup E1. Also, the Europe-wide study of wolf Canis lupus population by SNP analysis (Stronen et al., 2013) found that the western and northern Belarus wolves showed the most divergent genotypes within north-central Europe.
The Western clade, discovered for the first time in the western part of the continent, appeared "Western" no longer, as we found it also in central and eastern Europe, where it formed up to 5.3% of roe deer in the sampling region no 13 (Belarus, Lithuania, and western Russia). We decided to retain the name "Western" for this group, as it has already been well established in literature. The internal phylogeny of the Western clade (many missing, possibly extinct haplotypes), discontinuity of its occurrence over the continent, spatial segregation of groups of haplotypes, and very old expansion times strongly suggest that the Western clade has been a relic of the Eemian interglacial (MIS 5) 130-80 ka BP, when it probably had occurred throughout the continent. This could also explain why results from different reconstructions (see Figure 3 and Figure (2015), Markova and Puzachenko (2019), with modification over the Carpathians according to deciduous and coniferous tree range in the LGM after Tzedakis et al. (2013). Symbols of haplogroups as in Table 2 and Figure 3. RD-roe deer refugial population of unknown genetic profile. Cpthe Siberian lineage expansion into the European roe deer. Solid lines in (b) and (c) show the present ranges of occurrence of the haplogroups (see Figures S5 and S7) the north-eastern refuge located in the present-day Belarussian-Russian borderland (Figure 6a).

| Postglacial colonization of Europe by roe deer
Our data suggest that different haplogroups of roe deer contributed differently to the postglacial colonization of the European continent. Among 14 haplogroups belonging to three mtDNA clades, seven played the major roles in recolonization of the postglacial habitats. These were, from the most successful: C2, C1, C4, E4, E2, E3, and C8 (Figure 6a,b). The other seven haplogroups showed rather weak expansion or most probably remained in their LGM refugia.