Mitochondrial genome copy number in human blood-derived DNA is strongly associated with insulin levels and related metabolic traits and primarily reflects cell-type composition differences

Mitochondrial copy number is known to vary among humans and across tissues, and a population-based study of mitochondrial genome copy number (MT-CN) in blood - as estimated from genome sequencing data - observed strong (~54%) heritability. However, the genetic causes and phenotypic consequences of MT-CN variation in humans are not well-studied. Here, we studied MT-CN variation in blood-derived DNA from 19,184 Finnish individuals with deep cardiometabolic trait measurements using a combination of whole genome (N = 4,163) and exome sequencing (N = 19,034) data as well as imputed array genotypes (N = 17,718). We confirmed that MT-CN in blood is highly heritable (31% by GREML). We identified two loci in the nuclear genome that are significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6x10-8), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMB1M1 gene (P = 3.0x10-8), which has been reported to protect against non-alcoholic fatty liver disease in model organisms. We also found that MT-CN is strongly associated with insulin levels (P = 2.0x10-21), fat mass (P = 4.5x10-16), and other related traits. Using a Mendelian randomization framework, we constructed a genetic instrument for MT-CN using penalized regression with adjustment for potentially confounding covariates and found a significant association with insulin levels, which suggests that our MT-CN measurement in blood may be causally related to metabolic syndrome. Finally, we computed our genetic instrument in UK Biobank participants and tested it against a set of cell count and cardiometabolic traits. We found significant associations between MT-CN and both neutrophil and platelet counts (P = 1.8x10-8 and P = 1.2x10-3). While the association between MT-CN and metabolic syndrome traits was replicated in the UK Biobank, adjusting for cell counts largely eliminated these signals, suggesting that MT-CN is actually a proxy measurement for neutrophil and platelet counts in its effect on metabolic syndrome. Taken together, these results suggest that measurements of MT-CN in blood-derived DNA may primarily reflect differences in cell-type composition, and that these differences may be causally linked to insulin and related traits.

Traits associated with mitochondrial (MT) content include coronary heart disease (CHD), type 2 diabetes, and metabolic syndrome traits such as insulin sensitivity/resistance, obesity, and blood triglycerides. However, these studies have generally been limited by small sample sizes and low Results are shown for the EMMAX test and a permutation test in which mitochondrial haplogroups were adjusted for.
To understand the connection between MT-CN and more clinically relevant phenotypes, we tested our MT-CN estimate against Matsuda ISI and disposition index ( Table 1 ), which measure insulin sensitivity and secretion, respectively, and were not included in the initial screen. MT-CN was strongly associated with both insulin phenotypes. Notably, the Matsuda ISI signals survived adjustment for fat mass percentage after excluding diabetic individuals, which indicates that the association of peripheral blood MT-CN with insulin sensitivity was independent of fat mass.
To test for this association signal in a larger cohort, we developed a method to estimate mitochondrial genome copy number using 19,034 samples with whole exome sequencing (WES) data  identical directions of effect ( S2 Table ).
Anecdotally, it is interesting to note that these MT association signals can also be detected using read-depth analysis of the nuclear genome ( S1 Fig ; manuscript in prep.), where reads derived from mtDNA align erroneously to several nuclear loci based on homology between the MT genome and ancient nuclear mitochondrial insertions. This result provides additional evidence for the reported trait associations using an independent MT-CN estimation method, and indicates that these homology-based signals need to be taken into account in future CNV association studies.
7 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020.

Heritability analysis
To assess the extent to which MT-CN is genetically determined, we estimated the heritability of mitochondrial genome copy number using GREML ( Table 2 ). We explored two different approaches available: (1) analysis of the 4,149 samples with WGS data that passed quality control measures, where both nuclear genotypes and MT-CN are measured directly from the WGS data, and (2) analysis of the set of 17,718 samples with imputed genotype array data, where MT-CN is estimated from WES data. Of these, (1) benefited from more accurate measurement of genotype and phenotype, whereas (2) had noisier measurements but benefited from larger sample size. We focused primarily on the METSIM cohort, both because of the homogeneity of this cohort (see Methods) and because the number of FINRISK samples with WGS data was small. 8 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. ; https://doi.org/10.1101/2020. 10.23.20218586 doi: medRxiv preprint In the WGS analysis, the GREML-estimated heritability of MT-CN in METSIM was 31%somewhat less than the 54% value reported in the only prior large-scale study of peripheral blood MT-CN heritability, which was based on low-coverage WGS [11] . For comparison, we used this same approach to estimate heritability of LDL in METSIM WGS data, which yielded an estimate of 34% with a standard error of 7.9% ( Table 3 ). This is broadly consistent with prior work [27,28] , including analysis of the same Finnish sample set using distinct methods [26] (20.2% heritability). These results show that mitochondrial genome copy number is a genetically determined trait with significant heritability, comparable to that of LDL and other quantitative cardiometabolic traits [26] . The analysis of imputed METSIM genotypes using WES-estimated MT-CN yielded an estimated heritability of 11%, which is much lower than the WGS-based estimate ( Table 2 ). To understand this discrepancy, we repeated the GREML analysis with the other two combinations of phenotype source (WGS vs. WES estimation) and genotype source (WGS vs. imputed array). When using the WGS-measured phenotype, the estimated heritability decreased only slightly (31% to 27%)

9
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)  when switching from the WGS to imputed genotypes. This suggests that the difference in genotyping method was not the main driver of the observed heritability disparity between the WGS and imputed array datasets. Conversely, when analyzing the imputed METSIM genotypes, switching from WGS-measured to WES-measured MT-CN resulted in a large drop (27% to 11%) in estimated heritability. This suggests that the extra noise inherent in WES-based MT-CN estimates was responsible for the reduction in the GREML-estimated heritability despite the increased sample size of the imputed array dataset.

Identification of genetic factors associated with MT-CN
Previous studies have identified three autosomal quantitative trait loci (QTL) for MT-CN in other populations [22,23] .  Table 4 ). Of the previously reported QTLs, we observed a suggestive signal at rs445 (P = 7.95×10 -4 ), but no signal at rs11006126 (P = 0.206) or within the linkage peak reported on

10
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 27, 2020. Two loci markers reached the genome-wide significance of 5×10 -8 , identified by lead markers rs2288464 and rs9389268. rs9389268 was the only marker that was strongly associated with MT-CN in the METSIM analyses of both WGS and imputed array data (P = 7.87×10 -7 and P = 1.62×10 -8 , respectively).
Although this variant was not significantly associated with MT-CN in FINRISK or in a random-effect 11 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 27, 2020. meta-analysis of both cohorts (P = 0.115), the lack of signal in FINRISK is likely the product of lower-quality MT-CN measurements in FINRISK, which displayed heterogeneity across survey years ( S4 Fig ). This variant is located in an intergenic region between the MYB and HBS1L genes, is common across many populations, and is slightly more frequent in Finns compared to non-Finnish Europeans (gnomAD v3 MAF 34.4% vs. 26.0%). MYB and HBS1L are hematopoietic regulators [29,30] , and the region between them is known to be associated with many hematological parameters including fetal hemoglobin levels, hematocrit, and erythrocyte, platelet, and monocyte counts [31][32][33][34] .
It has been suggested that these intergenic variants function by disrupting MYB transcription factor binding and disrupting enhancer-promoter looping [35] . Conditioning the METSIM-only imputed array GWAS on rs9399137 -a tag SNP shown to be associated with many of these hematological parameters [33] -resulted in elimination of the rs9389268 signal entirely (P = 0.408), suggesting that the haplotype responsible for the association of rs9389268 with MT-CN in our data is the same one previously known to be associated with numerous hematological phenotypes.
This result is not surprising considering our approach for normalizing MT-CN. Because our MT-CN estimate was based on the ratio of mtDNA coverage to nuclear DNA coverage, changes in the cell type composition of blood could result in changes in our normalized measurement if the underlying cell types have different average numbers of mitochondria. This is especially true of platelets, which can contain mitochondria but not nuclei, and whose counts are known to be associated with rs9399137. rs2288464 seemed to be a good candidate due to its location in the 3' untranslated region of MRPL34 , which codes for a large subunit protein of the mitochondrial ribosome. While the association signal at this marker was not observed in the WGS data (P = 0.0655), based on the observed effect size of this variant in WES and imputed data as well as the number of WGS datasets available, there 12 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was insufficient power (~0.5% at α = 5×10 -7 ) to robustly detect this association in the WGS data alone [36] .
We next performed rare variant association (RVAS) analyses using a mixed-model version of SKAT-O [37] to test for genes in which the presence of high-impact rare variants might be associated with MT-CN levels (see Methods; Fig 3 , Table 5 ). Using WES data, the only gene passing the Bonferroni-adjusted P value threshold of 2.16×10 -6 was TMBIM1 (P = 2.96×10 -8 ), a member of a gene family thought to regulate cell death pathways [38] . TMBIM1 has been shown to be protective against non-alcoholic fatty liver disease (NAFLD), progression to non-alcoholic steatohepatitis, and insulin resistance in mice and macaques [39] . Interestingly, in our analysis, in which a burden test was determined to be optimal by SKAT-O, putative loss-of-function variants in TMBIM1 were associated with a higher MT-CN ( Fig 3c ), which was in turn associated with less severe metabolic syndrome, suggesting that TMBIM1 is actually a risk gene, not a protective one. Thus, the published function of TMBIM1 makes it a strong candidate, although the direction of effect in our data disagreed with the direction suggested by prior work in model organisms [39] .

13
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Inference of causality in the association between MT-CN and insulin
To further understand the association between MT-CN and fasting serum insulin, we employed a Mendelian randomization (MR) approach with MT-CN as the exposure and insulin as the outcome.
Using penalized regression, we leveraged our extensive phenotype data to build a genetic instrument 14 .
CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. We calculated our instrument using either L1 or L2 regularization. In both cases, the MT-CN instrument was not a significant predictor (α = 0.05) of insulin when we constructed our instrument from WGS variants, but was significant when the instrument was constructed from imputed array variants. This was likely due to the larger sample size of the imputed array data set. However, the effect estimates were remarkably similar across all four cases. As a result, inverse-variance weighted meta-analysis across datasets yielded highly significant P values for both penalties. In summary, our analysis provided evidence for a significant causal role for MT-CN in determining fasting serum insulin levels that was robust to the choice of regression penalty when building the genetic instrument. We note that this evidence for causality comes with some caveats (see Methods).

Replication and biological interpretation
In principle, changes in MT-CN can be caused by changes in the number of mitochondrial genome copies within cells or by changes in the blood cell type composition. Based on the association with rs9389268 and the nuances of the normalization procedure described above, we sought to test the hypothesis that our MT-CN measurement primarily reflects the cell type composition of the blood rather than the number of mitochondria per cell. We used imputed array genotype and phenotype data from the UK Biobank (N = 357,656) for this purpose [40] .
We first tested cell counts from the UK Biobank (UKBB) against a polygenic risk score (PRS)

15
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. G represents genotypes, Z is a genetic instrument value constructed from G , X represents ln(MT-CN), Y represents ln(Insulin), and U represents any confounders of the association between X and Y . The arrow from X to Y is dashed to indicate that although an association is known, the relationship is not known to be causal. In this formulation, a significant association between Z and Y would provide evidence that X is casual for Y . (b) Strategy for choosing variables to adjust for when building Z in order to enforce MR assumptions. A represents those columns of covariate matrix W that are associated with Y (represented by the solid line between A and Y ) and B represents those columns of matrix W that are associated with X conditional on A (represented by the solid line between B and X ). Dashed lines represent possible, but unproven associations. The penalized regression of X on G used to build Z is adjusted for A and B (with no penalty) in an attempt to prevent any associations between Z and either A or B (represented by the blue X's). While an association between B and Y is unlikely (represented by the dashed line between B and Y ) because B is not contained in A , B is still adjusted for in the penalized regression to be as conservative as possible. (c) Strategy for choosing covariates to adjust for in the causality test of Y against Z in another attempt to reduce the impact of any remaining associations between Z and assumption-violating variables. Covariate sets I , II , III , and IV are defined by the presence of known first-order 16 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. first-order association is not known, but a higher-order association is possible. Covariate sets II , III , and IV (colored blue) are adjusted for in the causality test because there is at least one first order association linking them to Z and Y , so they risk violating MR assumptions 2 or 3. (d) Results of Mendelian randomization test for causality of MT-CN on fasting serum insulin.
for MT-CN built using the genetic instrument from the Finnish data. Leukocyte, neutrophil, and platelet counts were all significantly associated with MT-CN PRS conditional on age, age 2 , and sex (see Methods, Table 6 ). However, adjusting for neutrophil counts in the leukocyte regression eliminated the signal (PRS regression coefficient P = 0.839), suggesting that the leukocyte count signal was driven by the effect of neutrophil count. We removed any high leverage, large residual samples and repeated the neutrophil and platelet count regressions to ensure that this result was robust to outliers and found no appreciable change in significance ( Table 6 ). As a result, we concluded that our MT-CN measurement was significantly associated with neutrophil and platelet counts. Subsequent analyses were performed both with and without adjustment for these variables, as described below.

All samples
No post hoc high leverage outliers We note that the effect directions of the associations of platelet counts with metS and MT-CN PRS seem inconsistent at first glance, as platelet counts were positively correlated with MT-CN PRS ( Table 6 ) and metS ( S3 Table ) while MT-CN and insulin (a proxy for metS) were negatively correlated ( Fig 1b ). However, the FinMetSeq regression model in Fig 1b was not conditional on any other covariates (although age, age 2 , and sex were regressed out of the MT-CN measurement prior to this analysis), while the UKBB models that gave rise to Table 6 and S3 Table adjusted for many additional covariates, including 20 PCs and age-sex interaction terms. As a result, the effect directions for the analyses in the two datasets are not directly comparable.
We next tested for associations between MT-CN PRS and several cardiometabolic phenotypes from the test in Fig 1a (see Methods). With the exception of C-reactive protein, which showed no significant association, all tested phenotypes showed nominal association with MT-CN PRS at α = 0.05, with total triglycerides and HDL being the only traits surviving Bonferroni correction ( Table 7 ).
We interpret this as replication of the link between mitochondrial genome copy number and metabolic syndrome in a large, independent data set. 18 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020.  ( Table 7 ). This suggests that the associations we observed between MT-CN and metabolic traits arose simply because MT-CN is a proxy for platelet and neutrophil count. This was supported by the fact that direct testing of platelets and neutrophils against triglycerides, fat mass, and HDL yielded remarkably significant associations, which survived post-hoc removal of high-leverage, high-residual outlier samples ( S3 Table ). This evidence for MT-CN as a proxy for platelet and neutrophil counts strongly suggests that the causal relationship observed in the Mendelian randomization experiment (see above) in fact represents a causative role for neutrophils and platelet counts in setting serum insulin levels.
Given the strong observed associations between blood cell count phenotypes and MT-CN PRS, we used these blood phenotypes to seek replication of the genetic associations detected in Finnish data. Using imputed UKBB genotype data, we tested the expected alternate allele dosage of both rs2288464 and rs9389268 against the same blood cell traits mentioned above, using linear regression ( S4 Table ; expected alternate allele dosage was calculated from genotype call probabilities as ). As expected given its known associations with multiple S (0 1) P (1 1) hematological parameters (see above), rs9389268 showed strong associations with all tested blood cell phenotypes. rs2288464 was not significantly associated with any of the five phenotypes after

19
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. ; https://doi.org/10.1101/2020.10.23.20218586 doi: medRxiv preprint correction for multiple testing, although a nominal association was detected with total leukocyte count.
This further strengthens our belief that rs9389268 is truly associated with MT-CN through blood cell composition. We also tested TMBIM1 against the same blood cell traits using SKAT-O [37] , and found no significant associations ( S4 Table ). This may mean that TMBIM1 affects MT-CN through a mechanism other than altering blood cell type composition.
We further sought to generate hypotheses for other phenotypic associations with MT-CN. To this end, we performed a phenome-wide screen of MT-CN PRS against all of the UKBB phenotypes available to us. To curate and transform these phenotypes, we used a modified version of PHESANT [41,42] , which outputs all continuous variables in both raw and inverse rank-normalized form. We chose to interpret the results from the normalized continuous variables ( S5 Table ) to be conservative and robust to outliers, although the results of the raw continuous variable analyses were similar ( S6 Table ). No metabolic syndrome traits appeared among the tested traits with q < 0.05. However, the tests for HDL cholesterol, self-reported heart attack, and doctor-diagnosed heart attack did yield somewhat suggestive results ( q = 0.123, 0.176, and 0.176, respectively). We also repeated this screen with adjustment for neutrophil and platelet counts ( S7 Table and S8 Table ), resulting again in no metabolic syndrome phenotypes achieving < 0.05. The addition of neutrophil and platelet counts q as covariates attenuated the suggestive signals for HDL cholesterol, self-reported heart attack, and doctor-diagnosed heart attack ( q = 0.284, 0.391, and 0.402, respectively).

Discussion
We have described one of the most well-powered studies to date of the genetic relationship between MT-CN measurements in blood and cardiometabolic phenotypes. Our study is one of very few of which we are aware to utilize sequencing data for this purpose, and our samples are more deeply phenotyped than those used in prior work. Our data show highly significant associations 20 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. between blood-derived MT-CN measurements and several cardiometabolic traits, particularly insulin and fat mass. We observed strong heritability of MT-CN (31%), on par with other widely studied cardiometabolic traits such as LDL, and identified one single marker association on a haplotype previously associated with several hematological parameters [31][32][33][34] . We also report one gene with a rare-variant association with MT-CN, TMBIM1 , that has a known link to non-alcoholic fatty liver disease in model organisms [39] . Interestingly, rare variants in this gene were not associated (in aggregate) with blood cell counts in UK Biobank, suggesting that this gene might modulate NAFLD risk via MT-CN, but through a mechanism not involving blood composition. More work must be done to confirm this tentative genetic association and mechanistic hypothesis.
Using a novel multiple-variant instrument-building method, we report evidence from Mendelian randomization supporting a causal role for MT-CN in metabolic syndrome. Further, we used UK Biobank data to show that not only does the link between MT-CN and metabolic syndrome replicate in an independent data set using a polygenic risk score approach, but also that this association is mediated by neutrophil and platelet counts, contrary to previous claims that variability in the number of mitochondria per cell is responsible for CHD risk [12] . Taken together, these results suggest that neutrophil and platelet counts in peripheral blood are causally related to fasting insulin and related traits.
There is prior evidence to support the role of inflammation -specifically via innate immune cells such as neutrophils -in the etiology of type 2 diabetes (T2D) and insulin resistance [43][44][45] , which suggests a plausible model by which peripheral blood neutrophil count could influence metabolic syndrome. Nutrient excess and high fat diets are known to recruit neutrophils into tissues, which then cause insulin resistance both by releasing TNF-⍺ and IL-6 and by upregulating cyclooxygenase [43] . This leads to increased LTB4 and subsequent upregulation of NF-B, a central regulator of inflammation. Moreover, free fatty acids also cause neutrophils to stay in tissues longer,

21
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. resulting in persistent inflammation and leading to insulin resistance [43] . While it is known that inflammation, and particularly neutrophils, play a role in metabolic syndrome, our results strongly suggest that peripheral blood neutrophil count causally contributes to this process and is associated with heritable genetic variation in the human population. Overall, our work provides further insight into the role that inflammation plays in metabolic syndrome and supports the idea that targeting inflammation may be a fruitful avenue of investigation in developing future therapeutics.

Genotype and phenotype data
Whole genome sequencing (WGS) was performed on a cohort of 4,163 samples comprising 3,074 male samples from the METSIM study [46] and 1,089 male and female samples from the FINRISK study [47] . Genomic DNA was fragmented on the Covaris LE220 instrument targeting 375 libraries was loaded over the remaining number of HiSeq X lanes to achieve the desired coverage for this project. 2x150 paired end sequence data were demultiplexed using a single index, which was a restriction on the HiSeqX instrument at this time. A minimum of 19.5x coverage was achieved per sample.
The quality of the aligned sequence data was assessed using metrics generated by Picard [48] v2. 4 The metrics for judgement of passing data quality were: FIRST_OF_PAIR_MISMATCH_RATE < .05, SECOND_OF_PAIR_MISMATCH_RATE < 0.05, haploid coverage ≥ 19.5, interchromosomal rate < .05, and discordant rate < 5. All of the above metrics must have been met in order for the sample to be assigned as QC pass. If a sample did not meet the passing criteria, the following failure 23 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. analysis was performed: a) If the Freemix score was at least 0.05, the sample or the library was considered contaminated, and both the library and the sample were abandoned; b) if the discordant rate was over 5 and/or the inter-chromosomal rate was over 0.05, the quality of DNA was considered poor and the sample was removed from the sequencing pipeline; and c) in the case of a) and b), the collaborator was contacted to determine if selection of a replacement sample from the same cohort was desired or feasible.
Separately, whole exome sequencing (WES) data (N = 19,034), genotyping array data (N = 17,718) imputed using the Haplotype Reference Consortium panel [49] v1.1, and transformed, normalized quantitative cardiometabolic trait data were obtained from an earlier study [26] . FINRISK array data came in nine genotyping batches, two of which were excluded from the present study due to small sample size. The traits, normalization and transformation procedures, and sample sizes are described in a previous publication [26] . The WES and imputed sample sets contained 4,013 and

24
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. When applying the variant recalibration the following options were used:

25
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. Sample-level quality control was also undertaken on this dataset; 13 samples were identified for exclusion because of singleton counts that were at least eight median absolute deviations above the median. Separately, 12 sex-discordant samples were flagged using plink --check-sex , and after examining chromosome Y missingness and F coefficient values for these samples, only the one that clearly differed from its reported sex was marked for exclusion. No samples were excluded based on missingness fraction or the first five principal components. In total, 14 samples were excluded from the heritability, GWAS, and Mendelian randomization analyses; the other analyses were performed without exclusion of these samples. As a result, the former analyses were performed with N = 4,149 while the latter had N = 4,163.

Mitochondrial genome copy number estimation
We estimated mitochondrial genome copy number (MT-CN) from both WGS and WES data. In WGS data, we used BEDTools [53] to calculate per-base coverage on the mitochondrial genome from the latest available 4,163 WGS CRAM files. MT-CN was then calculated by normalizing the mean 26 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. We used a similar procedure to estimate MT-CN from WES data, with mean autosomal coverage estimates taken from XHMM [54] . However, as mitochondrial genomic coverage was nonuniform due to the use of hybrid capture probes, mean mtDNA coverage was not an obvious We assigned mitochondrial haplogroups using HaploGrep [55] v1.0. Mitochondrial SNP/indel variants were genotyped using GATK GenotypeGVCFs, and a customized filter based on allele balance was applied to the combined callset. HaploGrep was then used to call mitochondrial haplotypes for each individual. We adjusted for major haplogroups in the same linear regressions of metabolic traits onto MT-CN (see Results) and calculated the summary statistics from a permutation test as implemented in lmPerm ( https://github.com/mtorchiano/lmPerm ).

Heritability analysis
To estimate heritability of MT-CN, a genomic relatedness-based restricted maximum-likelihood (GREML) method was used as implemented in GCTA [56] . The original GREML [57] method was used first, followed by GREML-LDMS [58] to account for biases arising from differences in minor allele frequency (MAF) spectrum or linkage disequilibrium (LD) properties between the genotyped variants and the true causal variants [59] . For both analyses, MT-CN values were normalized and residualized for sex, age, and age 2 as described above. Heritability estimation was performed jointly and separately for METSIM and FINRISK samples using WGS and imputed array genotypes. In all cases, a minimum MAF threshold of 1% was applied. Beyond those covariates already adjusted for in the normalization process, sensitivity analyses were performed on imputed array data to determine whether heritability estimates were sensitive to inclusion of covariates. In these experiments, either cohort or FINRISK genotyping array batch were included as fixed-effect covariates in joint analyses of imputed array data; in neither case was the final heritability estimate significantly affected (h 2 = 0.09, SE = 0.02 in both cases). In GREML-LDMS, genotypes were split into four SNP-based LD score quartiles and two MAF bins (1% > MAF > 5% and MAF > 5%), and genetic relatedness matrices (GRMs) were estimated separately for each of the eight combinations. The GREML algorithm was then run on all eight GRMs simultaneously using the first ten principal components (PCs) of the 28 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. genotype matrix (as calculated by smartPCA v13050; https://github.com/DReichLab/EIG ) as fixed covariates [58] . The use of GREML-LDMS over GREML also did not affect estimated heritability values ( Table 3 ), suggesting that the properties of the causal variants for this trait do not lead to significant biases when using the standard GREML approach.
We observed that WGS heritability estimates decrease when analyzing FINRISK and METSIM data together compared to analysis of METSIM alone ( Table 2 ) (note that FINRISK-only heritability estimates are not reliable as they have large standard errors resulting from the small number of FINRISK samples sequenced). One potential explanation for this is that there exists substantial heterogeneity across FINRISK survey years ( S4 Fig ), and between the FINRISK and METSIM cohorts, with respect to the reliability with which mtDNA was captured (likely due to different DNA preparation protocols).

Genome-wide association analyses
Genome-wide association studies (GWAS) were performed using the same normalized phenotype used in heritability analyses. Single-variant GWAS were conducted using EMMAX as implemented in EPACTS ( https://genome.sph.umich.edu/wiki/EPACTS ). Kinship matrices required by EMMAX were generated by EPACTS; kinship matrices for WGS GWAS were generated from WGS data, while those for WES and imputed array based GWAS were generated from WES data. A P value threshold of 5×10 -8 was used for the WGS and imputed array GWAS while 5×10 -7 was used for significance in the WES GWAS. Single-variant association analyses of WGS and WES data did not include any covariates in the EMMAX model, although all association analyses were performed using MT-CN values that adjusted for age, age 2 , sex, and cohort (see "Mitochondrial Genome Copy Number Estimation").

29
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. Gene-based variant aggregation studies (RVAS) were done using a mixed-model version of SKAT-O [37] as implemented in EPACTS. Variants with CADD [60] score greater than 20 and minor allele frequency less than 1% were grouped into genes as annotated by VEP [61] (which annotates a variant with a gene name if the gene falls within 5k of that gene by default). For gene-based RVAS, genome-wide significance thresholds varied slightly due to differing the number of genes with at least two variants meeting the above criteria in each test, but were approximately 2×10 -6 in all cases.

Mendelian randomization
To assess the evidence for a causal relationship between mitochondrial genome copy number and fasting serum insulin levels, the METSIM cohort alone was used due to its homogeneity of sex, collection procedures, and location. A penalized regression based, multiple variant Mendelian randomization (MR) approach was employed to enforce the necessary assumptions of MR methods.
While some MR studies have tested one or more assumptions post hoc , to our knowledge, there is no published method that tries to enforce these assumptions during the process of building the genetic instrument in the absence of a large set of known genotype-exposure associations. In our formulation ( Fig 4a ), X , the natural log of MT-CN (adjusted for nuclear genomic coverage but not for age, age 2 , or sex), and a genotype matrix G were used to build a genetic instrument Z , which was then tested against Y , the natural log of fasting serum insulin. The goal of the MR approach was to use a large number of common variants to build a genetic instrument Z that satisfied the three assumptions of MR [62] : 1. Association of Z with X 2. Independence of Z from any variables U confounding the relationship between X and Y 3. Independence of Z and Y given X and U

30
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. To attempt to build a genetic instrument satisfying assumptions 2 and 3, the deep METSIM phenotype data were leveraged. A matrix W was constructed using the 75 measured traits and first 20 PCs of the genotype matrix (including a third-degree polynomial basis for PC 1). From these variables, covariates that could violate one of these two assumptions were chosen by selecting columns of W associated with X or Y ( Fig 4b ). These columns were selected using two successive LASSO feature selection procedures. First, a set A of covariates associated with Y was chosen by using LASSO to regress Y onto W . In this regression, age and the third-degree polynomial basis for PC 1 were left unpenalized to ensure that A contains these covariates. The shrinkage parameter was chosen by tenfold cross-validation as the largest value that gives a mean squared error (MSE) within one standard error of the minimum observed MSE. Next, the columns of W associated with X conditional on A were chosen using a similar LASSO procedure in the regression of X onto W . In this step, however, the variables in set A were left unpenalized in order to only capture associations that are conditionally independent of A . The selected variables from this regression were designated set B .
The instrument was built using a penalized regression (using either an L1 or L2 penalty, as implemented in glmnet [63] ) of the form , where W A and W B are the columns of W ∼ G W X representing sets A and B , respectively, and G is a genotype matrix containing the alternate allele dosage (missing alleles are replaced with the MAF, similarly to PLINK [64] ) of all variants with MAF greater than 1% and marginal GWAS P value below 0.01. As X was the target vector for this regression, assumption 1 of MR was trivial. In the penalized regression, W A and W B were unpenalized in an effort to orthogonalize the regression coefficients of the genotypes to these covariates in an effort to enforce assumptions 2 and 3. glmnet was run with a convergence threshold of 1×10 -10 and maximum number of iterations of 200,000. To avoid the overfitting that would result from calculating instrument values on the same samples on which regression coefficients are learned [65] , the 31 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. penalized regression model was fit on independent subsets of the data as follows. Five models were fit, each by holding out a different 20% of samples, such that the instrument value computed for each sample was calculated using the regression coefficient vector learned without that sample. The vector of possible shrinkage parameters λ for all five models was supplied as (10 3 ,10 2 ,…,10 -13 ,10 -14 ), and the λ value which minimized the joint residual sum of squares of all five models was chosen for instrument calculation.
Formally, we randomly partitioned the set of samples S with nonmissing insulin measurements into five nonoverlapping sets S j for . We denote set complements as , such 1, , } j = { … 5 ∖ S S j C = S j that each S j C contained 80% of the training samples. The instrument vector Z j for each S j was computed as follows: , where Z j is the instrument vector for S j , G j is the genotype matrix of S j , and β G (-j) is the vector of genotype regression coefficients from the model described above, Often, the inclusion of unpenalized covariate sets A and B in the instrument-building regression was not sufficient to completely orthogonalize Z to these covariates (see below). As a result, the test for association between Z and Y was performed conditional on a set of potentially assumption-violating covariates chosen using the newly constructed instrument Z in another attempt to account for possible violations of MR assumptions in the causality test ( Fig 4c ). To choose this set of covariates C , a final feature selection step was performed using LASSO regression of Z on W with covariate set A excluded from the penalty. As in the previous feature selection steps, the shrinkage 32 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. parameter was chosen via tenfold cross-validation as the largest value with MSE within one standard error of the minimum observed MSE. Once this set, D , of covariates associated with Z was chosen, the covariates in W were partitioned into sets I , II , III , and IV based on their membership in A and D (see Fig 4 ). Formally, this partitioning was done as follows: , , , and , where and . Then, the test for causality sets II , III , and IV (colored blue in Fig 4c ).
To account for missing data in W , missing values were multiply imputed using regression trees as implemented in the R package mice [68] v3.4.0 ( maxit=25 ). This imputation was repeated 1000 times in parallel, with each set of imputed values being carried through the entire procedure described above. The resulting 1000 computed instrument effect sizes and standard errors were combined using Rubin's method as implemented in the R package Amelia [69] v1.7.5. The combined effect size and standard error were then tested for significance using a t-test with 998 degrees of freedom.
The above procedure was performed separately for METSIM samples with WGS data (N = 3,034) and METSIM samples with only imputed array data (N = 6,774) using an L1 penalty in the instrument-building regression, and again using an L2 penalty. Both sample sets were limited to those for which relevant quantitative traits were available. An inverse-variance weighted meta-analysis was performed across data sets for L1 and L2-penalized regression separately. The resulting effect size and standard error were tested for significance using a Z test.
To ensure that our results were not driven by outlier samples, we removed outliers in two stages. Before the MR analyses, we used principal components analysis (PCA), Mahalanobis distance, and multi-trait extreme outlier identification to remove 5 WGS samples and 15 imputed array samples based on quantitative trait data. We also removed high leverage, high residual outliers 33 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. ; https://doi.org/10.1101/2020. 10.23.20218586 doi: medRxiv preprint from the causality test regression (see below) post hoc and recomputed the instrument effect sizes to ensure that there was no significant change in the results. In each of the 1,000 multiple imputation runs, among the samples with standardized residual greater than 1, the top 10 samples by leverage were recorded. Any sample that was recorded in this way in at least one run was then excluded from the re-analysis as a post hoc outlier. The results of this additional analysis showed only very small differences in effect estimates, and their interpretation remained the same ( S9 Table ). Thus, we concluded that our causal inference results were not driven by outlier samples.
One caveat of this method is that, as mentioned above, exclusion of sets A and B from the regression penalty did not perfectly orthogonalize the resulting instrument from these variables in practice ( S7 Fig ). Reasons for this may include relatively low levels of shrinkage in the instrument-building regression or higher order associations between MT-CN and the confounding variables. However, our method still represents an improvement over the current standard, which is not to adjust for these covariates at all. Another caveat is that it is impossible to determine the perfect set of covariates for which adjustment is appropriate. We note that a known source of bias in MR studies is the selection of samples based on case-control status for a related disease [71] . While METSIM is a population-based study, samples were selected for WGS based on cardiovascular disease case-control status so as to enrich the sequenced samples for cases. This has the potential to bias a MR experiment if both the exposure and the outcome are associated with the disease, which is certainly possible. However, in our design, all of the METSIM samples not chosen for WGS were tested in the imputed array experiment. The consistency of effect estimates between the WGS and imputed array samples both in the L1 and L2 penalty cases ( Fig 4d ) suggests that there is little to no bias arising from sample selection in this experiment.

Calculation and testing of polygenic risk score in the UK Biobank
To search for associations between MT-CN and other phenotypes, the genetic instrument calculated in Finnish imputed array data was computed and treated as a polygenic risk score (PRS) in a relatively homogenous subset of 357,656 UK Biobank samples identified by a previous study [42] . We calculated , the average of the five values of β G (-j) across all 1000 multiple imputation runs β G using an L2 penalty and imputed array data -the L2 penalty was chosen because it performed better than the L1 on both METSIM data types, and the imputed array data set was chosen due to its larger sample size than the WGS set ( Fig 4d ). Next, to keep the procedure as consistent as possible with the imputation protocol used for METSIM -which used haploid dosage values to call imputed genotypes [26] -we called imputed genotypes using the expected alternate allele dosage from the UK Biobank by setting thresholds of 0.5 and 1.5. Using the resulting imputed variant calls, we calculated our PRS as , where is the UK Biobank genotype dosage matrix constructed in the same way as G in METSIM.
To test for associations with MT-CN PRS in the UK Biobank, we employed two approaches: a hypothesis-driven analysis targeted to the phenotypes associated with MT-CN in the Finnish data as well as a hypothesis-free screen of all the phenotypes available to us.

35
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. In the targeted analysis, we used our genetic instrument from the MR experiment as a PRS for MT-CN in our chosen subset of the UK Biobank and tested for associations with several blood cell count and metabolic syndrome traits. Given the association of MT-CN with rs9389268 (see Results), we selected as cell count traits total leukocyte count as well as lymphocyte, neutrophil, monocyte, and platelet counts for testing (because lymphocyte count was not readily available, it was calculated as the product of leukocyte count and lymphocyte percentage). We did not include basophils and eosinophils in this analysis considering that they comprise a small minority of white blood cells and are unlikely to affect MT-CN measured from whole blood. All cell count traits were log-transformed and standardized separately by sex.
We took several steps to eliminate outlier samples in the dataset. Through three iterations of PCA on the cell count matrix and subsequent outlier removal, we removed 1,637 outlier samples. We We also performed a hypothesis-free, phenome-wide screen of UK Biobank traits to which we had access ( S10 Table ), to search for other associations with MT-CN PRS. The statistical models used in this screen were of the same form as those described above, both with and without adjustment for neutrophil and platelet counts. To curate and transform phenotypes, we used an adapted version of PHESANT [41,42] . A few further modifications were made to the pipeline, the most significant being the direct use of logistic regression for testing categorical unordered variables, the inclusion of cancer phenotypes, and the exclusion of sex-specific (or nearly sex-specific) categorical traits. The PHESANT pipeline we used [42] outputs continuous variables both in their raw form and after applying an inverse rank normal transformation. For the sake of being conservative and robust to outliers, we chose to interpret the results from the normalized continuous variables. To control false discovery rate, we performed a Benjamini-Hochberg procedure with Storey correction as implemented in the R package qvalue [72] v2.18.0 on the categorical and normalized continuous variables together. As a secondary analysis, this same correction was applied to the categorical and raw continuous variables together.

37
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. ; https://doi.org/10.1101/2020.10.23.20218586 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. ; Supporting information S1 Fig. Insulin association signal at chromosome 1 NumtS. Manhattan plot from an analysis of copy number variation at nuclear mitochondrial insertion sites (NumtS) using CNVnator read-depth measurements in 1 kb genomic windows, using an initial set of 2,049 samples from an earlier data freeze analyzed for an unrelated CNV association study (manuscript in prep.). Shown at the bottom are two tracks from the UCSC Genome Browser [73] . The black track represents an assembly gap upstream of the association signal, and the blue track shows the location of the NumtS region where the association peak is located.  CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 27, 2020. ; https://doi.org/10.1101/2020.10.23.20218586 doi: medRxiv preprint a genome-wide significance level of 5×10 -8 ; no tested markers achieved this level of significance. The  The instrument in (a) and (b) was computed using an L1 penalty, while that in (c) and (d) was computed using an L2 penalty.

S8 Fig. Example of collider bias.
If Z and Y are independently causal for a variable U , then adjusting for U can induce an association between Z and Y that did not previously exist.  Table. MT-CN associations with insulin and fat mass in WES data. Results of EMMAX test of association between normalized MT-CN and both fat mass and fasting serum insulin using WES data.
Association tests were performed in all samples and also separately among samples with and without WGS data. Table. Direct associations between platelet/neutrophil counts and metabolic syndrome phenotypes in the UK Biobank. Results of direct testing of platelet and neutrophil counts against 45 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  Table. MT-CN phenome-wide screen using raw continuous variables with no cell count adjustment. Results of phenome-wide screen for association with MT-CN PRS in the UK Biobank (N = 357,656), with FDR correction applied to all categorical variables and raw continuous variables.

S3
These tests were performed without adjustment for platelet and neutrophil counts. A dark line separates the results at q = 0.05. Table. MT-CN phenome-wide screen using normalized continuous variables with adjustment for platelet and neutrophil counts. Results of phenome-wide screen for association 46 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  Table. MT-CN phenome-wide screen using raw continuous variables with adjustment for platelet and neutrophil counts. Results of phenome-wide screen for association with MT-CN PRS in the UK Biobank (N = 357,656), with FDR correction applied to all categorical variables and raw continuous variables. These tests were performed with adjustment for platelet and neutrophil counts.

S7
A dark line separates the results at q = 0.05.

S9 Table. Mendelian randomization results after removal of high leverage post hoc outliers.
Results of Mendelian randomization test for causality of MT-CN on fasting serum insulin, after removal of high leverage post hoc outliers (see "Inference of causality in the association between MT-CN and insulin"). These results are nearly identical to those computed before outlier removal ( Fig   4d ), indicating that the detected signals were not driven by outlier samples. S10 Table. Phenotypes in the UK Biobank included in the phenome-wide screen after filtering and transformation by PHESANT. Variable type is reported by PHESANT, where "Categorical unordered" corresponds to a binary variable representing a single possible answer. Samples are generally only excluded as a result of sex imbalances, although there were four phenotypes for which a model was unable to be fit when adjusting for cell counts.

47
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 27, 2020. ; https://doi.org/10.1101/2020.10.23.20218586 doi: medRxiv preprint