Michael D. Kessler is a statistical geneticist within the Analytical Genetics and Data Sciences (AGDS) group at the Regeneron Genetics Center (RGC) of Regeneron Pharmaceuticals. In this role, he applies statistical genetic techniques and modeling to identify and interpret biological targets. His work is highly interdisciplinary, and he greatly enjoys working with colleagues across statistical, translational, experimental, legal, and executive teams to further the development of novel therapeutics.
PhD in Molecular Medicine, 2019
University of Maryland, Baltimore
BA in Biological Sciences, 2011
University of Southern California
Elevated red cell distribution width (RDW) has been associated with a range of health outcomes. This study aims to examine the prognostic and etiological roles of RDW, considering both phenotypic levels and genetic predisposition, in predicting cardiovascular outcomes, diabetes, chronic kidney disease (CKD) and mortality. We studied 27,141 middle-aged adults from the Malmö Diet and Cancer study (MDCS) with a mean follow-up of 21 years. RDW was measured with a hematology analyzer on whole blood samples. Polygenic scores for RDW (PGS-RDW) were constructed for each participant using genetic data in MDCS and published summary statistics from a genome-wide association study of RDW (n = 408,112). Cox proportional hazards regression was used to assess associations between RDW, PGS-RDW and cardiovascular outcomes, diabetes, CKD and mortality, respectively. PGS-RDW was significantly associated with RDW (Pearson’s correlation coefficient = 0.133, p < 0.001). RDW was significantly associated with incidence of stroke (hazard ratio (HR) per 1 standard deviation = 1.06, 95% confidence interval (CI): 1.02-1.10, p = 0.003), atrial fibrillation (HR = 1.09, 95% CI: 1.06-1.12, p < 0.001), heart failure (HR = 1.13, 95% CI: 1.08-1.19, p < 0.001), venous thromboembolism (HR = 1.21, 95% CI: 1.15-1.28, p < 0.001), diabetes (HR = 0.87, 95% CI: 0.84-0.90, p < 0.001), CKD (HR = 1.08, 95% CI: 1.03-1.13, p = 0.004) and all-cause mortality (HR = 1.18, 95% CI: 1.16-1.20, p < 0.001). In contrast, PGS-RDW was significantly associated only with incidence of diabetes (HR = 0.96, 95% CI: 0.94-0.99, p = 0.01), and not with any other tested outcome. RDW is associated with mortality and incidence of cardiovascular diseases, but a significant association between genetically determined RDW and incident cardiovascular diseases was not observed. However, both RDW and PGS-RDW were inversely associated with incidence of diabetes, suggesting a putative causal relationship. The relationship with incidence of diabetes needs to be further studied.
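As a rough illustration of what a polygenic score is, the sketch below computes one in its simplest common form: a weighted sum of effect-allele dosages, with the weights taken from GWAS summary statistics, then standardized so that effects can be reported per 1 standard deviation. The abstract does not state which scoring method was used for PGS-RDW, so this is an assumption, and the function names, weights and genotypes are hypothetical values chosen only for the example.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one person."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Rescale scores to mean 0 and (population) standard deviation 1."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical per-variant effect sizes (illustrative numbers, not real data).
weights = [0.5, -0.25, 1.0]

# Hypothetical effect-allele dosages for three participants.
cohort = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 2, 0],
]

raw_scores = [polygenic_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
print(raw_scores)
print(z_scores)
```

In a real analysis the standardized score would then enter a survival model such as Cox regression as a covariate; that step is omitted here because it needs a dedicated statistics library.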
Human genetic studies of smoking behavior have been thus far largely limited to common variants. Studying rare coding variants has the potential to identify drug targets. We performed an exome-wide association study of smoking phenotypes in up to 749,459 individuals and discovered a protective association in CHRNB2, encoding the β2 subunit of the α4β2 nicotinic acetylcholine receptor. Rare predicted loss-of-function and likely deleterious missense variants in CHRNB2 in aggregate were associated with a 35% decreased odds for smoking heavily (odds ratio (OR) = 0.65, confidence interval (CI) = 0.56–0.76, P = 1.9 × 10⁻⁸). An independent common variant association in the protective direction (rs2072659; OR = 0.96; CI = 0.94–0.98; P = 5.3 × 10⁻⁶) was also evident, suggesting an allelic series. Our findings in humans align with decades-old experimental observations in mice that β2 loss abolishes nicotine-mediated neuronal responses and attenuates nicotine self-administration. Our genetic discovery will inspire future drug designs targeting CHRNB2 in the brain for the treatment of nicotine addiction.
Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes (refs. 1–5). Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes, our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.
Machine learning is a modern approach to problem-solving and task automation. In particular, machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling, as opposed to having domain experts developing rules for prediction tasks manually. Artificial neural networks are a particular class of machine learning algorithms and models that evolved into what is now described as “deep learning”. Deep learning encompasses neural networks with many layers and the algorithms that make them perform well. These neural networks comprise artificial neurons arranged into layers and are modeled after the human brain, even though the building blocks and learning algorithms may differ. Each layer receives input from previous layers (the first of which represents the input data), and then transmits a transformed version of its own weighted output that serves as input into subsequent layers of the network. Thus, the process of “training” a neural network is the tuning of the layers’ weights to minimize a cost or loss function that serves as a surrogate of the prediction error. The loss function is differentiable so that the weights can be automatically updated to attempt to reduce the loss. Deep learning uses artificial neural networks with many layers (hence the term “deep”). Given the computational advances made in the last decade, it can now be applied to massive data sets and in innumerable contexts. In many circumstances, deep learning can learn more complex relationships and make more accurate predictions than other methods. Therefore, deep learning has become its own subfield of machine learning. In the context of biological research, it has been increasingly used to derive novel insights from high-dimensional biological data. 
For example, deep learning has been used to predict protein–drug binding kinetics, to identify the lab-of-origin of synthetic DNA, and to uncover the facial phenotypes of genetic disorders. To make the biological applications of deep learning more accessible to scientists who have some experience with machine learning, we solicited input from a community of researchers with varied biological and deep learning interests. These individuals collaboratively contributed to this manuscript’s writing using the GitHub version control platform and the Manubot manuscript generation toolset. The goal was to articulate a practical, accessible, and concise set of guidelines and suggestions to follow when using deep learning (Fig 1). For readers who are new to machine learning, we recommend reviewing general machine learning principles before getting started with deep learning. In the course of our discussions, several themes became clear: the importance of understanding and applying machine learning fundamentals as a baseline for utilizing deep learning, the necessity for extensive model comparisons with careful evaluation, and the need for critical thought in interpreting results generated by deep learning, among others. The major similarities between deep learning and traditional computational methods also became apparent. Although deep learning is a distinct subfield of machine learning, it is still a subfield. It is subject to the many limitations inherent to machine learning, and most best practices for machine learning also apply to deep learning. As with all computational methods, deep learning should be applied in a systematic manner that is reproducible and rigorously tested. Ultimately, the tips we collate range from high-level guidance to best practices for implementation. It is our hope that they will provide actionable, deep learning–specific instructions for both new and experienced deep learning practitioners. 
By making deep learning more accessible for use in biological research, we aim to improve the overall usage and reporting quality of deep learning in the literature and to enable increasing numbers of researchers to utilize these state-of-the-art techniques effectively and accurately.
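The training process described above (layers that transform their input, and weights tuned to reduce a differentiable loss) can be sketched in a few lines of plain Python. This is a minimal toy under stated assumptions and not a method from the manuscript: it has a single hidden layer, so it is not actually "deep", and the function name `train_tiny_network`, the tanh activation, the mean squared error loss, the learning rate and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def train_tiny_network(xs, ys, hidden=4, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer network with full-batch gradient descent.

    Returns the mean squared error measured at the start of every epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden                                    # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0                                               # output bias
    n = len(xs)
    losses = []

    for _ in range(epochs):
        g_w1 = [0.0] * hidden
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        loss = 0.0

        for x, y in zip(xs, ys):
            # Forward pass: each layer transforms the previous layer's output.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = pred - y
            loss += err * err / n

            # Backward pass: chain rule applied to the mean squared error.
            d_pred = 2.0 * err / n
            g_b2 += d_pred
            for j in range(hidden):
                g_w2[j] += d_pred * h[j]
                d_pre = d_pred * w2[j] * (1.0 - h[j] ** 2)  # tanh' = 1 - tanh^2
                g_w1[j] += d_pre * x
                g_b1[j] += d_pre

        losses.append(loss)

        # Gradient step: move every weight against its gradient.
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2

    return losses


# Toy data (illustrative only): learn y = x^2 on five points.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [x * x for x in xs]
losses = train_tiny_network(xs, ys)
print(f"loss at first epoch: {losses[0]:.4f}, at last epoch: {losses[-1]:.4f}")
```

In practice a deep learning framework computes these gradients automatically and scales to many layers; writing them out by hand here is only meant to make the weight-update loop visible.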
Characterization of ancestry-linked peptide variants in disease-relevant patient tissues represents a foundational step to connect patient ancestry with disease pathogenesis. Nonsynonymous single-nucleotide polymorphisms encoding missense substitutions within tryptic peptides exhibiting high allele frequencies in European, African, and East Asian populations, termed peptide ancestry informative markers (pAIMs), were prioritized from the 1000 Genomes Project. In silico analysis identified that as few as 20 pAIMs can determine ancestry proportions similarly to >260K SNPs (R² = 0.99). Combined multiplexed proteomic analysis of >100 human endometrial cancer cell lines and uterine leiomyoma tissues resulted in the quantitation of 62 pAIMs that correlate with patient race and genotype-confirmed ancestry. Candidates include a D451E substitution in GC vitamin D-binding protein previously associated with altered vitamin D levels in African and European populations. pAIMs will support generalized proteoancestry assessment as well as efforts investigating the impact of ancestry on the human proteome and how this relates to the pathogenesis of uterine neoplasms.