Michael D. Kessler

Biomedical Researcher and Geneticist

Regeneron Genetics Center

Institute for Genome Science

Biography

Michael D. Kessler is a statistical geneticist within the Analytical Genetics and Data Sciences (AGDS) group at the Regeneron Genetics Center (RGC) of Regeneron Pharmaceuticals. In this role, he applies statistical genetic techniques and modeling to identify and interpret biological targets. His work is highly interdisciplinary, and he greatly enjoys working with colleagues across statistical, translational, experimental, legal, and executive teams to further the development of novel therapeutics.

Interests
  • Genetics and Genomics
  • Oncology
  • Bioinformatics
  • Biological Data Science
  • Translational Epidemiology
Education
  • PhD in Molecular Medicine, 2019

    University of Maryland, Baltimore

  • BA in Biological Sciences, 2011

    University of Southern California

Experience

Sr Manager, Statistical Genetics
Feb 2021 – Present New York
  • Promoted three times across my first three years, going from an individually contributing Sr Statistical Geneticist to a Sr Manager in a hybrid role with multiple direct reports and continued individual contribution.
  • Led germline cancer genetic association analyses across >200 EHR-derived hematologic and oncologic phenotypes, resulting in >3 patent filings
  • Discovered >50 novel germline genetic loci associated with hematologic phenotypes (e.g. clonal hematopoiesis), including signals with therapeutic target potential
  • Presented scientific results externally (American Society of Human Genetics, American Society of Hematology, Institute for Translational Medicine and Therapeutics), and internally to senior R&D leadership (including the company co-founder and CSO)
  • Performed large-scale genetic analysis using cloud-based computational infrastructure (e.g. AWS, DNANexus)
  • Developed >5 DNANexus applets as components of genetic data processing pipelines
  • Applied numerous statistical and translational genetic techniques (e.g. GWAS, fine-mapping, ML) for the analysis of genetic and phenotypic data
  • Collaborated across pharmacogenomic, therapeutic, clinical, and translational teams as part of multiple interdisciplinary human genetics projects
  • Regularly participated in scientific recruitment and collaboration building, and served on dozens of hiring committees across multiple research and development teams
 
Postdoctoral Research Fellow
Dec 2019 – Feb 2021 Maryland
  • Identified associations between latent patterns of immune checkpoint inhibitor response in melanoma and progression-free survival
  • Worked on the integration of single-cell transcriptomic and epigenomic sequencing data
  • Developed and implemented a novel machine learning algorithm that uses a top-scoring pairs approach to perform regression
  • Characterized the super-enhancer landscape and associated transcription factor enrichments in head and neck squamous cell carcinoma
  • Identified gene candidates underpinning the association between vitamin D and head and neck cancer
  • Worked on a method to estimate per sample immunogenicity scores that derive from cancer-specific alternative splicing events
 
Graduate Research Assistant
May 2015 – Nov 2019 Maryland
  • Characterized and analyzed genetic variant distributions from >40,000 human genomes as part of the NHLBI TOPMed program
  • Analyzed whole genome sequencing data from >1,000 Samoan individuals to study the evolutionary history of modern Samoa
  • Calculated and analyzed de novo mutation rates across ancestrally diverse human populations and discovered a mutation reduction in the Amish founder population
  • Determined the genetic ancestry for 1018 common cancer cell line models and identified gene expression and mutation differences from ancestrally diverse cancer cell lines
  • Researched and wrote a review of cutting-edge approaches for cancer detection and treatment via non-invasive liquid biopsy
  • Demonstrated that significant cost and time biases exist when performing clinical genetic variant prioritization on individuals with non-European ancestral backgrounds

Publications

Red cell distribution width and its polygenic score in relation to mortality and cardiometabolic outcomes

Elevated red cell distribution width (RDW) has been associated with a range of health outcomes. This study aims to examine the prognostic and etiological roles of RDW, both measured levels and genetic predisposition, in predicting cardiovascular outcomes, diabetes, chronic kidney disease (CKD) and mortality. We studied 27,141 middle-aged adults from the Malmö Diet and Cancer study (MDCS) with a mean follow-up of 21 years. RDW was measured with a hematology analyzer on whole blood samples. Polygenic scores for RDW (PGS-RDW) were constructed for each participant using genetic data in MDCS and published summary statistics from a genome-wide association study of RDW (n = 408,112). Cox proportional hazards regression was used to assess associations between RDW, PGS-RDW and cardiovascular outcomes, diabetes, CKD and mortality, respectively. PGS-RDW was significantly associated with RDW (Pearson’s correlation coefficient = 0.133, p < 0.001). RDW was significantly associated with incidence of stroke (hazard ratio (HR) per 1 standard deviation = 1.06, 95% confidence interval (CI): 1.02-1.10, p = 0.003), atrial fibrillation (HR = 1.09, 95% CI: 1.06-1.12, p < 0.001), heart failure (HR = 1.13, 95% CI: 1.08-1.19, p < 0.001), venous thromboembolism (HR = 1.21, 95% CI: 1.15-1.28, p < 0.001), diabetes (HR = 0.87, 95% CI: 0.84-0.90, p < 0.001), CKD (HR = 1.08, 95% CI: 1.03-1.13, p = 0.004) and all-cause mortality (HR = 1.18, 95% CI: 1.16-1.20, p < 0.001). However, PGS-RDW was significantly associated with incidence of diabetes (HR = 0.96, 95% CI: 0.94-0.99, p = 0.01), but not with any other tested outcomes. RDW is associated with mortality and incidence of cardiovascular diseases, but a significant association between genetically determined RDW and incident cardiovascular diseases was not observed. However, both RDW and PGS-RDW were inversely associated with incidence of diabetes, suggesting a putative causal relationship. The relationship with incidence of diabetes needs to be further studied.

Rare coding variants in CHRNB2 reduce the likelihood of smoking

Human genetic studies of smoking behavior have been thus far largely limited to common variants. Studying rare coding variants has the potential to identify drug targets. We performed an exome-wide association study of smoking phenotypes in up to 749,459 individuals and discovered a protective association in CHRNB2, encoding the β2 subunit of the α4β2 nicotinic acetylcholine receptor. Rare predicted loss-of-function and likely deleterious missense variants in CHRNB2 in aggregate were associated with a 35% decreased odds for smoking heavily (odds ratio (OR) = 0.65, confidence interval (CI) = 0.56–0.76, P = 1.9 × 10⁻⁸). An independent common variant association in the protective direction (rs2072659; OR = 0.96; CI = 0.94–0.98; P = 5.3 × 10⁻⁶) was also evident, suggesting an allelic series. Our findings in humans align with decades-old experimental observations in mice that β2 loss abolishes nicotine-mediated neuronal responses and attenuates nicotine self-administration. Our genetic discovery will inspire future drug designs targeting CHRNB2 in the brain for the treatment of nicotine addiction.

Common and rare variant associations with clonal haematopoiesis phenotypes

Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes, our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.

Related Scientific Presentation – YouTube

Ten quick tips for deep learning in biology

Machine learning is a modern approach to problem-solving and task automation. In particular, machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling, as opposed to having domain experts developing rules for prediction tasks manually. Artificial neural networks are a particular class of machine learning algorithms and models that evolved into what is now described as “deep learning”. Deep learning encompasses neural networks with many layers and the algorithms that make them perform well. These neural networks comprise artificial neurons arranged into layers and are modeled after the human brain, even though the building blocks and learning algorithms may differ. Each layer receives input from previous layers (the first of which represents the input data), and then transmits a transformed version of its own weighted output that serves as input into subsequent layers of the network. Thus, the process of “training” a neural network is the tuning of the layers’ weights to minimize a cost or loss function that serves as a surrogate of the prediction error. The loss function is differentiable so that the weights can be automatically updated to attempt to reduce the loss. Deep learning uses artificial neural networks with many layers (hence the term “deep”). Given the computational advances made in the last decade, it can now be applied to massive data sets and in innumerable contexts. In many circumstances, deep learning can learn more complex relationships and make more accurate predictions than other methods. Therefore, deep learning has become its own subfield of machine learning. In the context of biological research, it has been increasingly used to derive novel insights from high-dimensional biological data. 
For example, deep learning has been used to predict protein–drug binding kinetics, to identify the lab-of-origin of synthetic DNA, and to uncover the facial phenotypes of genetic disorders. To make the biological applications of deep learning more accessible to scientists who have some experience with machine learning, we solicited input from a community of researchers with varied biological and deep learning interests. These individuals collaboratively contributed to this manuscript’s writing using the GitHub version control platform and the Manubot manuscript generation toolset. The goal was to articulate a practical, accessible, and concise set of guidelines and suggestions to follow when using deep learning (Fig 1). For readers who are new to machine learning, we recommend reviewing general machine learning principles before getting started with deep learning. In the course of our discussions, several themes became clear: the importance of understanding and applying machine learning fundamentals as a baseline for utilizing deep learning, the necessity for extensive model comparisons with careful evaluation, and the need for critical thought in interpreting results generated by deep learning, among others. The major similarities between deep learning and traditional computational methods also became apparent. Although deep learning is a distinct subfield of machine learning, it is still a subfield. It is subject to the many limitations inherent to machine learning, and most best practices for machine learning also apply to deep learning. As with all computational methods, deep learning should be applied in a systematic manner that is reproducible and rigorously tested. Ultimately, the tips we collate range from high-level guidance to best practices for implementation. It is our hope that they will provide actionable, deep learning–specific instructions for both new and experienced deep learning practitioners. 
By making deep learning more accessible for use in biological research, we aim to improve the overall usage and reporting quality of deep learning in the literature and to enable increasing numbers of researchers to utilize these state-of-the-art techniques effectively and accurately.
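
The training process described in this abstract (layers passing transformed, weighted output forward, and weights tuned to reduce a differentiable loss) can be illustrated with a minimal sketch. This is not code from the paper: the toy data, the one-hidden-layer architecture, the starting weights, the learning rate, and all function names are assumptions chosen only to make the idea concrete.

```python
import math

# Toy data (assumed for illustration): learn y = x^2 on five points.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [x * x for x in xs]

# Hypothetical starting weights: one hidden layer of two tanh units.
w1, b1 = [0.5, -0.3], [0.1, 0.2]   # input -> hidden layer
w2, b2 = [0.3, 0.4], 0.0           # hidden layer -> output

def forward(x):
    """Each layer transforms its input and passes the result onward."""
    hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(2)]
    out = sum(w2[j] * hidden[j] for j in range(2)) + b2
    return hidden, out

def mse():
    """Mean squared error: a differentiable surrogate for prediction error."""
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(steps=500, lr=0.1):
    """Full-batch gradient descent with hand-derived gradients."""
    global b2
    n = len(xs)
    for _ in range(steps):
        gw1, gb1, gw2, gb2 = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], 0.0
        for x, y in zip(xs, ys):
            hidden, out = forward(x)
            d_out = 2.0 * (out - y) / n            # dLoss/d(output)
            gb2 += d_out
            for j in range(2):
                gw2[j] += d_out * hidden[j]
                # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2
                d_pre = d_out * w2[j] * (1.0 - hidden[j] ** 2)
                gw1[j] += d_pre * x
                gb1[j] += d_pre
        # Move every weight a small step against its gradient.
        for j in range(2):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2

initial_loss = mse()
train()
final_loss = mse()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Deep learning frameworks compute these gradients automatically and stack many more layers, but the loop has the same shape: forward pass, loss, gradients, weight update.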

Peptide ancestry informative markers in uterine neoplasms from women of European, African, and Asian ancestry

Characterization of ancestry-linked peptide variants in disease-relevant patient tissues represents a foundational step to connect patient ancestry with disease pathogenesis. Nonsynonymous single-nucleotide polymorphisms encoding missense substitutions within tryptic peptides exhibiting high allele frequencies in European, African, and East Asian populations, termed peptide ancestry informative markers (pAIMs), were prioritized from 1000 genomes. In silico analysis identified that as few as 20 pAIMs can determine ancestry proportions similarly to >260K SNPs (R² = 0.99). Multiplexed proteomic analysis of >100 human endometrial cancer cell lines and uterine leiomyoma tissues combined resulted in the quantitation of 62 pAIMs that correlate with patient race and genotype-confirmed ancestry. Candidates include a D451E substitution in GC vitamin D-binding protein previously associated with altered vitamin D levels in African and European populations. pAIMs will support generalized proteoancestry assessment as well as efforts investigating the impact of ancestry on the human proteome and how this relates to the pathogenesis of uterine neoplasms.