Publications

Red cell distribution width and its polygenic score in relation to mortality and cardiometabolic outcomes

Elevated red cell distribution width (RDW) has been associated with a range of health outcomes. This study aims to examine prognostic and etiological roles of RDW levels, both phenotypic and genetic predisposition, in predicting cardiovascular outcomes, diabetes, chronic kidney disease (CKD) and mortality. We studied 27,141 middle-aged adults from the Malmö Diet and Cancer study (MDCS) with a mean follow-up of 21 years. RDW was measured with a hematology analyzer on whole blood samples. Polygenic scores for RDW (PGS-RDW) were constructed for each participant using genetic data in MDCS and published summary statistics from a genome-wide association study of RDW (n = 408,112). Cox proportional hazards regression was used to assess associations between RDW, PGS-RDW and cardiovascular outcomes, diabetes, CKD and mortality, respectively. PGS-RDW was significantly associated with RDW (Pearson’s correlation coefficient = 0.133, p < 0.001). RDW was significantly associated with incidence of stroke (hazard ratio (HR) per 1 standard deviation = 1.06, 95% confidence interval (CI): 1.02-1.10, p = 0.003), atrial fibrillation (HR = 1.09, 95% CI: 1.06-1.12, p < 0.001), heart failure (HR = 1.13, 95% CI: 1.08-1.19, p < 0.001), venous thromboembolism (HR = 1.21, 95% CI: 1.15-1.28, p < 0.001), diabetes (HR = 0.87, 95% CI: 0.84-0.90, p < 0.001), CKD (HR = 1.08, 95% CI: 1.03-1.13, p = 0.004) and all-cause mortality (HR = 1.18, 95% CI: 1.16-1.20, p < 0.001). However, PGS-RDW was significantly associated with incidence of diabetes (HR = 0.96, 95% CI: 0.94-0.99, p = 0.01), but not with any other tested outcomes. RDW is associated with mortality and incidence of cardiovascular diseases, but a significant association between genetically determined RDW and incident cardiovascular diseases was not observed. However, both RDW and PGS-RDW were inversely associated with incidence of diabetes, suggesting a putative causal relationship. The relationship with incidence of diabetes needs to be further studied.

Rare coding variants in CHRNB2 reduce the likelihood of smoking

Human genetic studies of smoking behavior have been thus far largely limited to common variants. Studying rare coding variants has the potential to identify drug targets. We performed an exome-wide association study of smoking phenotypes in up to 749,459 individuals and discovered a protective association in CHRNB2, encoding the β2 subunit of the α4β2 nicotine acetylcholine receptor. Rare predicted loss-of-function and likely deleterious missense variants in CHRNB2 in aggregate were associated with a 35% decreased odds for smoking heavily (odds ratio (OR) = 0.65, confidence interval (CI) = 0.56–0.76, P = 1.9 × 10⁻⁸). An independent common variant association in the protective direction (rs2072659; OR = 0.96; CI = 0.94–0.98; P = 5.3 × 10⁻⁶) was also evident, suggesting an allelic series. Our findings in humans align with decades-old experimental observations in mice that β2 loss abolishes nicotine-mediated neuronal responses and attenuates nicotine self-administration. Our genetic discovery will inspire future drug designs targeting CHRNB2 in the brain for the treatment of nicotine addiction.

Common and rare variant associations with clonal haematopoiesis phenotypes

Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes [1–5]. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes, our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.

Related Scientific Presentation – YouTube

Ten quick tips for deep learning in biology

Machine learning is a modern approach to problem-solving and task automation. In particular, machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling, as opposed to having domain experts developing rules for prediction tasks manually. Artificial neural networks are a particular class of machine learning algorithms and models that evolved into what is now described as “deep learning”. Deep learning encompasses neural networks with many layers and the algorithms that make them perform well. These neural networks comprise artificial neurons arranged into layers and are modeled after the human brain, even though the building blocks and learning algorithms may differ. Each layer receives input from previous layers (the first of which represents the input data), and then transmits a transformed version of its own weighted output that serves as input into subsequent layers of the network. Thus, the process of “training” a neural network is the tuning of the layers’ weights to minimize a cost or loss function that serves as a surrogate of the prediction error. The loss function is differentiable so that the weights can be automatically updated to attempt to reduce the loss. Deep learning uses artificial neural networks with many layers (hence the term “deep”). Given the computational advances made in the last decade, it can now be applied to massive data sets and in innumerable contexts. In many circumstances, deep learning can learn more complex relationships and make more accurate predictions than other methods. Therefore, deep learning has become its own subfield of machine learning. In the context of biological research, it has been increasingly used to derive novel insights from high-dimensional biological data. 
For example, deep learning has been used to predict protein–drug binding kinetics, to identify the lab-of-origin of synthetic DNA, and to uncover the facial phenotypes of genetic disorders. To make the biological applications of deep learning more accessible to scientists who have some experience with machine learning, we solicited input from a community of researchers with varied biological and deep learning interests. These individuals collaboratively contributed to this manuscript’s writing using the GitHub version control platform and the Manubot manuscript generation toolset. The goal was to articulate a practical, accessible, and concise set of guidelines and suggestions to follow when using deep learning (Fig 1). For readers who are new to machine learning, we recommend reviewing general machine learning principles before getting started with deep learning. In the course of our discussions, several themes became clear: the importance of understanding and applying machine learning fundamentals as a baseline for utilizing deep learning, the necessity for extensive model comparisons with careful evaluation, and the need for critical thought in interpreting results generated by deep learning, among others. The major similarities between deep learning and traditional computational methods also became apparent. Although deep learning is a distinct subfield of machine learning, it is still a subfield. It is subject to the many limitations inherent to machine learning, and most best practices for machine learning also apply to deep learning. As with all computational methods, deep learning should be applied in a systematic manner that is reproducible and rigorously tested. Ultimately, the tips we collate range from high-level guidance to best practices for implementation. It is our hope that they will provide actionable, deep learning–specific instructions for both new and experienced deep learning practitioners. 
By making deep learning more accessible for use in biological research, we aim to improve the overall usage and reporting quality of deep learning in the literature and to enable increasing numbers of researchers to utilize these state-of-the-art techniques effectively and accurately.
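
The training process summarized in this abstract (layers that transform their inputs, and weights tuned to reduce a differentiable loss) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data (y = x² on five points), the two-unit tanh hidden layer, the starting weights, the learning rate, and the number of steps. It uses plain Python with hand-written gradients rather than any deep learning library.

```python
import math

# Toy data (illustrative assumption): learn y = x^2 on five points.
DATA = [(x, x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def forward(params, x):
    """Two layers: a tanh hidden layer feeding a linear output layer."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    output = sum(v * h for v, h in zip(w2, hidden)) + b2
    return hidden, output


def loss(params, data):
    """Mean squared error, used here as the surrogate for prediction error."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def gradient_step(params, data, learning_rate):
    """One full-batch gradient descent update using the chain rule."""
    w1, b1, w2, b2 = params
    n = len(data)
    g_w1 = [0.0] * len(w1)
    g_b1 = [0.0] * len(b1)
    g_w2 = [0.0] * len(w2)
    g_b2 = 0.0
    for x, y in data:
        hidden, output = forward(params, x)
        d_out = 2.0 * (output - y) / n          # d(loss)/d(output)
        g_b2 += d_out
        for j, h in enumerate(hidden):
            g_w2[j] += d_out * h
            # Derivative of tanh is 1 - tanh^2; propagate back to layer one.
            d_pre = d_out * w2[j] * (1.0 - h * h)
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre

    def updated(values, grads):
        return [v - learning_rate * g for v, g in zip(values, grads)]

    return (updated(w1, g_w1), updated(b1, g_b1),
            updated(w2, g_w2), b2 - learning_rate * g_b2)


# Arbitrary fixed starting weights (assumption) keep the run deterministic.
params = ([0.5, -0.3], [0.1, 0.2], [0.4, -0.2], 0.0)
initial_loss = loss(params, DATA)
for _ in range(3000):
    params = gradient_step(params, DATA, learning_rate=0.02)
final_loss = loss(params, DATA)

print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.4f}")
```

In practice a framework with automatic differentiation would replace the hand-written gradients; the explicit version is shown only to make the weight-update step visible.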

Peptide ancestry informative markers in uterine neoplasms from women of European, African, and Asian ancestry

Characterization of ancestry-linked peptide variants in disease-relevant patient tissues represents a foundational step to connect patient ancestry with disease pathogenesis. Nonsynonymous single-nucleotide polymorphisms encoding missense substitutions within tryptic peptides exhibiting high allele frequencies in European, African, and East Asian populations, termed peptide ancestry informative markers (pAIMs), were prioritized from 1000 genomes. In silico analysis identified that as few as 20 pAIMs can determine ancestry proportions similarly to >260K SNPs (R² = 0.99). Multiplexed proteomic analysis of >100 human endometrial cancer cell lines and uterine leiomyoma tissues combined resulted in the quantitation of 62 pAIMs that correlate with patient race and genotype-confirmed ancestry. Candidates include a D451E substitution in GC vitamin D-binding protein previously associated with altered vitamin D levels in African and European populations. pAIMs will support generalized proteoancestry assessment as well as efforts investigating the impact of ancestry on the human proteome and how this relates to the pathogenesis of uterine neoplasms.

Exome sequencing and analysis of 454,787 UK Biobank participants

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene–trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.

Transfer learning between preclinical models and human tumors identifies a conserved NK cell activation signature in anti-CTLA-4 responsive tumors

Tumor response to therapy is affected by both the cell types and the cell states present in the tumor microenvironment. This is true for many cancer treatments, including immune checkpoint inhibitors (ICIs). While it is well-established that ICIs promote T cell activation, their broader impact on other intratumoral immune cells is unclear; this information is needed to identify new mechanisms of action and improve ICI efficacy. Many preclinical studies have begun using single-cell analysis to delineate therapeutic responses in individual immune cell types within tumors. One major limitation to this approach is that therapeutic mechanisms identified in preclinical models have failed to fully translate to human disease, restraining efforts to improve ICI efficacy in translational research. We previously developed a computational transfer learning approach called projectR to identify shared biology between independent high-throughput single-cell RNA-sequencing (scRNA-seq) datasets. In the present study, we test this algorithm’s ability to identify conserved and clinically relevant transcriptional changes in complex tumor scRNA-seq data and expand its application to the comparison of scRNA-seq datasets with additional data types such as bulk RNA-seq and mass cytometry. We found a conserved signature of NK cell activation in anti-CTLA-4 responsive mouse and human tumors. In human metastatic melanoma, we found that the NK cell activation signature associates with longer overall survival and is predictive of anti-CTLA-4 (ipilimumab) response. Additional molecular approaches to confirm the computational findings demonstrated that human NK cells express CTLA-4 and bind anti-CTLA-4 antibodies independent of the antibody binding receptor (FcR) and that similar to T cells, CTLA-4 expression by NK cells is modified by cytokine-mediated and target cell-mediated NK cell activation. 
These data demonstrate a novel application of our transfer learning approach, which was able to identify cell state transitions conserved in preclinical models and human tumors. This approach can be adapted to explore many questions in cancer therapeutics, enhance translational research, and enable better understanding and treatment of disease.

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes) [1]. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Evolutionary history of modern Samoans

Archaeological studies estimate the initial settlement of Samoa at 2,750 to 2,880 y ago and identify only limited settlement and human modification to the landscape until about 1,000 to 1,500 y ago. At this point, a complex history of migration is thought to have begun with the arrival of people sharing ancestry with Near Oceanic groups (i.e., Austronesian-speaking and Papuan-speaking groups), and was then followed by the arrival of non-Oceanic groups during European colonialism. However, the specifics of this peopling are not entirely clear from the archaeological and anthropological records, and are therefore a focus of continued debate. To shed additional light on the Samoan population history that this peopling reflects, we employ a population genetic approach to analyze 1,197 Samoan high-coverage whole genomes. We identify population splits between the major Samoan islands and detect asymmetrical gene flow to the capital city. We also find an extreme bottleneck until about 1,000 y ago, which is followed by distinct expansions across the islands and subsequent bottlenecks consistent with European colonization. These results provide for an increased understanding of Samoan population history and the dynamics that inform it, and also demonstrate how rapid demographic processes can shape modern genomes.

De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population

De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.

Ancestral characterization of 1018 cancer cell lines highlights disparities and reveals gene expression and mutational differences

Although cell lines are an essential resource for studying cancer biology, many are of unknown ancestral origin, and their use may not be optimal for evaluating the biology of all patient populations. An admixture analysis was performed using genome‐wide chip data from the Catalogue of Somatic Mutations in Cancer (COSMIC) Cell Lines Project to calculate genetic ancestry estimates for 1018 cancer cell lines. After stratifying the analyses by tissue and histology types, linear models were used to evaluate the influence of ancestry on gene expression and somatic mutation frequency. For the 701 cell lines with unreported ancestry, 215 were of East Asian origin, 30 were of African or African American origin, and 453 were of European origin. Notable imbalances were observed in ancestral representation across tissue type, with the majority of analyzed tissue types having few cell lines of African American ancestral origin, and with Hispanic and South Asian ancestry being almost entirely absent across all cell lines. In evaluating gene expression across these cell lines, expression levels of the genes neurobeachin like 1 (NBEAL1), solute carrier family 6 member 19 (SLC6A19), HEAT repeat containing 6 (HEATR6), and epithelial cell transforming 2 like (ECT2L) were associated with ancestry. Significant differences were also observed in the proportions of somatic mutation types across cell lines with varying ancestral proportions. By estimating genetic ancestry for 1018 cancer cell lines, the authors have produced a resource that cancer researchers can use to ensure that their cell lines are ancestrally representative of the populations they intend to affect. Furthermore, the novel ancestry‐specific signal identified underscores the importance of ancestral awareness when studying cancer.

Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire

Native Americans from the Amazon, Andes, and coastal geographic regions of South America have a rich cultural heritage but are genetically understudied, therefore leading to gaps in our knowledge of their genomic architecture and demographic history. In this study, we sequence 150 genomes to high coverage combined with an additional 130 genotype array samples from Native American and mestizo populations in Peru. The majority of our samples possess greater than 90% Native American ancestry, which makes this the most extensive Native American sequencing project to date. Demographic modeling reveals that the peopling of Peru began ∼12,000 y ago, consistent with the hypothesis of the rapid peopling of the Americas and Peruvian archeological data. We find that the Native American populations possess distinct ancestral divisions, whereas the mestizo groups were admixtures of multiple Native American communities that occurred before and during the Inca Empire and Spanish rule. In addition, the mestizo communities also show Spanish introgression largely following Peruvian Independence, nearly 300 y after Spain conquered Peru. Further, we estimate migration events between Peruvian populations from all three geographic regions with the majority of between-region migration moving from the high Andes to the low-altitude Amazon and coast. As such, we present a detailed model of the evolutionary dynamics which impacted the genomes of modern-day Peruvians and a Native American ancestry dataset that will serve as a beneficial resource to addressing the underrepresentation of Native American ancestry in sequencing studies.

The evolution of polymorphic hybrid incompatibilities in house mice

Resolving the mechanistic and genetic bases of reproductive barriers between species is essential to understanding the evolutionary forces that shape speciation. Intrinsic hybrid incompatibilities are often treated as fixed between species, yet there can be considerable variation in the strength of reproductive isolation between populations. The extent and causes of this variation remain poorly understood in most systems. We investigated the genetic basis of variable hybrid male sterility (HMS) between two recently diverged subspecies of house mice, Mus musculus domesticus and Mus musculus musculus. We found that polymorphic HMS has a surprisingly complex genetic basis, with contributions from at least five autosomal loci segregating between two closely related wild-derived strains of M. m. musculus. One of the HMS-linked regions on chromosome 4 also showed extensive introgression among inbred laboratory strains and transmission ratio distortion (TRD) in hybrid crosses. Using additional crosses and whole genome sequencing of sperm pools, we showed that TRD was limited to hybrid crosses and was not due to differences in sperm motility between M. m. musculus strains. Based on these results, we argue that TRD likely reflects additional incompatibilities that reduce hybrid embryonic viability. In some common inbred strains of mice, selection against deleterious interactions appears to have unexpectedly driven introgression at loci involved in epistatic hybrid incompatibilities. The highly variable genetic basis to F1 hybrid incompatibilities between closely related mouse lineages argues that a thorough dissection of reproductive isolation will require much more extensive sampling of natural variation than has been commonly utilized in mice and other model systems.

Oxaliplatin-induced peripheral neuropathy and identification of unique severity groups in colorectal cancer

Oxaliplatin-induced peripheral neuropathy (OIPN) is a dose-limiting toxicity of oxaliplatin and affects most colorectal cancer patients. OIPN is commonly evaluated by patient symptom report, using scales to reflect impairment. These scales do not discriminate between unique groupings of symptoms and signs, which impedes prompt identification of OIPN. The objective of this study was to identify clusters of symptoms and signs that differentiated underlying clinical severity and segregated patients within our population into OIPN subgroups. Chemotherapy-naive colorectal cancer patients (N = 148) receiving oxaliplatin were administered the Total Neuropathy Score clinical (TNSc©), which includes symptom report (sensory, motor, autonomic) and sensory examination (pin sense, vibration, reflexes). The TNSc was administered before chemotherapy initiation (T0) and after cumulative oxaliplatin doses of 510–520 mg/m² (T1) and 1020–1040 mg/m² (T2). Using mean T2 TNSc scores, latent class analysis grouped patients into OIPN severity cohorts. Latent class analysis categorized patients into four distinct OIPN groups: low symptoms and low signs (n = 54); low symptoms and intermediate signs (n = 44); low symptoms and high signs (n = 21); and high symptoms and high signs (n = 29). No differences were noted among OIPN groups on age, sex, chemotherapy regimen, or cumulative oxaliplatin dose. We identified OIPN patient groups with distinct symptoms/signs, demonstrating variability of OIPN presentation regardless of cumulative oxaliplatin dose. Over half of the sample had positive findings on OIPN examination despite few or no symptoms. Sensory examination of all patients receiving oxaliplatin is indicated for timely identification of OIPN, which will allow earlier symptom management.

Challenges and disparities in the application of personalized genomic medicine to populations with African ancestry

To characterize the extent and impact of ancestry-related biases in precision genomic medicine, we use 642 whole-genome sequences from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) project to evaluate typical filters and databases. We find significant correlations between estimated African ancestry proportions and the number of variants per individual in all variant classification sets but one. The source of these correlations is highlighted in more detail by looking at the interaction between filtering criteria and the ClinVar and Human Gene Mutation databases. ClinVar’s correlation, representing African ancestry-related bias, has changed over time amidst monthly updates, with the most extreme switch happening between March and April of 2014 (r=0.733 to r=−0.683). We identify 68 SNPs as the major drivers of this change in correlation. As long as ancestry-related bias when using these clinical databases is minimally recognized, the genetics community will face challenges with implementation, interpretation and cost-effectiveness when treating minority populations.

The placebo effect: from concepts to genes

Despite its initial treatment as a nuisance variable, the placebo effect is now recognized as a powerful determinant of health across many different diseases and encounters. This recognition follows some remarkable findings, including demonstrations that the placebo effect significantly modulates the response to active treatments in conditions such as pain, anxiety, Parkinson’s disease, and some surgical procedures. Here, we review pioneering studies and recent advances in behavioral, neurobiological, and genetic influences on the placebo effect. Consistent with recent conceptualizations, the placebo effect is presented as the product of a general expectancy learning mechanism in which verbal, conditioned, and social cues are centrally integrated to change behaviors and outcomes. Examples of the integration of verbal and conditioned cues, such as instructed reversal of placebo effects, are also incorporated into this model. We discuss neuroimaging studies that have identified key brain regions and modulatory mechanisms underlying placebo effects using well-established behavioral paradigms. Finally, we present a synthesis of recent genetics studies on the placebo effect, highlighting a promising link between genetic variants in the dopamine, opioid, serotonin, and endocannabinoid pathways and placebo responsiveness. Greater understanding of the behavioral, neurobiological, and genetic influences on the placebo effect is critical for evaluating medical interventions and may allow health professionals to tailor and personalize interventions in order to maximize treatment outcomes in clinical settings.

Effective population size does not predict codon usage bias in mammals

Synonymous codons are not used at equal frequency throughout the genome, a phenomenon termed codon usage bias (CUB). It is often assumed that interspecific variation in the intensity of CUB is related to species differences in effective population sizes (Ne), with selection on CUB operating less efficiently in species with small Ne. Here, we specifically ask whether variation in Ne predicts differences in CUB in mammals and report two main findings. First, across 41 mammalian genomes, CUB was not correlated with two indirect proxies of Ne (body mass and generation time), even though there was statistically significant evidence of selection shaping CUB across all species. Interestingly, autosomal genes showed higher codon usage bias compared to X‐linked genes, and high‐recombination genes showed higher codon usage bias compared to low recombination genes, suggesting intraspecific variation in Ne predicts variation in CUB. Second, across six mammalian species with genetic estimates of Ne (human, chimpanzee, rabbit, and three mouse species: Mus musculus, M. domesticus, and M. castaneus), Ne and CUB were weakly and inconsistently correlated. At least in mammals, interspecific divergence in Ne does not strongly predict variation in CUB. One hypothesis is that each species responds to a unique distribution of selection coefficients, confounding any straightforward link between Ne and CUB.