2026 Sequencing Strategies for Population and Cancer Epidemiology Studies (SeqSPACE)
11 submissions
| # | Starred | Locked | Notes | Created | User | IP address | First Name | Middle Initial | Last Name | Degree(s) | Position/Title/Career Status | Organization | Email | Abstract Title | Abstract Summary | Operations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | | | | Fri, 06/12/2026 - 09:47 | Anonymous | 10.208.28.22 | Cerdan | G. | Lopez | MGH | Graduate Research Student | University of Melbourne | cerdan.lopez@student.unimelb.edu.au | Addressing discordant classifications of MLH1, MSH2, and MSH6 missense, splice site, and frameshift variants on ClinVar using version 2 of the MMR gene specifications to the ACMG/AMP Criteria | The ClinGen InSiGHT VCEP is charged with keeping the pathogenicity classifications of MMR variants on ClinVar current. This study aims to re-curate discordant MLH1, MSH2, and MSH6 variants on ClinVar using the MMR-specific modifications to the ACMG/AMP criteria (v2.0), contemporary knowledge, and contact with submitting laboratories. Reasons for changes in pathogenicity were analysed. MLH1, MSH2, and MSH6 variants with discordant classifications recorded on ClinVar as of June 2026 were selected. To maximize the likelihood of a classification change, variants were selected that were: (A) missense, splice site, or frameshift; (B) documented in association with GI cancers; (C) assigned a prior probability of pathogenicity ≥0.81; and (D) at an allele frequency ≤0.00002 in gnomAD v4.1. Submitting laboratories were contacted for further information. Literature was searched using Mastermind and LitVar2. Reclassification was done using the MMR-specific modifications to the ACMG/AMP criteria (v2.0). For MSH2, the functional data of Jia et al., calibrated by Scott et al., were used to determine loss of function. For MLH1 and MSH6, the functional assay flowchart and spreadsheet from the modified criteria were used. Thirty-one of 45 MLH1 variants submitted as P/LP-VUS were re-classified as VUS and 14 as LP. Just one of 35 B/LB-VUS variants was re-classified as LB. Only 1 P/LP-B/LB variant was re-classified as VUS. Two of 51 MSH2 variants previously submitted as P/LP-VUS were re-classified as P, 35 as LP, and 14 as VUS. Eighty-one of 84 B/LB-VUS variants were re-classified as VUS, 2 as LP, and 2 as LB. Fifteen of 48 MSH6 variants submitted as P/LP-VUS were re-classified as LP and 33 as VUS. Four B/LB-VUS variants were re-classified as VUS. Most submitting laboratories had little new information to add. Tumour microsatellite information was most useful for re-classification. This study provides a framework for systematically addressing discordant variants recorded on ClinVar in support of the ClinGen InSiGHT VCEP's legacy responsibilities. A lack of updated, detailed literature, familial studies, and functional data often limits reclassification. | |
| 10 | | | | Thu, 06/11/2026 - 23:36 | Anonymous | 10.208.24.28 | Charles | | Breeze | Ph.D. | CRTA Postdoc | OEEB/DCEG/NCI | charlesbreeze@hotmail.com | LDscore: a scalable, Python 3-powered web platform for LD score regression analysis | Linkage disequilibrium score regression (LDSC) is an important analytical tool for quantifying heritability and estimating genetic correlations between complex traits. However, the original LDSC implementation relies on an outdated Python 2 framework, and deploying the standard command-line tools requires significant setup, data access, and computational expertise, creating a barrier for many researchers. To overcome these limitations, we developed LDscore, a technically upgraded and more accessible version of LDSC that enables rapid analysis of GWAS data. The core advancement is the recoding of the LDSC framework in Python 3, enabling computational optimization and ensuring long-term sustainability. Built on this improved foundation, LDscore is implemented as a free, publicly available web application integrated within the popular NCI LDlink framework. LDscore can accelerate scientific research by providing an intuitive graphical interface for heritability estimation, genetic correlation, and LD score calculation, including access to an expanded range of reference populations for online analysis. Notably, our results show that selecting the most appropriate reference population LD panel, even at the subcontinental ancestry group level, is essential for minimizing population stratification bias in heritability estimation. By leveraging cloud computing for superior scalability and eliminating the need for local installation, LDscore adheres to FAIR principles, improving accessibility, traceability, and reproducibility across an expanded set of reference populations and widening access to in-depth genetic analyses for researchers worldwide. | |
| 9 | | | | Thu, 06/11/2026 - 23:34 | Anonymous | 10.208.24.28 | Charles | | Breeze | Ph.D. | CRTA Postdoc | OEEB/DCEG/NCI | charlesbreeze@hotmail.com | FORGEdb: a tool for identifying candidate functional variants and uncovering target genes and mechanisms for complex diseases | The majority of disease-associated variants identified through genome-wide association studies are located outside of protein-coding regions. Prioritizing candidate regulatory variants and gene targets to identify potential biological mechanisms for further functional experiments can be challenging. To address this challenge, we developed FORGEdb (https://forgedb.cancer.gov/), a standalone and web-based tool that integrates multiple datasets, delivering information on associated regulatory elements, transcription factor binding sites, and target genes for over 37 million variants. FORGEdb scores, now integrated within the NCI LDlink framework, provide researchers with a quantitative assessment of the relative importance of each variant for targeted functional experiments. | |
| 8 | | | | Thu, 06/11/2026 - 09:20 | Anonymous | 10.208.28.22 | Rajendra | | Kumar | PhD | Instructor | Johns Hopkins University | rkumar35@jhmi.edu | Androgen Receptor Drives Polyamine Synthesis, Creating a Vulnerability for Prostate Cancer | Supraphysiologic androgen (SPA) treatment can paradoxically restrict the growth of castration-resistant prostate cancer (CRPC) with high androgen receptor (AR) activity, which is the basis for the use of bipolar androgen therapy (BAT) in patients with this disease. Although androgens are widely appreciated for enhancing anabolic metabolism, how SPA-mediated metabolic changes alter prostate cancer progression and therapy response is unknown. In this study, we report that SPA markedly increased intracellular and secreted polyamines in prostate cancer models. AR binding at enhancer sites upstream of the ornithine decarboxylase 1 (ODC1) promoter increased the abundance of ODC, a rate-limiting enzyme of polyamine synthesis, and de novo synthesis of polyamines from arginine. SPA-stimulated polyamines enhanced prostate cancer fitness, as dCas9-KRAB–mediated inhibition of AR regulation of ODC1 or direct ODC inhibition by difluoromethylornithine (DFMO) increased the efficacy of SPA. Mechanistically, AR activation, combined with the loss of polyamine-mediated negative feedback, increased S-adenosylmethionine decarboxylase 1 activity, leading to depletion of its substrate, S-adenosylmethionine, and a reduction in global protein methylation. These data provided the rationale for a clinical trial testing the safety and efficacy of BAT in combination with DFMO in patients with metastatic CRPC. Pharmacodynamic studies in the first five patients enrolled in the trial indicated that this therapeutic combination effectively depleted plasma polyamines. Thus, the AR potently stimulates polyamine synthesis, which constitutes a vulnerability in prostate cancer treated with SPA that can be targeted therapeutically. | |
| 7 | | | | Thu, 06/04/2026 - 03:01 | Anonymous | 10.208.28.22 | Julia | | Steinberg | Ph.D. | Genomics and Precision Health team lead and Adjunct Associate Professor [early-career faculty] | The Daffodil Centre, The University of Sydney, and Cancer Council NSW, Australia | julia.steinberg@sydney.edu.au | Low-coverage whole-genome-sequencing data for >7,400 Australians integrated with large-scale longitudinal health, socioeconomic, behaviour and linked medical data within the 45 and Up Study | Large-scale studies linking genomic and longitudinal health/medical data are highly valuable for population and cancer epidemiology research. In particular, the 45 and Up Study includes 267,357 participants aged 45+ years recruited in 2005-2009, with extensive information on health and sociodemographic characteristics. As part of the Australian Cancer Risk Study, we invited 30,541 participants to provide a DNA sample, including a randomly selected sub-cohort (n=9,986) and all participants diagnosed with prostate, breast, melanoma, or colorectal cancer who were alive in October 2021 (identified via linked NSW Cancer Registry data). Overall, 8,311 participants consented (27%), and new genomic data were generated for 7,408 participants using low-coverage whole-genome sequencing (lcWGS; minimum coverage 0.4X, median coverage 0.8X) and genotype imputation (GLIMPSE2, using the well-established Gencove analysis pipeline). Following in-depth sample- and variant-level quality checks (QC), we retained a final high-quality genomic dataset of 6,827 participants, including 6,631 unrelated European-ancestry individuals. Supporting the quality of the post-QC data, we found excellent genotype concordance between lcWGS duplicates (n=85 individuals; r²>0.97) and between imputed lcWGS and additional new dense genotype array data (n=170 individuals; r²>0.9 for minor allele frequency ≥0.01). In an illustrative analysis of leading cancer polygenic risk scores (PGS) for breast, prostate, melanoma, and colorectal cancers, 62-76% of PGS variants passed QC and were generally imputed with high confidence. The risk prediction performance of these PGS in our new data was comparable to previous studies for prostate, breast, and melanoma (area under the ROC curve 0.66-0.68, 0.62, and 0.64, respectively) but slightly reduced for colorectal cancer (~0.57 vs ~0.62). No PGS was significantly associated with cancer spread at diagnosis, and only the prostate cancer PGS were significantly associated with younger age at diagnosis. In conclusion, we present a major new high-quality genomics resource generated using low-coverage whole-genome sequencing, readily integrated with longitudinal linked health data from the 45 and Up Study to support population and cancer epidemiology research. | |
| 6 | | | | Wed, 06/03/2026 - 13:04 | Anonymous | 10.208.28.77 | Taylor | | Head | Ph.D., MSPH | Postdoctoral Fellow | University of Texas MD Anderson Cancer Center | sthead@mdanderson.org | Leveraging long-read RNA-seq for isoform-level regulatory discovery and TWAS fine-mapping in breast cancer | Most expression quantitative trait locus (eQTL) and transcriptome-wide association study (TWAS) analyses rely on large, tissue-agnostic transcript annotations. These annotations overlook tissue-specific isoform usage and can obscure the true regulatory mechanisms underlying disease-associated loci. Through long-read (LR) RNA sequencing, we can directly observe full-length transcripts and define tissue-relevant isoforms. However, direct integration of de novo long-read annotations into genetic studies remains challenging due to limited sample sizes and incomplete transcript capture. To address this, we first evaluated how transcript annotation influences regulatory discovery in breast cancer. We quantified gene- and isoform-level expression in breast tumor (TCGA), healthy breast tissue, and cultured fibroblasts using standard GENCODE annotations, tissue-specific LR-derived annotations, and combined annotations. Across tissues, most eGenes were concordant between annotations, but approximately one-third of lead cis-eQTLs for shared eGenes differed. Isoform-level regulatory discovery was substantially more annotation-dependent: in healthy breast tissue, 46% of eIsoforms identified using LR-informed annotations were unique despite 93.7% being present in GENCODE. Although combined annotations expanded the transcript catalog by only 0.6–7.6%, 69% of significant isoform-trait associations were specific to a single annotation. These analyses uncovered candidate regulatory isoforms at established breast cancer risk loci that were missed using conventional transcriptome annotations. Motivated by this work, we next developed a Bayesian isoform fine-mapping framework using isoform-specific priors derived from LR evidence while accounting for structural similarity among transcripts. The model prioritizes tissue-relevant isoforms and reduces spurious prioritization of structurally redundant transcripts. In simulations, the approach improves power relative to existing TWAS fine-mapping methods while maintaining appropriate type I error. Together, these results demonstrate how LR data can be incorporated into regulatory mapping and TWAS fine-mapping to improve prioritization of candidate causal isoforms underlying disease-associated loci. | |
| 5 | | | | Mon, 06/01/2026 - 20:45 | Anonymous | 10.208.28.77 | Mykhaylo | M. | Malakhov | Ph.D. | Postdoctoral Scholar | Stanford University School of Medicine | mykmal@stanford.edu | Accounting for selected genetic regulators of prostate-specific antigen levels enhances prostate cancer prediction | Elevated prostate-specific antigen (PSA) can be indicative of prostate cancer, but its use for population-based screening remains controversial. The same PSA level can have different interpretations depending on genetic predisposition, potentially leading to missed cases or overdiagnosis. Here we present an approach for obtaining genetically informed PSA values and validate it in the FinnGen biobank. Unlike previous methods that remove the component of PSA captured by a genome-wide polygenic score (PGS), we explore PGS partitioning to provide a more precise correction with fewer off-target genetic effects on prostate cancer risk. We first found that although the cis region explains only 20% as much variation in PSA as a genome-wide PGS, PSA adjusted by the cis-region PGS predicted prostate cancer nearly as effectively as PSA adjusted by the genome-wide PGS (OR=5.92, P=1.09e-444, AUC=0.840 vs. OR=6.07, P=3.52e-453, AUC=0.837). Both adjusted measures improved upon observed pre-diagnostic PSA (OR=5.83, P=3.05e-440, AUC=0.838). Next, we partitioned the PGS into linkage disequilibrium blocks to identify trans loci that enhance the accuracy of genetic adjustment in the Prostate Cancer Prevention Trial (PCPT) and the Selenium and Vitamin E Cancer Prevention Trial (SELECT). Applying the resulting filtered PSA score to FinnGen yielded an even stronger association with prostate cancer than genome-wide adjustment, particularly in men with PSA ≤ 4 ng/mL (OR=6.07, P=5.42e-182, AUC=0.783 vs. OR=5.87, P=3.35e-179, AUC=0.779). This work demonstrates the importance of carefully selecting genetic signals when recalibrating noncausal disease biomarkers. | |
| 4 | | | | Fri, 05/22/2026 - 13:47 | Anonymous | 10.208.28.95 | Ryan | L | Collins | PhD | Instructor | Dana-Farber Cancer Institute | ryan_collins@dfci.harvard.edu | Diverse mediators of cancer predisposition uncovered by germline whole genome sequencing of unexplained familial cancers | Cancer frequently clusters in families due to shared environment and genetics. However, many familial cancer cases lack a clinically recognized pathogenic germline variant (PGV). We analyzed germline genomes and family history from 2,726 individuals without a PGV in the All of Us Research Program, including 1,496 cases across 18 cancer types with extensive family history and 1,230 family history-negative, cancer-free controls. We identified allelic series of rare structural variants inactivating MSH2 in individuals with phenotypes consistent with Lynch syndrome, and inactivating BRCA1 in individuals with breast cancer. Cancer polygenic risk scores were enriched in cases and correlated with patterns of cancer diagnoses within families. Exome-wide rare variant analyses nominated six candidate predisposition genes, including TSTD2 and BRAT1 in thyroid and breast cancer, respectively. Overall, polygenic risk and rare variants impacting known genes explained a median of 5% of unexplained familial cancers, increasing to 11% when including newly nominated risk factors. | |
| 3 | | | | Thu, 05/21/2026 - 10:15 | Anonymous | 10.208.28.250 | Tony | | Chen | PhD | Postdoctoral Research Fellow | Massachusetts General Hospital | chentony@broadinstitute.org | SPLENDID incorporates continuous genetic ancestry in biobank-scale data to improve polygenic risk prediction across diverse populations | Polygenic risk scores are widely used in disease risk stratification, but their accuracy varies across ancestries. Recent methods leverage multi-ancestry data to improve accuracy in under-represented populations but require labelling individuals by ancestry. This poses practical challenges, as clinical decisions are typically not based on ancestry, and many individuals may not fit into a pre-specified ancestry group. We propose SPLENDID, a penalized regression framework for large-scale individual-level data that models genetic ancestry as a continuum to produce a unified prediction model without any ancestry labels. In extensive simulations and analyses in the All of Us Research Program (N=224,364) and UK Biobank (N=340,140), we show that SPLENDID significantly improves prediction accuracy over existing methods, particularly in non-European and admixed ancestries. By modeling genetic interactions with continuous ancestry, we further identified ancestry-differential effects in lipid and blood cell phenotypes that may explain the limited transferability of existing PRS methods across ancestry groups. Finally, a logistic regression extension of SPLENDID improved prediction of breast and prostate cancer by 6% and 9%, respectively, compared to current state-of-the-art PRS methods. Altogether, SPLENDID is a valuable tool for robust risk prediction across diverse populations, for reducing health disparities in genetic research, and for fairer clinical implementation. | |
| 2 | | | | Tue, 05/19/2026 - 12:27 | Anonymous | 10.208.24.240 | Haoyu | | Zhang | Ph.D. | Earl Stadtman Tenure-Track Investigator | National Cancer Institute | haoyu.zhang2@nih.gov | Integrating Common and Rare Variants to Improve Genetic Risk Prediction Across Diverse Populations | Background: Polygenic risk scores (PRSs) are increasingly used to stratify risk in population and cancer epidemiology, but most rely on common variants and may miss rare sequencing variants that have large effects in a subset of individuals. We developed RICE (polygenic Risk predictions Integrating Common and rarE variants), a framework that combines common- and rare-variant information for more inclusive genetic risk prediction. Methods: RICE builds a common-variant PRS by ensembling leading PRS methods and builds a rare-variant PRS by testing functionally annotated gene-level variant sets, collapsing significant sets into burden scores, and combining them with penalized regression. We evaluated RICE in simulations and in UK Biobank and All of Us sequencing data, including up to 740 million variants from 361,939 unrelated participants across African, Admixed American/Latino, European, Middle Eastern, and South Asian ancestries. Analyses covered 11 traits, including lipid levels, height, body mass index, breast cancer, coronary artery disease, and type 2 diabetes. Results: In simulations, RICE detected rare-variant signals and improved prediction across ancestries. In real data, the common-variant component consistently matched or outperformed leading PRS methods. Adding rare variants yielded the clearest gains for traits with established rare-variant architecture, especially lipid traits and height. For lipid traits, incorporating rare variants increased explained variance by up to 11.2% in European-ancestry individuals and 60.7% in African-ancestry individuals compared with common-variant PRS alone. Rare-variant scores also identified individuals with extreme lipid profiles who would be missed by common-variant PRS alone, and genome-wide rare-variant modeling outperformed scores restricted to established high-penetrance lipid genes. Conclusions: Sequencing-informed rare variants can add meaningful, ancestry-relevant information to PRS models, but benefits depend on trait architecture and sample size. RICE provides an open-source framework for integrating common and rare variation in large epidemiologic sequencing studies. | |
| 1 | | | | Mon, 03/23/2026 - 13:51 | Anonymous | 10.208.28.96 | Gustavo | A | Mendoza Fandino | Ph.D. | Postdoctoral Fellow | Monteiros'Lab/Moffitt Cancer Center | gustavo.menndoza-fandino@moffitt.org | Elucidating the Molecular Basis of Testicular Cancer Susceptibility Using Integrated GWAS, TWAS, and Functional Genomic Annotation | Background: Testicular germ cell tumor (TGCT) is the most common cancer in young adult men and exhibits one of the highest heritability estimates among solid tumors. Familial aggregation studies consistently indicate a strong genetic component to TGCT susceptibility, with risk pathways enriched for cell-cycle regulation, chromosome segregation, and DNA repair mechanisms. Although genome-wide association studies (GWAS) and transcriptome-wide association studies (TWAS) have identified numerous germline risk loci, the underlying biological mechanisms and causal variants remain poorly defined. Methods: We implemented an integrative analytic framework to functionally annotate TGCT risk regions identified through GWAS and TWAS. For each region, we defined a credible set comprising variants in LD ≥0.8 with the lead SNP. These variants were evaluated using histone-mark profiles, chromatin accessibility, regulatory element predictions, and long-range chromatin interaction datasets (Hi-C) to assess potential enhancer or promoter activity. This enhancer-focused screen was applied uniformly across all loci; however, each region was additionally examined for alternative mechanisms, including post-transcriptional regulation. Results: The chromosome 7 locus illustrates how this approach resolves locus-specific molecular mechanisms. TWAS and GWAS jointly prioritized SP4 as a candidate gene. Within the credible set, we identified rs7798894, located in the SP4 3′UTR, as the most plausible functional variant. rs7798894 alters a predicted binding site for hsa-miR-4282, a microRNA reported to exhibit tumor-suppressive activity. The T allele (frequency ~0.72) creates a functional miRNA seed site, whereas the A allele (frequency ~0.28) disrupts it, suggesting allele-specific SP4 regulation. Although enhancer annotations were surveyed, the miRNA-mediated mechanism provided the strongest functional explanation for this locus. Conclusions: Our integrative GWAS–TWAS framework enables locus-specific mechanistic inference and highlights post-transcriptional miRNA targeting as a driver of risk at the chromosome 7 SP4 locus. This approach improves causal variant identification, informs biologically grounded polygenic risk score development, and advances mechanistic understanding of TGCT susceptibility. | |
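
Submission #11 lists four criteria, (A) to (D), for choosing discordant MLH1, MSH2, and MSH6 ClinVar variants to re-curate. The sketch below is a hypothetical encoding of those filters. The record layout, field names, and consequence labels are assumptions for illustration, not a ClinVar or InSiGHT data format, and treating a variant absent from gnomAD as having allele frequency 0 is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical encodings of selection criteria (A)-(D) from Submission #11.
TARGET_GENES = {"MLH1", "MSH2", "MSH6"}
TARGET_CONSEQUENCES = {"missense", "splice_site", "frameshift"}  # (A), labels assumed
MIN_PRIOR = 0.81          # (C) prior probability of pathogenicity
MAX_GNOMAD_AF = 0.00002   # (D) gnomAD v4.1 allele frequency


@dataclass
class ClinVarVariant:
    """Illustrative record; field names are assumptions, not a ClinVar schema."""
    gene: str
    consequence: str
    discordant: bool            # conflicting classifications across submitters
    gi_cancer_documented: bool  # (B) documented association with GI cancers
    prior: float
    gnomad_af: float            # assumption: 0.0 when absent from gnomAD


def select_for_recuration(variants):
    """Return discordant variants in the target genes meeting all of (A)-(D)."""
    return [
        v
        for v in variants
        if v.discordant
        and v.gene in TARGET_GENES
        and v.consequence in TARGET_CONSEQUENCES
        and v.gi_cancer_documented
        and v.prior >= MIN_PRIOR
        and v.gnomad_af <= MAX_GNOMAD_AF
    ]
```

Each criterion is a separate clause, so an individual threshold can be relaxed to see how the candidate pool changes.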
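
Submission #10 builds on LD score regression. As a rough illustration of the estimator, assume the standard LDSC model in which the expected χ² statistic of variant j is 1 + Na + (Nh²/M)ℓⱼ (N: GWAS sample size, M: number of variants, ℓⱼ: LD score). SNP heritability can then be read off the slope of χ² regressed on LD score. This is a minimal unweighted sketch with a hypothetical function name. It omits the regression weights and block-jackknife standard errors used by the LDSC software and is not LDscore's implementation.

```python
from statistics import fmean


def ldsc_h2(chi2, ld_scores, n, m):
    """Unweighted LD score regression.

    Assumed model: E[chi2_j] = 1 + N*a + (N * h2 / M) * ld_j, where N is the
    GWAS sample size, M the number of variants, and ld_j the LD score of j.
    The OLS slope of chi2 on LD score estimates N*h2/M, so h2 = slope * M / N.
    Returns (h2, intercept). No regression weights, no jackknife errors.
    """
    if len(chi2) != len(ld_scores) or len(chi2) < 2:
        raise ValueError("need at least two paired chi2 / LD score values")
    mean_ld, mean_chi2 = fmean(ld_scores), fmean(chi2)
    sxx = sum((ld - mean_ld) ** 2 for ld in ld_scores)
    if sxx == 0:
        raise ValueError("LD scores must vary")
    sxy = sum((ld - mean_ld) * (c - mean_chi2) for ld, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_chi2 - slope * mean_ld
    return slope * m / n, intercept


# Noise-free toy data generated from the model with h2 = 0.3, N = 1e5, M = 1e6, a = 0.
ld_example = [10.0, 50.0, 100.0, 200.0]
chi2_example = [1 + (100_000 * 0.3 / 1_000_000) * ld for ld in ld_example]
h2_hat, intercept_hat = ldsc_h2(chi2_example, ld_example, n=100_000, m=1_000_000)
```

On the toy data the estimate recovers h² = 0.3 and an intercept of 1. Real summary statistics are noisy and correlated across variants, which is why LDSC adds weighting and jackknife resampling.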
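
Submission #7 reports genotype concordance as r² between lcWGS duplicates and between imputed lcWGS and array genotypes. The following is a minimal sketch that assumes concordance is measured as the squared Pearson correlation between two dosage vectors over the same variants, with an optional minor allele frequency filter. The function name and arguments are hypothetical, and this is not the study's QC pipeline.

```python
def dosage_r2(a, b, maf=None, min_maf=0.0):
    """Squared Pearson correlation between two dosage vectors.

    a, b:  dosages (0-2) for the same variants from two sources, e.g.
           duplicate lcWGS runs, or imputed lcWGS vs. array genotypes.
    maf:   optional per-variant minor allele frequencies; when given,
           only variants with maf >= min_maf are compared.
    """
    if len(a) != len(b) or (maf is not None and len(maf) != len(a)):
        raise ValueError("inputs must have equal length")
    pairs = [
        (x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if maf is None or maf[i] >= min_maf
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two variants to compare")
    n = len(pairs)
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a constant dosage vector")
    return cov * cov / (var_a * var_b)
```

The MAF filter mirrors the ≥0.01 restriction reported for the array comparison. Concordance for very rare variants is usually summarized separately because a handful of discordant calls dominates r² there.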
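
Submission #6 describes Bayesian isoform fine-mapping with isoform-specific priors. The sketch below shows only the general mechanism by which priors reweight evidence, under a simplifying single-causal-isoform assumption: each isoform's posterior inclusion probability is its prior times its Bayes factor, normalized over the locus. It does not model structural similarity among transcripts, which the submission's framework accounts for. The function name and the prior values in the example are hypothetical.

```python
from math import exp, log


def single_effect_pips(log_bayes_factors, priors):
    """Posterior inclusion probabilities assuming exactly one causal isoform.

    PIP_i = prior_i * BF_i / sum_j(prior_j * BF_j), computed in log space
    (log-sum-exp) so large Bayes factors do not overflow.
    Priors need not sum to 1; they are normalized here.
    """
    if len(log_bayes_factors) != len(priors) or not priors:
        raise ValueError("need one prior per isoform and at least one isoform")
    if any(p <= 0 for p in priors):
        raise ValueError("priors must be positive")
    total = sum(priors)
    log_w = [lbf + log(p / total) for lbf, p in zip(log_bayes_factors, priors)]
    top = max(log_w)
    w = [exp(v - top) for v in log_w]
    norm = sum(w)
    return [v / norm for v in w]


# Hypothetical locus: isoforms A and B have equal association evidence, but
# long-read data favor A (prior weight 3 vs 1); isoform C has weaker evidence.
pips = single_effect_pips([5.0, 5.0, 2.0], [3.0, 1.0, 1.0])
```

With equal Bayes factors, A receives three times B's posterior probability purely because of its prior. This is the lever that long-read evidence provides in such a model.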
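
Submission #5 adjusts PSA for its genetic component before predicting prostate cancer. As an illustration only, the sketch below assumes the PGS is scaled to predict log(PSA), so the adjusted value is observed PSA divided by exp(PGS minus the cohort mean PGS). It computes AUC as the Mann-Whitney probability that a case outscores a control. The scaling convention and function names are assumptions, not the submission's method.

```python
from math import exp
from statistics import fmean


def adjust_psa(psa, pgs):
    """Genetically adjusted PSA, assuming the PGS predicts log(PSA).

    log(PSA_adj) = log(PSA) - (PGS - mean(PGS)): PSA is divided by the
    multiplicative factor the PGS predicts relative to the cohort average.
    """
    if len(psa) != len(pgs) or not psa:
        raise ValueError("psa and pgs must be non-empty and the same length")
    mean_pgs = fmean(pgs)
    return [p / exp(s - mean_pgs) for p, s in zip(psa, pgs)]


def auc(scores, labels):
    """AUC = P(random case scores higher than random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Two men with the same observed PSA (ng/mL); the control carries a high PGS.
psa = [3.0, 3.0]
pgs = [1.0, -1.0]   # hypothetical PGS on the log-PSA scale
labels = [0, 1]     # 0 = control, 1 = prostate cancer case
raw_auc = auc(psa, labels)
adj_auc = auc(adjust_psa(psa, pgs), labels)
```

In the toy example, identical observed PSA values cannot separate the two men (AUC 0.5). After adjustment, the man whose PSA is high relative to his genetic baseline ranks higher. The pairwise AUC loop is quadratic in sample size and is written for readability, not biobank-scale data.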
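
Submission #3 models genetic ancestry as a continuum rather than as discrete labels. One way to picture this, assumed here for illustration and not taken from the submission's estimation procedure, is a score in which each variant's effect size varies linearly with an individual's ancestry principal components (PCs). The sketch only evaluates such a score from given coefficients. Fitting them, which the submission does with penalized regression, is out of scope. The function name is hypothetical.

```python
def ancestry_aware_score(dosages, pcs, main_effects, interactions):
    """Polygenic score whose per-variant effects vary linearly along ancestry PCs.

    effect_j(i) = main_effects[j] + sum_k interactions[j][k] * pcs[k]
    score(i)    = sum_j dosages[j] * effect_j(i)
    dosages: one individual's allele dosages; pcs: that individual's PCs.
    No ancestry label is used, only the continuous PC coordinates.
    """
    if not (len(dosages) == len(main_effects) == len(interactions)):
        raise ValueError("need one main effect and one interaction row per variant")
    score = 0.0
    for x, beta, gammas in zip(dosages, main_effects, interactions):
        if len(gammas) != len(pcs):
            raise ValueError("each interaction row needs one coefficient per PC")
        effect = beta + sum(g * p for g, p in zip(gammas, pcs))
        score += x * effect
    return score
```

With all interaction coefficients set to zero, the function reduces to an ordinary additive PRS. Nonzero interactions let two people with identical genotypes but different ancestry coordinates receive different scores.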
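
Submission #2 collapses functionally annotated gene-level variant sets into burden scores. The following is a minimal sketch of that collapsing step only. It assumes unit weights by default, a hypothetical MAF cutoff of 0.01 for "rare", and a plain list-of-lists genotype matrix. RICE's annotation, set-testing, and penalized-regression steps are not reproduced, and the function and set names are hypothetical.

```python
def burden_scores(genotypes, variant_maf, variant_sets, max_maf=0.01, weights=None):
    """Collapse rare variants within each set into one score per individual.

    genotypes:    per-individual rows of alt-allele counts (0, 1, 2);
                  column v corresponds to variant v.
    variant_maf:  minor allele frequency of each variant column.
    variant_sets: mapping of set name -> list of variant column indices
                  (e.g. functionally annotated variants in one gene).
    weights:      optional per-variant weights; defaults to 1.0 for all.
    Only variants with MAF <= max_maf contribute to a set's burden.
    """
    n_variants = len(variant_maf)
    if weights is None:
        weights = [1.0] * n_variants
    if len(weights) != n_variants:
        raise ValueError("need one weight per variant")
    scores = {}
    for name, indices in variant_sets.items():
        rare = [v for v in indices if variant_maf[v] <= max_maf]
        scores[name] = [sum(weights[v] * row[v] for v in rare) for row in genotypes]
    return scores


genotypes = [[0, 1, 0, 2], [1, 0, 0, 0], [0, 0, 1, 1]]
maf = [0.001, 0.005, 0.2, 0.003]
sets = {"GENE_A": [0, 1, 2], "GENE_B": [3]}  # hypothetical gene-level sets
burdens = burden_scores(genotypes, maf, sets)
```

Each resulting per-set column could then enter a downstream regression as a single predictor, which is what makes collapsing useful when individual rare variants are too infrequent to estimate separately.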