NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #21
Submission information
Submission Number: 21
Submission ID: 145111
Submission UUID: a178e474-6c41-4f0c-a629-9379cf726847
Submission URI: /nci/ods-data-jamboree/abstractsubmissions
Submission Update: /nci/ods-data-jamboree/abstractsubmissions?token=P524a62c1nz7c2TZtPFWkJUvRGOZgEEJ8znyXFzN2Gk
Created: Sun, 06/22/2025 - 15:47
Completed: Sun, 06/22/2025 - 15:47
Changed: Sun, 06/22/2025 - 15:47
Remote IP address: 10.208.24.48
Submitted by: Anonymous
Language: English
Is draft: No
Presenter Information
Yang
E
Li
Ph.D.
Assistant Professor
Washington University School of Medicine
St. Louis
{Empty}
Abstract Information
Development or refinement of analysis pipelines or AI/ML algorithms
Pediatric high-grade gliomas, genetic variants, single-cell
Annotation and Interpretation of Genetic Risk Variants in Pediatric Brain Tumors at Cell Type/State Resolution
Pediatric high-grade gliomas (pHGG) are a deadly, heterogeneous category of pediatric gliomas with limited treatment options. pHGGs harbor unique molecular features, including global changes in histone modification profiles, which, combined with other genetic risk factors, act as drivers of pHGG tumorigenesis. Previous studies have characterized in detail the function of genetic variants in coding regions, such as mutations in the histone H3 gene, histone modifiers, and oncogenes, which has led to precise tumor classifications. However, less attention has been paid to the ~95% of genetic variants that are non-coding, even though an estimated ~80% of disease risk variants reside in non-coding cis-regulatory elements (CREs).
Recent advances in single-cell technologies have been adopted by consortia such as NIH's BRAIN Initiative and the Human Cell Atlas to study spatiotemporal gene regulatory programs, and have resulted in cell atlases of multicellular organisms. These technologies capture genomic signals, including DNA methylation, chromatin accessibility, histone modifications, 3D genome conformation, and spatial information, either alone or in combination with snRNA-seq. I made key contributions to identifying thousands of distinct cell types from >3 million individual cell nuclei in both human and mouse brains by integrating various single-cell multimodal omics. In addition, advanced AI/ML models, including Epiformer, which I developed and trained on genomic data, help interpret genetic variants linked to various human disorders. These achievements offer great opportunities for interpreting pHGG risk variants at a more refined cell type/state resolution.
I propose to leverage valuable data resources, such as the Kids First Program, NCI's TARGET, and CCDI, and to apply our well-established computational pipelines and tools to (1) identify high-frequency genetic risk variants in whole-genome sequencing data from pHGGs; (2) annotate these variants to functional CREs in a cell type/state-specific manner; and (3) interpret their potential function by associating them with clinical information.
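As a rough illustration of steps (1) and (2) above, the following minimal Python sketch counts how often each variant recurs across tumor samples and then looks up which cell types have a CRE overlapping each recurrent variant. It is not the pipeline referred to in the abstract: the sample names, variant coordinates, CRE intervals, cell-type labels, the 0.5 recurrence threshold, and the helper names (`recurrent_variants`, `build_index`, `annotate`) are all hypothetical and chosen only for illustration. Step (3), association with clinical information, is not covered.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical toy inputs. In practice, variants would come from WGS variant
# calls and CREs from single-cell chromatin accessibility peaks.
# A variant is (chrom, pos, ref, alt); each sample has a set of variants.
SAMPLE_VARIANTS = {
    "sample_A": {("chr1", 150, "A", "G"), ("chr2", 900, "C", "T")},
    "sample_B": {("chr1", 150, "A", "G"), ("chr1", 5000, "G", "A")},
    "sample_C": {("chr1", 150, "A", "G"), ("chr2", 900, "C", "T")},
}

# CREs per cell type as half-open intervals (chrom, start, end).
CELL_TYPE_CRES = {
    "OPC": [("chr1", 100, 200), ("chr2", 850, 950)],
    "astrocyte": [("chr1", 120, 180)],
    "microglia": [("chr1", 4000, 4500)],
}


def recurrent_variants(sample_variants, min_fraction):
    """Map each variant seen in >= min_fraction of samples to its frequency."""
    counts = Counter(v for variants in sample_variants.values() for v in variants)
    n_samples = len(sample_variants)
    return {v: c / n_samples for v, c in counts.items() if c / n_samples >= min_fraction}


def build_index(cell_type_cres):
    """Group CREs by chromosome, sorted by start, so lookups can use bisect."""
    index = defaultdict(list)
    for cell_type, cres in cell_type_cres.items():
        for chrom, start, end in cres:
            index[chrom].append((start, end, cell_type))
    for intervals in index.values():
        intervals.sort()
    return index


def annotate(variant, index):
    """Return the sorted cell types whose CREs contain the variant position."""
    chrom, pos, _ref, _alt = variant
    intervals = index.get(chrom, [])
    # Only intervals starting at or before pos can contain it.
    hi = bisect_right(intervals, (pos, float("inf"), ""))
    return sorted({ct for start, end, ct in intervals[:hi] if start <= pos < end})


if __name__ == "__main__":
    index = build_index(CELL_TYPE_CRES)
    # 0.5 is an arbitrary, hypothetical cutoff for "high-frequency".
    for variant, freq in sorted(recurrent_variants(SAMPLE_VARIANTS, 0.5).items()):
        print(variant, round(freq, 2), annotate(variant, index))
```

The linear scan over candidate intervals keeps the sketch short; for genome-scale peak sets, an interval tree or a dedicated genomic-interval tool would likely be a better fit.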
{Empty}