NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)
50 submissions
# | Starred | Locked | Notes | Created | User | IP address | First Name | Middle Initial | Last Name | Degree(s) | Position/Title/Career Status | Organization | Organization Address | Other (Please Specify) | Abstract Category | Abstract Keywords | Abstract Title | Abstract Summary | Upload Abstract | Operations | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #30 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #30 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #30 | Mon, 06/23/2025 - 17:29 | Anonymous | 10.208.28.103 | Kate | Isaac | Ph.D. | Post Doctoral Fellow | Fred Hutch Cancer Center | Seattle | kisaac@fredhutch.org | Development of tutorial and educational tools, data storytelling, infographics, and other creative uses of data | Interested in joining a team to develop tutorial and educational tools with childhood cancer data | I am looking for a project team to join. I am a post-doc at the Fred Hutch Cancer Center (https://kweav.github.io/) where much of my work is devoted to creating training materials for scientists and trainees to make informatics more accessible. I contribute to major training projects including the Informatics Technology for Cancer Research (ITCR) Training Network (ITN, https://www.itcrtraining.org), Open Case Studies (https://www.opencasestudies.org/), and the AnVIL project (https://anvilproject.org). In this capacity, I have supported or led around 25 workshops with interactive activities about working with cancer data; in addition, I have assisted in editing courses on reproducibility practices and choosing appropriate genomics tools, have provided feedback to others developing training materials for the cancer research community, and am developing course materials for trainees on effective data visualization practices and considerations when working with clinical cancer data. I am co-advised by Jeff Leek, Carrie Wright, and Ava Hoffman – all respected experts with years of experience in developing and delivering effective training. My background is in both wet lab research and mathematics/computational biology. 
My graduate work was in a functional genomics and population genetics lab, and in grad school I was extensively involved in creating educational materials for those new to programming or algorithmic thinking and connecting such concepts to biology research. I can offer experience in thinking across fields of study and developing effective training materials, particularly for online courses or in person workshop settings. I am also involved in the development and maintenance of tools related to training material dissemination (https://www.ottrproject.org/). | |||||
29 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #29 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #29 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #29 | Mon, 06/23/2025 - 17:04 | Anonymous | 10.208.24.253 | Yin | Lu | Ph.D. | Lead Bioinformatics Analyst | ICF | Rockville | yin.lu@icf.com | Methods to enable data interoperability | Pediatric cancer, AI-readiness, Cohort refinement, Biomedical data quality | CC-CARE-AI: A Framework for Assessing and Refining AI-Readiness of Childhood Cancer Cohorts from Kids First, TARGET, and CCDI | While large-scale childhood cancer datasets are increasingly available, researchers often struggle to determine which cohorts are suitable for AI and machine learning applications due to inconsistencies in data quality, completeness, and standardization. This project proposes a modular framework, CC-CARE-AI (Childhood Cancer Cohort Assessment and REfinement for AI), to assess and refine the AI-readiness of childhood cancer cohorts from the Gabriella Miller Kids First Program (Kids First), TARGET, and the Childhood Cancer Data Initiative (CCDI). CC-CARE-AI generates domain-specific readiness scores across clinical, genomic, and imaging data using a transparent, multi-criteria evaluation system. To complement these scores, the framework also incorporates tools for cohort refinement and decision support through interactive visualizations and dashboards, enabling researchers to identify high-quality subsets and enhance data usability. By aligning data quality with specific research and machine learning needs, the framework facilitates more effective and responsible use of AI in pediatric oncology. The project will use Python, R, and tools such as pandas and Streamlit on the Seven Bridges Cancer Genomics Cloud for secure, scalable, and reproducible analysis. The project team will be led by Dr. 
Yin Lu (Lead Bioinformatics Analyst) and includes Mr. Alexander Pilozzi (Bioinformatics Analyst), and Dr. Alejandro M. Sevillano (Bioinformatics Analyst) from the Health Analytics and Research Technologies division at ICF, with expertise in cancer data management, AI-readiness assessment, and cloud-based analysis. The team brings relevant experience from the CPTAC program, NIDDK Data Centric Challenge, ARPA-H Biomedical Data Fabric, and CRDC integration efforts. |
Abstract_Data_Jaboree.docx (22.84 KB)
|
|||
28 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #28 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #28 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #28 | Mon, 06/23/2025 - 16:40 | Anonymous | 10.208.24.253 | Rachel | D. | Harris | Ph.D. | Clinical Research Scientist | St. Jude Children's Research Hospital | Memphis, TN | Rachel.Harris2@StJude.org | Employment of statistical methods or existing computational, mathematical, or informatics tools | Rhabdomyosarcoma, Pharmacogenomics, Neuroblastoma, Toxicities, Precision Medicine | Investigating UGT1A1 Variants and Irinotecan Toxicities in Pediatric Cancer: Insights from CCDI and COG Trials | In adults, pharmacogenomic prescribing guidelines for irinotecan are based on actionable UGT1A1 genotypes, with dosing and labeling informed by pharmacogenomic data. UGT1A1*28 and UGT1A1*6 are well-established markers affecting irinotecan metabolism with implications for pharmacokinetics and toxicity. Patients homozygous for the UGT1A1*28 allele likely require irinotecan dose reductions, as supported by strong evidence in adults and FDA guidelines. UGT1A1 and irinotecan are designated as a level A gene-drug pair by CPIC and have a Pharmacogenomics Knowledge Base (PharmGKB) 1A evidence level, indicating potential genotype-related toxicity. However, evidence for these associations in children is limited, with existing studies yielding mixed results. There is a critical need to better understand UGT1A1 genotypes in the pediatric population, particularly in the context of treating rare pediatric cancers such as rhabdomyosarcoma and neuroblastoma. 
We propose leveraging data from Children’s Oncology Group (COG) clinical trials and the Childhood Cancer Data Initiative (CCDI) Molecular Characterization Initiative (MCI) to investigate irinotecan-related toxicities and pharmacogenomic markers in children with cancer. In the ARST1431 trial, diarrhea was a reportable adverse event, with grade 3+ toxicity occurring in 14% of participants (n=297) despite protocol-directed anti-diarrhea care. Notably, our recent analysis of the CCDI cohort revealed that 92% of participants carried at least one actionable phenotype. Our objective is to better understand treatment-related toxicities and identify actionable genotypes to improve therapy for children with cancer. To meet this objective, analyses will include two major aims: (1) quantify the frequency of clinically relevant UGT1A1 pharmacogenomic genotypes; and (2) assess associations between UGT1A1 genotypes and irinotecan-related diarrhea. Variants will be identified from PharmGKB to include UGT1A1 exonic variants with a level 3 or higher clinical annotation with irinotecan. Computational tools we will leverage include Plink, PharmCAT, and R. Our proposed team includes Brooke Bernhardt, Pharm.D., Rachel Harris, Ph.D., Wenjian Yang, Ph.D., and Philip Lupo, Ph.D. |
|||
27 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #27 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #27 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #27 | Mon, 06/23/2025 - 16:32 | Anonymous | 10.208.24.253 | Michael | Leung | PhD | Postdoctoral Research Fellow | Harvard T.H. Chan School of Public Health | Boston | mleung@hsph.harvard.edu | Building specific disease cohorts, or visualization techniques | Investigating prenatal environmental stressors and pediatric cancer risk using linked birth-cancer registry data in New Jersey | Pediatric cancer is the leading cause of disease-related deaths in children. Its environmental origins are not well understood, but there is emerging evidence that points to the etiologic importance of several environmental exposures, such as air pollution and extreme temperature. However, only a few studies have systematically examined the relationship between prenatal environmental stressors and pediatric cancer, largely due to limitations in linked longitudinal datasets. Using a novel, individual-level dataset linking birth certificates to the cancer registry in New Jersey, we plan to examine associations between climate-related exposures and pediatric cancer incidence. Our project aims to: 1. Characterize prenatal exposures to environmental stressors (e.g., air pollution, background radiation etc.) using geocoded residential addresses and publicly available environmental datasets. 2. Link these exposures to cancer outcomes (e.g., acute lymphoblastic leukemia) using the linked birth-cancer registry data. 3. Use flexible statistical models, including distributed lag and spatiotemporal survival models, to evaluate exposure–outcome associations, accounting for key confounders. 
Our data is housed at Rutgers University and is protected by a data use agreement, and so it is not publicly available to share. However, we can share a simulated example for the Data Jamboree. We also have not yet assembled a team for the Data Jamboree. If our proposed dataset does not work, I am eager to join a team conducting work on building cohorts using registry data, or the employment of statistical methods used to analyze exposure-outcome associations in pediatric cancer epidemiology. This project aligns with the goals of the Data Jamboree to encourage the reuse of high-value cancer datasets and foster interdisciplinary collaboration. I bring expertise in epidemiology, environmental health, and big data analysis and am excited to lead or contribute to a team working on pediatric cancer, environmental health and/or big data. |
|||||
26 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #26 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #26 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #26 | Mon, 06/23/2025 - 15:25 | Anonymous | 10.208.28.103 | Marcin | Cieslik | Ph.D | Assistant Professor | University of Michigan | Ann Arbor | mcieslik@med.umich.edu | Building specific disease cohorts, or visualization techniques | MI-OncoSeq, immunotherapy, integrative genomics, precision oncology | Integrative Genomic Analysis of Pediatric Cancer Data from Peds-MI-OncoSeq within the NCI Childhood Cancer Data Initiative (CCDI) | Childhood cancers are rare and genomically diverse, and a lack of centralized data has hindered research progress. This project will leverage the rich genomic and transcriptomic data from the Pediatric Michigan Oncology Sequencing (Peds-MI-OncoSeq) cohort, a key contribution to the NCI's Childhood Cancer Data Initiative (CCDI). We propose a dual-aim study. First, we will create a comprehensive multi-omic atlas to define the molecular architecture across a range of pediatric malignancies. Second, we will conduct a focused analysis on the neuroblastoma cohort to identify robust molecular biomarkers that predict patient response to anti-GD2 immunotherapy, a standard-of-care treatment. By integrating genomic and transcriptomic data, this work seeks to uncover fundamental cancer biology and produce clinically relevant predictors to improve therapeutic strategies for children with cancer. |
CCDI_proposal.docx (5.91 MB)
|
|||
25 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #25 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #25 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #25 | Mon, 06/23/2025 - 14:27 | Anonymous | 10.208.24.253 | Rawan | Shraim | Ph.D. | Bioinformatics Scientist | Children's Hospital of Philadelphia | Philadelphia | shraimr@chop.edu | Data integration | Proteomics, RNA, protein-RNA correlations, data integration | Deciphering RNA-Protein Relationships Across Cancer and Healthy Tissues | While prior studies have explored correlations between proteomics and RNA sequencing (RNA-seq) data, significant gaps remain in understanding the biological characteristics of correlating proteins and the mechanisms underlying these relationships. A deeper understanding of transcriptomic-proteomic correlations is critical for improving multi-omic data interpretation and for maximizing the utility of both transcriptomic and proteomic datasets. Comprehensive analyses of these relationships have historically been limited by sparse proteomic coverage and the scarcity of datasets with matched proteomic and transcriptomic data. However, recent advances in mass spectrometry and the generation of large-scale, multi-omic datasets now enable more detailed investigation. Our objective in this study is to systematically characterize RNA-protein correlations across multiple datasets, assess how these correlations vary based on protein subcellular localization, protein function and cancer phenotype, and determine whether patterns differ across hematologic malignancies, solid tumors, and healthy tissues. 
Beyond global proteomic and transcriptomic comparisons, we are also interested in leveraging the rich multi-omic features now available in these datasets (including, but not limited to, phosphoproteomics, metabolomics, glycosylation, methylation, and other post-translational modifications) to better understand the variance between RNA and protein levels and to further dissect the molecular features that contribute to concordant and discordant RNA-protein relationships. All analyses will be conducted in R/RStudio. Datasets that are openly accessible have been downloaded, and access has been requested for those that are not automatically available through their publications. This work aims to advance our understanding of protein regulation in cancer and normal tissues and to inform best practices for integrative analysis of multi-omic datasets. Insights gained could support biomarker discovery, improve our interpretation of transcriptomics-only studies, and guide the development of therapeutic targets based on protein-level dysregulation. |
NCIDataJamboree_06.20.2025.docx (19.22 KB)
|
|||
24 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #24 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #24 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #24 | Mon, 06/23/2025 - 12:12 | Anonymous | 10.208.28.103 | Fernanda | A. | Silva Michels | MSc, PhD, ODS-C | Epidemiologist - Program Manager of Data Quality and Integration | NAACCR | Springfield | fmichels@naaccr.org | Building specific disease cohorts, or visualization techniques | COG, NCCR Data Platform, Neuroblastoma, Survival, Trials enrollment. | Survival Disparities among Children Diagnosed with Neuroblastoma Enrolled and Not Enrolled in COG Clinical Trials Using the NCCR Data Platform | Group: Fernanda Silva Michels (NAACCR), Gonçalo Forjaz (Westat), Stephanie Hill (NAACCR). Background: A 2025 Children’s Oncology Group (COG) study [1] found that Black and Hispanic children with high-risk neuroblastoma had worse overall survival (OS), even when treated with the same standardized protocols on frontline COG clinical trials. The mechanisms explored in the study did not fully account for the observed disparities in survival. With the NCCR Data Platform, we now have the opportunity to examine whether similar patterns exist among children who did not participate in a COG clinical trial. Methods: The NCCR Data Platform links Surveillance, Epidemiology, and End Results (SEER) population-based cancer registry data from 1995 to 2021, representing 57.6% of all US children, adolescents and young adults, and data from patients enrolled in COG studies from 2007 to 2018, including clinical trials and registry protocols. For this study, we will use SEER population-based cancer registry data from 2007 to 2018 to align with the available COG data. Survival rates will be estimated using Kaplan-Meier analysis. 
Univariate analysis will be performed with the log-rank test, and multivariate analysis will utilize Cox proportional hazards regression to identify factors associated with overall survival (OS). The analysis will be performed using SEER*Stat and RStudio. Results/Conclusion: This analysis will help determine whether racial and ethnic disparities in survival among children with neuroblastoma extend beyond the clinical trial population. By comparing outcomes in trial and non-trial settings, we aim to better understand the role of broader contextual and systemic factors. Results may inform future strategies to improve equity in pediatric oncology outcomes. This study will also highlight the depth and utility of the NCCR Data Platform as a resource for population-based cancer research. |
|||
23 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #23 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #23 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #23 | Mon, 06/23/2025 - 10:14 | Anonymous | 10.208.28.103 | Elmer | Andrés | Fernández | Ph.D | Head of Health Data Science Laboratory and Biomedical Engineer | Fundacion para el Progreso de la Medicina | Córdoba, Argentina | elmer.fernandez@unc.edu.ar | daniorschanski@mi.unc.edu.ar | Development or refinement of analysis pipelines or AI/ML algorithms | Gene Fusions, Knowledgebase, Pediatric Oncology | Fusion.AR-DB: A Collaborative Knowledgebase of Actionable Gene Fusions in Pediatric Cancers Leveraging Large-Scale RNA-Seq Data | Gene fusions are powerful oncogenic drivers and biomarkers in pediatric and AYA (adolescent and young adult) cancers, yet their clinical utility remains underexploited due to fragmented datasets, limited detection approaches, and the lack of centralized, clinically meaningful resources. We propose Fusion.AR-DB, the first open-access, comprehensive knowledge base of gene fusions in pediatric and AYA cancers. The objective is to systematically detect, annotate, and organize gene fusions across publicly available RNA-Seq datasets, including NIH Kids First, TARGET, CCDI, and others. Fusion.AR-DB will catalog all detected fusions, not only known or actionable ones, and associate them with clinical data like tumor types, molecular subtypes, and expression profiles. Therapeutically actionable events will be linked to FDA/EMA-approved drugs and clinical trials. Each fusion will be annotated with functional and structural insights, including domain-level information, expression impact, and structure of the resulting chimeric proteins, enabling downstream modeling and docking. 
Built with a validated, high-performance pipeline optimized for low-input pediatric samples, the platform integrates tools such as STAR and Arriba. All analyses will be run in a high-performance computing environment. The results will be delivered via a user-friendly, interactive interface, allowing users to explore fusion prevalence, co-occurrence, functional relevance, and therapeutic potential. Preliminary analyses reveal: (1) undetected kinase fusions in high-risk pediatric cancers; (2) novel recurrent actionable fusion events; and (3) actionable fusions in ~7% of FISH-negative tumors. Impact: Fusion.AR-DB increases therapeutic eligibility up to 3-fold in pilot cohorts, reduces diagnostic costs by over $1.2M/year per institution, and directly supports NIH goals by transforming RNA-Seq data into clinically actionable, personalized insights. Project Team: Elmer Fernández, Ph.D., Principal Investigator; Guadalupe Nibeyro, Biochemist and PhD Candidate; Daniela Orschanski, Biomedical Engineer and PhD Candidate. Affiliated with the Fundación para el Progreso de la Medicina and CONICET, Córdoba, Argentina. |
||
22 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #22 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #22 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #22 | Mon, 06/23/2025 - 10:06 | Anonymous | 10.208.28.103 | Yanling | Sun | Ph.D | Postdoc | Bing Zhang lab / Baylor College of Medicine | Houston | yanling.sun@bcm.edu | sunyanling312@gmail.com | Employment of statistical methods or existing computational, mathematical, or informatics tools | Expanding LinkedOmics for comprehensive integration of pediatric cancer omics data | Pediatric cancer remains a leading cause of disease-related mortality in children. Owing to its distinct molecular underpinnings compared to adult cancers, dedicated research is essential to improve diagnostic precision and uncover novel biomarkers and therapeutic targets. Meanwhile, comprehensive molecular landscapes derived from adult cancers can provide valuable guidance in identifying key molecular features and regulatory mechanisms in pediatric tumors. LinkedOmics is the first publicly accessible multi-omics web platform that integrates mass spectrometry (MS)-based proteomics with genomics, transcriptomics, metabolomics, and lipidomics data, all accompanied by comprehensive clinical annotations. The current release includes 60 adult cancer datasets (including four controlled-access datasets), comprising over 17,000 samples. The tool is openly and freely accessible at https://www.linkedomics.org and has been cited in over 2000 publications. To extend its utility to pediatric oncology, we have integrated a pediatric brain tumor dataset comprising 223 samples as the initial entry point. The platform enables correlation analysis between clinical variables (e.g., age, sex, race, subtype, survival) and multi-omics features to identify clinically relevant molecular signatures. 
The association analysis between different omics data is also supported to reveal potential regulatory interactions, followed by functional interpretation through over-representation analysis (ORA) or gene set enrichment analysis (GSEA). Comparative analysis across pediatric and adult datasets further facilitates the exploration of shared and distinct biological elements. LinkedOmics is deployed on AWS cloud infrastructure, providing high scalability, robust data security, and responsive computational performance. The platform supports secure controlled access for unpublished pediatric datasets and offers seamless analysis of publicly available data, enabling flexible data sharing from early discovery to publication. At the jamboree, we will use the pediatric brain tumor dataset as an example to demonstrate feasibility through both predefined and participant-driven case studies, encourage adoption by the pediatric cancer research community, and identify pediatric-specific needs to guide future platform development. | ||||
21 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #21 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #21 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #21 | Sun, 06/22/2025 - 15:47 | Anonymous | 10.208.24.48 | Yang | E | Li | Ph.D. | Assistant Professor | Washington University School of Medicine | St. Louis | yeli@wustl.edu | Development or refinement of analysis pipelines or AI/ML algorithms | Pediatric high-grade gliomas, genetic variants, single-cell | Annotation and Interpretation of Genetic Risk Variants in Pediatric Brain Tumors at Cell Type/State Resolution | Pediatric high-grade gliomas (pHGG) comprise a deadly, heterogeneous category of pediatric gliomas with limited treatment options. pHGGs harbor unique molecular features, including global changes in histone modification profiles, which, combined with other genetic risk factors, act as drivers of pHGG tumorigenesis. Previous studies have detailed and characterized the function of genetic variants in coding regions, such as mutations in the histone H3 gene, histone modifiers, and oncogenes, which have led to precise tumor classifications. However, less attention has been paid to ~95% of non-coding genetic variants, and it is estimated that ~80% of disease risk variants reside in non-coding cis-regulatory elements (CREs). Recent advances in single-cell technologies have been adopted by consortia such as NIH's BRAIN Initiative and the Human Cell Atlas to study spatial-temporal gene regulatory programs and have resulted in cell atlases in multicellular organisms. These technologies capture genomic signals, including DNA methylation, chromatin accessibility, histone modifications, 3D genome conformation, and spatial information, either alone or in combination with snRNA-seq. 
I made key contributions to identify thousands of distinct cell types from >3 million individual cell nuclei in both human and mouse brains by integrating various single-cell multimodal omics. In addition, the success of the advanced AI/ML models, including Epiformer, which I developed and trained on genomic data, helps interpret genetic variants linked to various human disorders. These achievements offer great opportunities for interpreting risk variants of pHGG at a more refined cell type/state resolution. I propose to leverage valuable data resources, such as Kids First Program, NCI's TARGET, and CCDI, and apply our well-established computational pipelines/tools to (1) identify high-frequency genetic risk variants from whole genome sequencing data from pHGGs; (2) annotate genetic risk variants to functional CREs in a cell type/state-specific manner; and (3) interpret their potential function by associating them with clinical information. |
|||
20 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #20 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #20 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #20 | Sun, 06/22/2025 - 10:31 | Anonymous | 10.208.24.48 | Tung-Shing | M | Lih | Ph.D. | Research Associate | Johns Hopkins University | Baltimore | tlih1@jhmi.edu | Proteogenomic data analysis | Ependymoma, proteogenomics, protein modifications, mass spectrometry | Proteogenomic characterization of childhood ependymoma to identify new therapeutic targets | Childhood ependymoma is a malignant brain tumor, which accounts for a significant portion of pediatric central nervous system tumors. Ependymoma poses a major clinical challenge due to its resistance to conventional therapies and high recurrence rate. Despite characterizations of childhood ependymoma by genomics, epigenomics, and transcriptomics, effective therapies for ependymoma remain limited, underscoring the need for deeper molecular understanding. To address this, we will perform proteogenomic analysis on the ependymoma cases from the Children’s Brain Tumor Network (CBTN) with genomic, epigenomic, transcriptomic, and proteomic data available on Gabriella Miller Kids First Portal, Open Pediatric Brain Tumor Atlas, and Proteomics Data Commons. These harmonized datasets capture molecular alterations across every regulatory tier, from genetic mutations and transcriptional dysregulation to protein expression, signaling pathway perturbations, and metabolic reprogramming. 
Using R- and Python-based bioinformatic/biostatistics pipelines, we will (1) integrate multi-omics data to identify molecular subtypes of ependymoma; (2) identify subtype-specific molecular target proteins, dysregulated signaling pathways, and tumor microenvironment features; and (3) utilize AI models to analyze the characteristics of molecular subtypes and leverage drug databases to identify actionable molecular targets and develop personalized therapeutic strategies. This integrated proteogenomic approach offers a comprehensive strategy to unravel the biological complexity of this challenging pediatric brain tumor. To ensure robustness and generalizability, we intend to validate the findings using other available datasets, whether publicly available or controlled-access, so that candidate targets reflect reproducible disease biology rather than dataset-specific effects. We have assembled a team of investigators with expertise in proteogenomics and computational biology. Overall, our goal is to identify mechanisms driving tumorigenesis and progression in childhood ependymoma and to identify potential therapeutic targets that can inform future preclinical studies and clinical trials. |
Project Abstract_JHU_TMLih.pdf (217.73 KB)
|
||
19 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #19 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #19 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #19 | Sat, 06/21/2025 - 18:32 | Anonymous | 10.208.28.250 | Doug | B | Fridsma | MD, PhD | Chief Medical Officer | Health Universe | San Francisco, CA | Doug.Fridsma@healthUniverse.com | Development or refinement of analysis pipelines or AI/ML algorithms | Agentic AI, MCP, A2A, Clinical Trials, | Accelerating Pediatric Cancer Research Through Modular Agentic Workflows | We propose leading a team to develop a suite of innovative AI agents leveraging the MCP (Model Context Protocol) and A2A (Agent-to-Agent) architecture within Health Universe to transform pediatric cancer research workflows. Our solution will create composable, intelligent agents that seamlessly integrate NCI's vast data resources—including TARGET, CCDI, and clinical trial databases—to accelerate the bench-to-bedside pipeline. Core Innovation: We'll build specialized agents that can be assembled into dynamic workflows. For example, a "Genomic Insight Agent" could analyze TARGET sequencing data to identify novel fusion proteins in pediatric leukemias. This discovery automatically triggers a "Drug Repurposing Agent" that queries ChEMBL and NCI's compound libraries for potential inhibitors. A "Clinical Trial Design Agent" then evaluates patient stratification strategies using CCDI demographic data, while a "Protocol Optimization Agent" ensures age-appropriate dosing and monitoring requirements. 
Key Workflows We'll Enable: (1) Discovery-to-Trial Pipeline: Automated identification of therapeutic targets → in silico drug screening → trial protocol generation with pediatric-specific considerations; (2) Real-time Trial Matching: Patient genomic profiles → eligible trial identification → enrollment feasibility assessment across multiple sites; (3) Biomarker Validation: Multi-cohort analysis across NCI datasets → statistical validation → clinical implementation pathways. Technical Approach: Each agent will expose standardized MCP interfaces, enabling researchers to compose custom workflows through simple configuration. A "Pediatric Oncology Copilot" will orchestrate agent interactions, ensuring data privacy and regulatory compliance throughout. Team Needs: Seeking collaborators with expertise in pediatric oncology, bioinformatics, clinical trial design, and MCP/LLM development. Together, we'll create a transformative platform where individual contributions multiply through intelligent agent collaboration, ultimately accelerating life-saving treatments for children with cancer. |
|||
18 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #18 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #18 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #18 | Sat, 06/21/2025 - 18:18 | Anonymous | 10.208.24.91 | Austin | Brown | PhD, MPH | Associate Professor | Baylor College of Medicine | Houston, Texas | austin.brown@bcm.edu | Building specific disease cohorts, or visualization techniques | Epidemiology of Childhood Cancer Etiology, Outcomes and Survivorship | I am a tenured Associate Professor in the Department of Pediatrics at Baylor College of Medicine, where I lead a research program focused on the molecular epidemiology of childhood cancer, including both disease etiology and the acute and long-term adverse effects of cancer therapy. My expertise spans advanced statistical methods, molecular epidemiology, outcomes research, and cohort assembly, with a particular focus on integrating treatment, host, and genomic factors to understand complications following cancer therapy. I serve as the contact Principal Investigator of an NIH Gabriella Miller Kids First-funded X01 project entitled “Whole Genome Sequencing to Characterize Genetic Susceptibility and Variability in Pediatric and AYA Classic Hodgkin Lymphoma,” and have led multiple large, collaborative investigations examining the genetic basis of childhood cancer risk and survivorship outcomes. My federal-, state-, and foundation-funded research involves the recruitment of large, well-annotated patient and survivor populations, systematic data abstraction, biological sample collection, and rigorous analysis of molecular and clinical data. 
I hold leadership roles in national cooperative groups and consortia, including the Children’s Oncology Group (COG) Epidemiology Committee, COG Cancer Control and Supportive Care Committee, and the Childhood Cancer Survivor Study (CCSS) Psychology Working Group Steering Committee. I also direct the Epidemiology and Long-Term Survivor Programs at Texas Children’s Hospital and co-chair the Health Disparities Working Group of the Therapeutic Advances in Childhood Leukemia and Lymphoma (TACL) Consortium. For the Data Jamboree, I can contribute expertise in cohort design, data integration, and analysis of genomic and clinical outcomes data, as well as knowledge of existing pediatric cancer datasets and resources. I am particularly interested in exploring opportunities to harmonize and link datasets to investigate treatment-related outcomes and health disparities. | |||||
17 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #17 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #17 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #17 | Fri, 06/20/2025 - 12:59 | Anonymous | 10.208.24.91 | Serghei | Mangul | Ph.D. | Director | Challenges and Benchmarking | Sage Bionetworks | Seattle | serghei.mangul@sagebase.org | Employment of statistical methods or existing computational, mathematical, or informatics tools | RNA-Seq Immunophenotyping, Childhood Cancer Immunology, Ancestry-Inclusive Immune Profiling, Bioinformatics Database Development, Health Disparities Research | Developing reliable and scalable methods for deep immune phenotyping in public Childhood Cancer RNA-Seq Data repositories | Recent advancements in RNA-Seq technologies have significantly improved our ability to analyze individual transcriptomes. However, current analyses in immunology, especially concerning childhood cancer, are limited, often overlooking crucial information like ancestry, cell type composition, HLA type, KIR expression, and T/B Cell Receptor (TCR/BCR) repertoires. This missing data is vital for understanding immune responses and disease susceptibility across diverse populations. Existing RNA-Seq and immunological databases are insufficient, either lacking essential immunological and ancestry data or missing key immunological phenotypes. This critical gap prevents comprehensive immunological studies that consider the variability in immune responses across different populations, which is particularly important for childhood cancer. To overcome these limitations, we propose developing advanced bioinformatics tools and a comprehensive database to infer and integrate critical immune phenotypes and ancestry information directly from RNA-Seq data. 
Our approach will enable more accurate and thorough analysis of immune-related diseases across diverse populations by leveraging public RNA-Seq samples. Methods will be rigorously benchmarked to ensure reliability, providing crucial insights into health disparities. We will develop robust methods for deep immune phenotyping, including an accurate HLA typing tool using a pan-genome reference to minimize ancestral bias. We'll also create tools to infer Adaptive Immune Receptor Repertoires (AIRR) alleles and enhance T and B cell receptor assembly for improved precision in V(D)J recombination and clonotype assembly. A consensus-based method will be introduced for more accurate cell type composition analysis. The insights gained will be disseminated through a novel, user-friendly database—the largest collection of individuals with detailed immunological phenotypes across diverse backgrounds and disease conditions. This platform will offer comprehensive functionalities, including normalization and meta-analysis, accessible via an R package, GUI, and API. We will prioritize ethical and security issues, promoting access and reuse of pediatric cancer data and fostering interdisciplinary collaborations. |
Abstract (1).pdf (58.43 KB)
|
|||
16 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #16 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #16 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #16 | Fri, 06/20/2025 - 10:17 | Anonymous | 10.208.24.91 | Liang | Liu | Ph.D. | Wake Forest University Health Sciences | Winston-Salem | Liang.Liu@advocatehealth.org | Development or refinement of analysis pipelines or AI/ML algorithms | Deep Learning Frameworks for Functional and Immune Profiling of Splice Variants in Pediatric Brain Tumors | Alternative splicing (AS) is a key post-transcriptional mechanism that contributes to transcriptomic diversity and plays a critical role in pediatric brain tumor biology. This project aims to develop advanced machine learning models to systematically characterize AS events and their functional and immunological consequences in pediatric brain tumors. We propose a two-tiered computational framework. First, we will develop a graph-transformer deep learning model to predict the functional impact of tumor-specific splice variants. This model will integrate multimodal biological features—including splice site strength, isoform usage, protein domain disruption, and gene network context—into a graph structure where nodes represent AS events and edges encode known gene-gene and pathway interactions. The model’s self-attention mechanism will be adapted to prioritize biologically meaningful relationships, enabling accurate prediction of oncogenic potential and splicing dysregulation. Second, we will construct a graph neural network (GNN) to link AS events to the tumor immune microenvironment. This model will identify immune-associated splice variants and predict immunotherapy responsiveness by correlating AS profiles with immune cell infiltration and neoantigen load. 
An AS-Immune score will be derived to quantify the immunogenic potential of splicing alterations and validated across multiple pediatric brain tumor cohorts. The modeling framework will be trained using transcriptomic data from the Childhood Cancer Data Initiative (CCDI), aligning with national efforts to accelerate pediatric cancer research. Validation will be performed using independent datasets from the Children’s Brain Tumor Network (CBTN), Pediatric Brain Tumor Portal (PBTP), and Beat Childhood Cancer (BCC) consortium. These models are designed to be extensible to other pediatric and adolescent and young adult (AYA) cancers. Our multidisciplinary team includes Liang Liu, Ph.D., Wei Zhang, Ph.D., Anderson Cox, M.S., and Deha Ay, M.S. from Wake Forest University Health Sciences, and Giselle Sholler, M.D., Jeremy Hengst, Ph.D., and Abhinav Nagulapally, M.S. from Penn State Health Children’s Hospital. |
||||||
15 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #15 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #15 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #15 | Wed, 06/18/2025 - 15:29 | Anonymous | 10.208.28.250 | Minkyu | Park | Ph.D. | Senior Bioinformatician | Computational Genomics & Bioinformatics/Center for Biomedical Informatics & Information Technology/National Cancer Institute | Rockville | minkyu.park@nih.gov | Development or refinement of analysis pipelines or AI/ML algorithms | Risk stratification, Whole Slide Images, Deep Learning, Clinical data | An evolutionary deep learning platform for risk stratification in cancer patients. | Effective risk stratification of cancer patients is critical for precision oncology, particularly when leveraging deep learning models that integrate whole-slide images (WSIs) with omics and clinical data. Although sparse data collection has traditionally hindered the development of robust models, the increasing availability of curated datasets now offers significant opportunities for improvement. In response to this need, we propose an evolutionary, web-based deep learning platform for cancer risk stratification that encompasses an end-to-end training pipeline alongside a continuous inference pipeline for model evaluation. Building on our previous work, we have developed a training pipeline that leverages WSIs and omics data to construct risk stratification models. The platform will be initially deployed using existing childhood cancer data. As new data becomes available, the system automatically integrates this information, updates performance metrics via the inference pipeline, and retrains the model using the training pipeline. This iterative process promotes the gradual evolution and enhancement of model accuracy, with performance changes monitored at each update cycle. 
The platform will be applicable to all types of childhood cancer, and as new datasets are added it will display the training status of the deep learning model on the current data. Once the model achieves robust performance, it can be employed for real-world predictions, thereby significantly enhancing the utility of the underlying datasets in clinical decision-making. | ||||
14 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #14 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #14 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #14 | Wed, 06/18/2025 - 15:06 | Anonymous | 10.208.28.250 | James | H | Tanis | Ph.D. | Senior Bioinformatician | Essential Software Inc | Gaithersburg, MD | james.tanis@nih.gov | Methods to enable data interoperability | Semi-Automatic Mapping of CDEs to the C3DC Data Model | Annotating the data model of the Cancer Clinical Data Commons (C3DC) with Common Data Elements (CDEs) provides significant value. It enables: • standardization and interoperability • enhanced data quality and reproducibility • improved efficiency and cost-effectiveness • easier data sharing and collaboration • support for advanced analytics and AI applications. Because CDEs can become outdated, the C3DC data model must be updated regularly to remain useful. Manually reviewing the multitude of data elements and CDEs required for each update is time-consuming. To significantly reduce human effort, we propose to develop an LLM tool that automatically matches CDEs to C3DC data elements. Humans will only verify the tool’s suggestions in the final step. |
CDE_C3DC_Abstract.docx (16.75 KB)
|
|||
13 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #13 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #13 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #13 | Wed, 06/18/2025 - 14:02 | Anonymous | 10.208.28.250 | Ying | Hu | Ph.D. | CGBB/CBIIT/NCI/NIH/HHS | Rockville, MD | yhu@mail.nih.gov | Development or refinement of analysis pipelines or AI/ML algorithms | tumor subtype, machine learning, gene interaction | Translating Multi-Modal Machine-Learning Insights into Clinically Actionable Subtype-Specific Biomarkers with an Emphasis on Biomarker Interactions | Background: Accurate classification of tumor subtypes is pivotal for precision oncology, yet existing approaches rarely integrate multiple heterogeneous data types or explicitly model gene–gene interactions. Recent advances in machine learning (ML) now enable joint analysis of multi-omics profiles and features to uncover robust, biologically interpretable biomarkers. Objective: This project will develop an integrated ML framework that: 1. Predicts tumor subtypes from combined RNA-seq (transcriptomic) and whole-slide image (WSI) data. 2. Identifies key genes and gene–gene interactions that drive distinctions between subtypes. 3. Elucidates functional roles of these genes and interactions through gene-set enrichment and network-diffusion analyses. Methods: Data Harmonization: RNA-seq counts will be normalized and batch-corrected, while WSIs will be pre-processed and summarized into quantitative radiomic features. Predictive Modeling: Six complementary classifiers (glmnet, k-nearest neighbors, naïve Bayes, random forest, linear SVM, and XGBoost) will be trained with cross-validation, and their ensemble performance will be evaluated for subtype prediction. 
Feature Selection & Interaction Mining: Feature selection coupled with stability selection will identify candidate subtype-associated genes. The vivid R package will detect synergistic gene pairs whose interactions significantly improve classification. Functional Interpretation: Candidate genes and gene pairs will undergo gene-set enrichment analysis (GSEA) using Gene Ontology, KEGG, Hallmark, and Reactome sets. Protein–protein-interaction (PPI) network diffusion on Reactome and NeST networks will then reveal higher-order functional modules. Network Construction: Significant genes and interactions will be integrated into a directed, interpretable gene network that highlights putative regulatory cascades distinguishing tumor subtypes. Expected Outcomes: • A rigorously benchmarked multimodal ML pipeline for accurate tumor-subtype prediction. • A ranked list of subtype-defining genes and gene–gene interactions with robust statistical support. • An interactive gene-network visualization to guide experimental validation and therapeutic-target discovery. |
Abstract_3rdAnnSym20250519.docx (17.67 KB)
|
||||
12 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #12 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #12 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #12 | Wed, 06/18/2025 - 13:39 | Anonymous | 10.208.28.250 | trinh | nguyen | master | Bioinformatician | NIH/NCI/CGBB | Rockville, MD | tinh.nguyen@nih.gov | Development or refinement of analysis pipelines or AI/ML algorithms | multi omics, unsupervised clustering, pathway analysis, subgroups | Molecular Characterization of childhood cancer subtypes through Multi-Omics Clustering | To recognize clinically important intrinsic cancer subtypes, it is important to combine multi-omics datasets to find multi-omics clusters and the interrelationships between biomolecules and their functions. Here, we will apply unsupervised clustering to information from at least two data types to search for subgroups of interest. Next, we will use our Multi-omics Pathways Workflow, an automated workflow on the Cancer Genomics Cloud, to search for activated molecular pathways in each subgroup. The omics data could include copy number alterations, transcriptomics data, and proteomics and phospho-proteomics data. The distinct pathways for the subgroups found by unsupervised clustering will be displayed graphically (e.g., in heatmaps) to facilitate interpretation with clinical data. | ||||
11 | Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #11 | Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #11 | Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #11 | Wed, 06/18/2025 - 13:12 | Anonymous | 10.208.24.91 | Michael | Sierk | Ph.D. | Senior Bioinformatician | CGBB/CBIIT/NCI | Rockville, MD | michael.sierk@nih.gov | Building specific disease cohorts, or visualization techniques | variant calling, visualization, Cancer Genomics Cloud | VCF Table Viewer: Flexible Visualization of Variants Called from the Nextflow Sarek Pipeline | Variant calling pipelines produce Variant Call Format (VCF) files. VCF files contain large amounts of information about called variants, especially if they are annotated by tools such as VEP, but are difficult to read and interpret directly, particularly for non-computational biologists. Thus, there are many tools available to extract information from VCF files for visualization and analysis. We present here a new Shiny app, VCF Table Viewer, that extracts information from annotated VCF files produced by the nf-core sarek variant calling pipeline and displays it in an interactive table. The table provides the ability to flexibly sort through a list of called variants while visualizing desired annotations, including color highlighting of various annotations. It also allows easy visualization of the BAM file pileups in an embedded IGV tab for variants selected in the table, as well as plots of somatic allele frequencies from Mutect2 calls over multiple samples. VCF Table Viewer provides a novel interface that facilitates the examination of variant calls by non-computational biologists. We provide examples from an ongoing clinical study of patients with familial platelet disorder. The code is freely available on GitHub. 
We propose integrating this app into the Cancer Genomics Cloud with easy selection of CCDI datasets for visualization, as well as making it easy to add to the end of a variant calling pipeline. If there is enough interest, we can add visualization features to the app. We do not have a pre-assembled team to work on this project. |
CCDI Jamboree Abstract.docx (15.49 KB)
|