NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)

50 submissions

#	Starred	Locked	Notes	Created	User	IP address	First Name	Middle Initial	Last Name	Degree(s)	Position/Title/Career Status	Organization	Organization Address	Email	Other (Please Specify)	Abstract Category	Abstract Keywords	Abstract Title	Abstract Summary	Upload Abstract	Operations
50	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #50	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #50	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #50	Fri, 06/27/2025 - 16:46	Anonymous	10.208.24.210	Jamie	C	Estill	Ph.D.	Systems Analysis Manager	University of Michigan Center for Translational Pathology	Ann Arbor, MI	jaestill@med.umich.edu		Project Member Role		Project Member	I am interested in participating as a project team member for the project submitted by Dr. Marcin Cieslik. I would like to be a virtual participant.		View
49	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #49	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #49	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #49	Fri, 06/27/2025 - 15:03	Anonymous	10.208.28.84	Diana		Thomas	MD, PhD	Associate Professor of Pathology, Director of Digital Pathology	Nationwide Children's Hospital	Columbus, Ohio	diana.thomas@nationwidechildrens.org		Building specific disease cohorts, or visualization techniques	pediatric brain tumors, data standardization, interactive data visualization, image analysis and machine learning	Enabling Discovery in Rare Pediatric Brain Tumors Through Data Integration and Visualization	The increasing availability of robust publicly accessible childhood cancer datasets offers a significant opportunity to advance research and improve outcomes for children with rare, understudied tumors such as pediatric brain cancers. Our project ideas aim to enhance existing data and web-based platforms, including those from the CCDI Molecular Characterization Initiative, by developing tools that make complex data more accessible to the pediatric brain cancer research community. We envision a user-friendly platform featuring interactive visualization tools such as oncoprints, survival plots, and mutation heatmaps that allow researchers without coding or bioinformatics expertise to build and analyze cohorts using clinical, genomic, pathology, treatment, and follow-up data. By enabling intuitive exploration, the tool will support clinicians, pathologists, and researchers in identifying patterns and generating hypotheses, particularly for rare or newly characterized tumor types. A key component of CNS tumor classification is DNA methylation profiling. Current datasets have been processed with varying classifier versions, resulting in inconsistent tumor class labeling. 
To address this, we propose re-processing raw data files (.idat) using one or more current classifiers (e.g., NCI Bethesda, IGM v1.0). This approach will harmonize classification across datasets, improve diagnostic precision and support reproducibility and cross-study comparisons, with outputs made available via dbGaP. Additionally, we aim to leverage whole slide pathology images to develop algorithms for histopathologic feature extraction and machine learning. These algorithms could enhance diagnostic capabilities, especially important given the shortage of pediatric neuropathologists nationwide. By integrating clinical, genomic, and image-based data, this project will accelerate discovery in rare tumor types and inform clinical trial design and treatment strategies in pediatric neuro-oncology. We welcome collaboration on any or all aspects of this proposed work.	Data jamboree abstract.docx15.4 KB	View
48	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #48	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #48	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #48	Fri, 06/27/2025 - 12:47	Anonymous	10.208.28.84	Sapna		Oberoi	MD, MSc	Assistant Professor	CancerCare Manitoba	Winnipeg, Manitoba	soberoi@cancercare.mb.ca		Building specific disease cohorts, or visualization techniques	soft tissue sarcoma, tumor microenvironment, immune classification, pediatrics	Immune Characterization of Pediatric Soft Tissue Sarcomas	Rationale: Pediatric soft-tissue sarcomas (STS) represent a rare and histologically diverse group of cancers with variable clinical behaviour and limited responsiveness to immune checkpoint blockade. While recent transcriptomic analyses in adult STS have led to a reproducible immune-based classification, comprising five tumor microenvironment (TME) phenotypes ranging from immune-low to immune-high and vascularized subtypes, the relevance of these findings to pediatric STS remains unknown. In adult STS, the immune-high, B cell–enriched phenotype (class E), characterized by the presence of tertiary lymphoid structures (TLSs), was associated with favorable survival and responsiveness to PD-1 blockade. These observations underscore the potential of immune phenotyping to guide risk stratification and immunotherapeutic decision-making in pediatric STS. Methods: We propose to characterize the immune landscape of pediatric STS using bulk RNA sequencing data from patients enrolled in the Molecular Characterization Initiative of the National Cancer Institute and the Children’s Oncology Group. Our analytic framework will follow the approach described by Petitprez et al. (PMID: 31942077), applying a transcriptomic-based classification scheme to assess the applicability of adult-derived immune subtypes in pediatric disease. 
TME composition will be inferred using the MCPcounter algorithm, which estimates the relative abundance of key immune and stromal cell populations. Results will be compared with those obtained from other computational methods, including xCell, CIBERSORT and deconvolution-based algorithms such as QuanTIseq. Immune phenotypes will be correlated with clinicopathologic features, genomic data, and clinical outcome data available through the Childhood Cancer Data Initiative (CCDI). In parallel, pathologists will evaluate the H&E slides to assess relevant morphological features that may be associated with TME clustering and the presence of TLSs in these tumors. Expected Outcome: This work aims to define the immune architecture of pediatric STS, assess the translatability of adult-derived immune subtypes, and inform the development of immune-informed therapeutic strategies for children and adolescents with sarcoma.	Immuneclassificaiton_STS_Abstract_June27_2025_0.pdf88.48 KB	View
47	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #47	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #47	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #47	Fri, 06/27/2025 - 07:50	Anonymous	10.208.24.210	Carla	J	Berg	PhD, MBA	Professor	George Washington University	Washington, DC	carlaberg@gwu.edu		Building specific disease cohorts, or visualization techniques		Joining project team	I am a clinical health psychologist with an MBA (emphasis in marketing) who has spent >15 years in schools of public health and cancer centers. I'm happy to contribute to any group in any capacity.		View
46	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #46	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #46	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #46	Thu, 06/26/2025 - 17:23	Anonymous	10.208.24.68	Jinghui		Zhang	PhD	Professor	St Jude Children's Research Hospital	Memphis	jinghui.zhang@stjude.org		Solving a challenging clinical problem by data integration	Ontology Classifier Data Integration	Exploring new data integration approaches to enhance ontology-based pediatric cancer classification on challenging clinical cases	Ontologies designed for cancer classification have redefined our understanding of cancer by providing a hierarchical structure of complex biomedical data. Integration of omics findings with other established approaches such as histology and immunohistochemistry is now the new standard for clinical practice as reflected in the 5th edition of WHO CNS tumor classification scheme. We have developed a pediatric cancer focused ontology framework by leveraging existing efforts from OncoTree, WHO, and community knowledge. This framework has been applied to >5,000 pediatric samples with omics data accessible on the St Jude Cloud platform (https://www.stjude.cloud/) and more recently on >1,000 pediatric solid and CNS tumors profiled by the Childhood Cancer Data Initiative (CCDI) based on the clinical annotation as well as genomic alterations identified from exome and RNA-seq/Archer fusion platform. For CCDI sample classification, we have encountered multiple challenging cases with ambiguous or conflicting results indicating additional analytical approaches or data may improve classification. 
For example, for samples annotated as “Small round blue cell tumor” that have bi-allelic loss of SMARCB1 identified by exome analysis, can they be classified as rhabdoid tumors if additional information on tissue source or clinical imaging data can be obtained? Can newly developed RNA-seq expression-based machine-learning approaches, such as CanID (https://github.com/chenlab-sj/CanID), be used to augment classification over biomarker-based approaches? Can genome-wide copy number profile be used to improve the classification? Our project is aimed at exploring the value of new data and new analyses in advancing pediatric cancer classification. We will select ~20 challenging CCDI cases for real-time analysis at the Jamboree with a team of algorithm developers, ontology designers and clinical analysts to gain insights on future directions to improve the precision of ontology-based classification.		View
45	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #45	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #45	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #45	Thu, 06/26/2025 - 14:02	Anonymous	10.208.24.68	Michael		Watkins	Ph.D.	Manager of Data Standards and Modeling	Data for the Common Good, University of Chicago	Chicago	michaelwatkins@bsd.uchicago.edu		Methods to enable data interoperability	terminologies, semantics, knowledge graph, rdf, sparql	Developing an Oncology Knowledge Graph	As data interoperability has risen to the forefront of clinical trial design and real-world data (RWD) capture, community awareness of clinical data standards has never been higher. Clinicians and data scientists alike understand that bespoke data modeling leads to complex and manual downstream harmonization. However, the resultant proliferation of clinical data standards does not fully realize data interoperability. There is a “last mile” need for computational approaches to data mapping and semantic reasoning that can leverage these standards to semi-automate the task of data harmonization. Perhaps the most difficult aspect of interoperating over data bound to different terminological standards is that concepts are rarely exact matches and are usually partially equivalent in an ill-defined way. Knowledge graphs are a mainstay for semantic reasoning in many other industries and consist of concepts (nodes) and relations (edges). By encoding these concepts and relations in a graph representation, such as the Resource Description Framework (RDF), a query language like SPARQL can query this knowledge graph and provide a user with a precise and computational relationship between two concepts. Aims: 1. 
Curate a set of RDF triples that encode the relationships between oncology-related concepts from NCIt, SNOMED-CT, ICD-O, Disease Ontology, and Uberon (scoped by specific use cases). 2. Combine those sets into a small but linked proof-of-concept knowledge graph. 3. Develop SPARQL queries that can access the knowledge graph for given clinical terms. 4. Instantiate those queries within a data mapping demo that takes in C3DC data and annotates it with additional concept bindings from the knowledge graph.	Proposal.pdf42.12 KB	View
44	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #44	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #44	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #44	Wed, 06/25/2025 - 12:49	Anonymous	10.208.24.68	Xin		Zhou	PhD	assistant member	St. Jude Children's Research Hospital	Memphis	xin.zhou@stjude.org		Employment of statistical methods or existing computational, mathematical, or informatics tools	survivorship, epidemiology, genetics	Discover Novel Treatment–associated Late Adverse Effect in Long term survivors of childhood cancer by leveraging the Childhood Cancer Survivorship Portal	Owing to continued therapeutic improvement and innovation, the number of survivors of childhood cancer in the United States will exceed 580,000 by 2040. However, long-term survivors face a broad spectrum of adverse health outcomes such as subsequent neoplasms and cardiomyopathies. Understanding the association between these long-term adverse outcomes and treatment exposure is critical to advancing survivorship care and optimizing pediatric cancer treatment. To enable the exploration of survivorship data, we developed the Survivorship Portal, a public resource with demographic and clinical data, cancer diagnoses, chemotherapy and radiation exposures, risk-based clinical assessments, laboratory results, patient-reported outcomes, and whole-genome sequencing–derived germline genotypes on 28,500 childhood cancer survivors enrolled in St Jude Life and the Childhood Cancer Survivorship Study (CCSS), which include >1,600 phenotypic variables and 400 million genetic variants (Matt et al., Cancer Discovery 2024). We propose to utilize the Childhood Cancer Survivorship Portal to investigate novel associations between cancer treatment exposures and late effects across organ systems. 
The project will leverage the portal’s rich dataset and interactive analytical tools to construct exposure–outcome models that assess correlations between treatment regimens and organ-specific late effects. Through the portal's advanced cohort-building tool, the team will identify clinically meaningful survivor subgroups, such as those at elevated risk due to treatment intensity, age at exposure, or genetic susceptibility, enabling more refined and novel analyses. Our team includes postdoctoral trainees in survivorship research and is led by a senior scientist overseeing portal development. Through this effort, we aim to demonstrate the scientific value and utility of the Survivorship Portal while generating actionable insights for risk stratification and follow-up care in survivors of pediatric cancers. This project will not only advance survivorship research but also promote awareness and adoption of the portal within the broader childhood cancer research community.		View
43	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #43	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #43	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #43	Tue, 06/24/2025 - 18:46	Anonymous	10.208.24.68	Jennifer	Marie	Torres Del Valle	PhD, MPH, MA	Post Doc in Clinical & Translational Research/ Public Health Program Administrator	Pennsylvania Department of Health/University of Puerto Rico-Medical Sciences Campus	Harrisburg, PA	jennifer.torres@upr.edu		Building specific disease cohorts, or visualization techniques	HPV, Adolescent Immunizations, Cancer Prevention, Public Health Registry, Vaccine Disparities	Leveraging Statewide Immunization Data to Identify Gaps and Opportunities in Adolescent HPV Vaccination Coverage in Pennsylvania	Human papillomavirus (HPV) vaccination is a proven strategy to prevent several HPV-related cancers, including cervical and oropharyngeal cancers. However, adolescent HPV vaccination rates in Pennsylvania remain below national targets, with only 61.4% of adolescents completing the series as of 2023. Disparities in vaccine uptake are most pronounced in rural and underserved communities, where healthcare access is limited and vaccine hesitancy is rising. To address these challenges, the Pennsylvania Department of Health is launching a statewide initiative to establish a comprehensive adolescent immunization baseline using the Pennsylvania Immunization Electronic Registry System (PIERS). This system provides de-identified, dose-specific records that will be analyzed by age, geography, race/ethnicity, and insurance type to identify vaccination coverage gaps. Although PIERS does not collect socioeconomic data, geographic markers will allow linkage with external datasets to approximate social risk factors. In Year 1, we will focus on building the dataset, validating record completeness, and mapping disparities. 
A provider and community survey will also assess vaccine confidence and barriers to HPV and other adolescent vaccine uptake. An additional goal is to define data-driven benchmarks for immunization coverage across different regions and populations. These benchmarks will serve as reference points to measure progress and guide future programmatic decisions. Findings will support the development of targeted interventions in Years 2–3, with an emphasis on improving HPV uptake and reducing disparities. This initiative aligns with national goals in the CDC’s Vaccination Strategic Plan and Healthy People 2030 and supports childhood cancer prevention by improving access to the HPV vaccine. By presenting at the NCI Inaugural Data Jamboree, we aim to demonstrate how statewide immunization data can inform benchmarks, close equity gaps, and advance early cancer prevention strategies.		View
42	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #42	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #42	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #42	Tue, 06/24/2025 - 13:17	Anonymous	10.208.28.5	Chaya		Moskowitz	PhD	Attending Biostatistician	Memorial Sloan Kettering Cancer Center	New York	moskowc1@mskcc.org		Building specific disease cohorts, or visualization techniques		project member	I would prefer to be a project member, but could also be a narrator if that was what was needed. I have expertise in study design, design of cohorts, analyzing data using statistical software, and knowledge of appropriate statistical methods to use with complicated data. Unless travel funds are provided, I will be a virtual attendee.		View
41	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #41	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #41	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #41	Tue, 06/24/2025 - 11:46	Anonymous	10.208.28.5	Minghong		Ward	MS. Electrical and Computer Engineering	Product Owner for dbGaP FHIR Product	NLM/NCBI	BETHESDA	wardming@nih.gov		Methods to enable data interoperability	dbGaP, FHIR, DRS, Cloud-computing, Interoperability	Making dbGaP data interoperable and analysis-ready with FHIR and DRS API	The NIH’s database of Genotypes and Phenotypes (dbGaP) includes data from 3,000+ studies across 800 diseases/focuses, involving 4 million participants and 450,000+ phenotype variables. dbGaP has supported over 8,000 publications. As researchers increasingly rely on cloud platforms to conduct cross-study analysis, both interoperability and ease of access are urgently needed. We developed a FHIR (Fast Healthcare Interoperability Resources) API for dbGaP to deliver both open-access and controlled-access dbGaP data. The open-access API provides programmatic access to study metadata, enabling researchers to discover relevant datasets. The controlled-access API delivers over 1.1 billion phenotypic observations and molecular sequence files through persistent URLs using the GA4GH Data Repository Service (DRS), another global standard. We will show how a simple Python script in a Jupyter notebook can perform phenotype-driven statistical analysis across multiple datasets and repositories using the FHIR API. This approach enhances data reuse, facilitates cohort building, and helps accelerate reproducible research at scale.	KidsFirst_ODS_poster_June2025.docx14.38 KB	View
40	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #40	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #40	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #40	Mon, 06/23/2025 - 23:49	Anonymous	10.208.28.57	Ali	I	Hashmi	BS	Senior Data Scientist	IBM Consulting	Herndon, VA	ahashmi@us.ibm.com		Methods to enable data interoperability	Interoperability, FHIR, Federated, Analytics	Building a FHIR-Based Data Integration Platform for Pediatric and AYA Cancer Research	Pediatric and adolescent/young adult (AYA) cancer research is hampered by fragmented data silos, inconsistent data standards, and labor-intensive harmonization processes that limit the pace of discovery and the delivery of precision care. To address these challenges, we propose an innovative data integration platform built on the Fast Healthcare Interoperability Resources (FHIR) standard. Leveraging oncology-specific FHIR profiles such as mCODE and extending them with pediatric- and AYA-specific data elements, the platform will enable automated, real-time extraction and harmonization of clinical, molecular, imaging, and patient-reported data from electronic health records and research databases across institutions. In this project we will explore the feasibility of deploying FHIR servers at participating sites, supporting secure, standards-based data exchange and federated analytics, thus preserving patient privacy while enabling collaborative research on rare and heterogeneous pediatric cancers. Automated APIs and tools will drastically reduce the manual effort required to prepare data for research, while supporting longitudinal tracking of patients from diagnosis through survivorship. By integrating diverse data modalities and facilitating seamless data sharing, the platform will accelerate biomarker discovery, risk stratification, and the development of personalized therapies. 
Ultimately, this project will establish a scalable, interoperable data ecosystem, transforming pediatric and AYA cancer research and care, and serving as a model for other rare disease domains. We will get hands on with sample and representative datasets, compare the project goals against existing implementation guides and/or demonstrations, and craft a development plan toward achieving a minimum viable product (MVP). To the extent possible, and leveraging open-source assets, we will seek to demonstrate these features in code.	Data Jamboree Abstract 2025-06-23.docx30.81 KB	View
39	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #39	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #39	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #39	Mon, 06/23/2025 - 23:31	Anonymous	10.208.28.57	Weiping		Ma	Ph.D.	Data Scientist	Icahn School of Medicine at Mount Sinai	New York	weiping.ma@mssm.edu		Development or refinement of analysis pipelines or AI/ML algorithms	missing value, imputation, DIA, TMT	Missing data Imputation on proteomics data from DIA experiment	Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. It has gained significant attention in proteomics research recently. Compared with another popular MS-based technique, labeled proteomics experiments such as TMT (Tandem Mass Tag), DIA has advantages in measuring low-abundance proteins and in quantitative accuracy. On the other hand, due to the dynamic nature of discovery mass spectrometry, the generated data contain a substantial fraction of missing values. In this project, we propose to perform benchmark evaluations for imputing NA values in DIA data sets. We will use the DIA and TMT global proteomics data of 441 pediatric brain tumor samples across 9 histologies from the Kids First pediatric brain tumor study. Specifically, we will evaluate the imputation performance on the DIA data, using the matched TMT datasets as the gold standard. We will include multiple commonly used imputation methods in the evaluation, such as KNN, machine learning-based imputation methods, low-rank matrix completion techniques, and deep learning models. We will also benchmark our previous work, DreamAI, an ensemble-based imputation method, which has been successfully applied in numerous CPTAC studies. 
Additionally, we will evaluate the impact of imputation accuracy on downstream statistical analysis, such as association and pathway enrichment analysis. Furthermore, we will investigate novel approaches to jointly impute matched DIA and DDA (TMT) data of the same sample to enhance protein coverage.		View
38	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #38	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #38	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #38	Mon, 06/23/2025 - 20:26	Anonymous	10.208.24.253	Ariana	M	Familiar	PhD	Supervisory Data Scientist, Center for Data-Driven Discovery in Biomedicine	Children's Hospital of Philadelphia	PHILADELPHIA	familiara@chop.edu		Development or refinement of analysis pipelines or AI/ML algorithms		Automated Quality Control and Stain Classification of Whole-Slide Images in Pediatric Brain Tumors: Developing Scalable Harmonization Methods for the Children’s Brain Tumor Network Dataset	High-resolution whole-slide images (WSIs) are increasingly central to pediatric brain tumor research, yet large-scale quality control (QC) and metadata curation remain persistent bottlenecks. As part of the Kids First program, the Children's Brain Tumor Network (CBTN) repository provides a large dataset of WSIs across pediatric brain tumor histopathologies (2,277 patients, 2,620 tumor samples, 19,176 WSIs). Given this collection consists of WSIs acquired through clinical protocols, available stain types across samples can differ due to their diagnostic cohort and thus the clinical relevance of specific stain markers. WSIs can also exhibit considerable variability in tissue quality and digitization artifacts, often without reliable annotations. These inconsistencies limit downstream applications in computational pathology and multi-modal integration. We propose testing unsupervised and supervised machine learning methods to address two critical challenges: (1) automated detection of poor-quality or outlier WSIs, and (2) classification of stain type (e.g., H&E, Ki-67, GFAP). 
Leveraging dimensionality reduction and patch-level feature extraction via pretrained convolutional neural networks (e.g., ResNet, CLIP, or foundational digital pathology models), we will cluster WSIs or tile patches into quality- and stain-coherent groups. Clustering results will be validated against available metadata, expert annotation, and slide-level inspection. Our approach provides a scalable, annotation-light method to improve data hygiene in CBTN’s extensive pathology archive. By identifying low-quality or mislabeled images and surfacing underrepresented staining types, this project supports more reliable use of CBTN pathology data in downstream machine learning pipelines and biomarker discovery.	Pathology harmonization abstract.docx15.13 KB	View
37	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #37	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #37	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #37	Mon, 06/23/2025 - 20:13	Anonymous	10.208.24.253	Brian		Capaldo	Ph.D.	Biomedical informatics specialist	National Cancer Institute	Rockville	brian.capaldo@nih.gov		Employment of statistical methods or existing computational, mathematical, or informatics tools	linear modeling, xenograft	Mixed effect modeling of human and mouse data in xenograft studies corrects for missed contamination or transcriptomic spillover	Xenograft models present a unique opportunity to study patient samples in an in vivo model; however, contamination by mouse reads can result in spurious interpretations of the data. Using linear mixed effect models, we can effectively remove influences of the mouse transcriptome on the human data.	Xenograft models are widely used as a surrogate for better molecular characterization of patient disease.docx13.62 KB	View
36	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #36	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #36	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #36	Mon, 06/23/2025 - 18:55	Anonymous	10.208.24.253	Mary		Goldman	B.S.		UC Santa Cruz Genomics Institute	Santa Cruz	mary@soe.ucsc.edu		Building specific disease cohorts, or visualization techniques	Visualization Xena User Education	Interested in joining a team to provide visualization or educational materials of Jamboree results	I work as the Design and Outreach Engineer for the UCSC Xena project (xena.ucsc.edu), a visualization tool for cancer genomics data. As part of my job, I work with collaborators and researchers to better understand how they use pediatric single cell and spatial data. This helps UCSC Xena to develop more useful visualizations and analyses of this data. I think it is important to put single-cell datasets in the hands of all researchers, especially those without computational expertise. To do this we need easy-to-use online tools that allow researchers to explore both single-cell RNA-seq and spatial transcriptomic datasets. These online tools need to be intuitive to use as well as deliver visualizations and analysis results that enable a deep understanding of tumor biology, driving innovation and discovery in the cancer genomics field. I want to participate in the Jamboree to brainstorm new visualizations of pediatric single cell data and the data resulting from the Jamboree. I also want to better understand how users are working with single-cell and spatial transcriptomic data so that I can design new User Interfaces and User Workflows to help researchers achieve their goals. I hope to use my design skills to create wireframes in Balsamiq (https://balsamiq.com/) that would enable researchers to explore Jamboree results. 
These wireframes could then be implemented by developers on the Jamboree team, or if there is not enough time, the wireframes themselves could be an output of the collaboration. I also have extensive experience developing educational and tutorial materials and would be interested in applying it to create a Jamboree output. I hope to develop collaborations that help UCSC Xena to become a more useful tool and continue after the Jamboree has concluded. Please note that I do not have travel funds and will require travel support to attend.		View
35	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #35	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #35	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #35	Mon, 06/23/2025 - 18:50	Anonymous	10.208.28.103	Nicole		Tignor	Ph.D.	Assistant Professor	Icahn School of Medicine at Mount Sinai	New York	nicole.tignor@gmail.com		Employment of statistical methods or existing computational, mathematical, or informatics tools	Pediatric high-grade glioma, Glycoproteomics, Prognostic biomarkers, Tumor heterogeneity	Cross-Population Survival Analysis (CPSA) to Investigate Glycoproteomic Prognostic Markers in Pediatric High-Grade Glioma	Glycosylation is a critical post-translational modification that influences tumor cell adhesion, migration, and immune evasion, yet its role in pediatric high-grade glioma (HGG) remains poorly understood. Progress in this area is limited by small sample sizes in HGG glycoproteomics and the methodological challenge of disentangling developmental variation from tumor-intrinsic heterogeneity. We recently developed a computational framework—Cross-Population Survival Analysis (CPSA)—that models molecular abundance as a continuous function of age. This enables interpolation of abundance values across cohorts, providing a bridge between datasets and supporting survival association analysis even when direct abundance measurements are missing or limited. At this data jamboree, we propose to apply CPSA to characterize the prognostic relevance of glycoproteomic features in pediatric HGG. We will analyze 29 HGG tumors from the Kids First Pediatric Brain Tumor Study and 79 HGG tumors from a separate CBTN-CPTAC pediatric study, jointly profiling 465 glycoproteins and 2,508 glycopeptides. 
Using a harmonized preprocessing pipeline and sex-stratified Cox regression models, adjusted for age, mutation status, and global protein levels, we will test whether previously identified survival-associated glycopeptides replicate in this independent cohort. In parallel, we will assess whether 539 newly detected glycoproteins and 11,622 glycopeptides expand the repertoire of prognostic markers in pediatric HGG. This effort will evaluate the reproducibility and generalizability of glycosylation-based survival signals and clarify the role of post-translational regulation in pediatric glioma risk stratification. More broadly, it illustrates how harmonized proteomic profiling and age- and sex-aware modeling via CPSA can advance biomarker discovery in rare pediatric cancers.		View
34	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #34	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #34	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #34	Mon, 06/23/2025 - 18:44	Anonymous	10.208.28.103	Sara	R.	Savage	Ph.D.	Staff Scientist	Zhang Lab, Baylor College of Medicine	Houston	ssavage@bcm.edu		Employment of statistical methods or existing computational, mathematical, or informatics tools	visualization, data accessibility, informatics tools,	Flexible and accessible omics analyses using LinkedOmics	One of the biggest challenges in the utilization of large cancer datasets is the lack of accessibility to most interested parties. Many scientists may be unaware the data exist and they may be unsure of how to access and analyze the data. We have created a tool, LinkedOmics (https://www.linkedomics.org/login.php), which is a web-based application that makes multi-omics data analyses available to the research community. Within a dataset, a user can select one feature (such as clinical stage) to be correlated with all features in a data type (such as expression of all genes). Results are displayed as graphs and tables and can further be submitted for enrichment analyses. We plan to format and deposit the publicly available childhood cancer datasets into LinkedOmics. As a demonstration of use, we will correlate common mutation events in each cohort with molecular data and perform enrichment analyses to identify altered pathways in tumors with the mutation. We will also perform meta-analyses to compare common alterations across cancer types. The data will be uploaded to LinkedOmics before the jamboree and all analyses and demonstrations can be performed on a personal laptop.		View
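The one-feature-versus-all correlation described in Submission #34 can be sketched as follows. This is a minimal illustration, not the LinkedOmics implementation: the function names, the gene names, and the toy values are all hypothetical, and Pearson correlation is assumed here only because it is simple to show.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns None when either sequence is constant (correlation undefined).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def correlate_one_vs_all(target, features):
    """Correlate one selected feature with every feature in a data type.

    `target` is the selected feature (e.g. clinical stage per sample);
    `features` maps a feature name to its per-sample values.
    Results are ranked by absolute correlation, strongest first.
    """
    results = []
    for name, values in features.items():
        r = pearson(target, values)
        if r is not None:  # skip features with no variation
            results.append((name, r))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: five samples, three made-up genes.
stage = [1, 2, 2, 3, 4]
expression = {
    "GENE_A": [1.0, 2.0, 2.0, 3.0, 4.0],  # rises with stage
    "GENE_B": [4.0, 3.0, 3.0, 2.0, 1.5],  # falls with stage
    "GENE_C": [5.0, 5.0, 5.0, 5.0, 5.0],  # constant, so it is skipped
}
ranked = correlate_one_vs_all(stage, expression)
for gene, r in ranked:
    print(f"{gene}\t{r:+.3f}")
```

In practice the ranked list would feed an enrichment analysis, and a real analysis would also need multiple-testing correction, which this sketch omits.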
33	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #33	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #33	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #33	Mon, 06/23/2025 - 18:10	Anonymous	10.208.24.253	Azra		Krek	Ph.D.	Senior scientist	Mount Sinai School of Medicine, Department of Genetics and Genomics, Francesca Petralia lab	New York	azra.krek@mssm.edu		Employment of statistical methods or existing computational, mathematical, or informatics tools	pediatric brain tumors; tumor microenvironment; multi-omic data integration; cell-type deconvolution;	Characterizing the Tumor Microenvironment in Pediatric Brain Tumors	Pediatric brain tumors have been reported to show substantial heterogeneity in the cell composition of the tumor microenvironment (Petralia et al., Cell 2020, PMID: 33242424). To better characterize the immune landscape as well as tumor cell-unique biology among these tumors, we propose to perform comprehensive deconvolution analysis using proteogenomic data from ~400 pediatric tumors in an ongoing pan-histology pediatric brain tumor study conducted through a collaboration between the Kids First consortium and NCI-CPTAC. The histologies covered in the study include Craniopharyngioma (n=64), Ependymoma (n=112), Medulloblastoma (n=128), and others. Specifically, we will employ BayesDeBulk, a Bayesian framework that can jointly analyze gene expression and proteomics data, to systematically characterize the cellular composition of tumors, including immune, stromal, and vascular components, and to infer tumor cell-specific expression profiles from the bulk data. In parallel, we will analyze post-translational modification data, in particular phosphorylation patterns, to estimate kinase activity and uncover active signaling pathways that may drive TME characteristics (i.e., the cell composition vector). 
By linking these signaling profiles, dynamic cell compositions, tumor cell-specific expression, and the clinical properties of each patient, we aim to identify distinct microenvironmental and functional patterns across tumor types. This integrated approach enables a deeper understanding of tumor biology from bulk data and could reveal potential therapeutic targets based on both microenvironmental context and dynamic signaling networks. We expect our findings to demonstrate the value of combining multi-omic data with advanced computational modeling to inform more precise, biology-driven treatment strategies for pediatric brain tumors.	abstract_data.jamboree_AzraKrek.docx (15.52 KB)	View
32	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #32	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #32	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #32	Mon, 06/23/2025 - 18:01	Anonymous	10.208.24.253	Alexander		Pilozzi	M.Sc. Bioinformatics	Bioinformatics Analyst	ICF	Reston	Alexander.Pilozzi@icf.com		Methods to enable data interoperability		CC-CARE-AI: A Framework for Assessing and Refining AI-Readiness of Childhood Cancer Cohorts from Kids First, TARGET, and CCDI	I am registering as part of Yin Lu's (yin.lu@icf.com) team, abstract title "CC-CARE-AI: A Framework for Assessing and Refining AI-Readiness of Childhood Cancer Cohorts from Kids First, TARGET, and CCDI"		View
31	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #31	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #31	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #31	Mon, 06/23/2025 - 17:34	Anonymous	10.208.28.103	Anna	T.	Fernandez	Ph.D.	Principal/Director	Booz Allen Hamilton	Bethesda, MD	fernandez_anna@bah.com	gill_abegail@bah.com	Building specific disease cohorts, development/refinement of AI/ML algorithms, and methods to enable interoperability		Practical Data Quality Assessment and Enrichment of Childhood Cancer Datasets	Childhood cancer data is steadily increasing in prevalence and diversity, enabling development of artificial intelligence (AI) solutions with improved diagnostic accuracy and greater impact on pediatric patients. Our team will document the process, challenges, and opportunities in connecting disparate childhood cancer datasets (TBD) for a high-level use case, working through the following objectives: a) individually assess the AI readiness of at least two datasets that may be combined to solve a research problem more effectively; data quality and readiness will be reviewed in terms of consistency, completeness, and collinearity within each individual dataset; b) investigate and present how data can be supplemented with external datasets and/or integrated with other sources (e.g., explore whether accessing summary statistics from NCCR for similar patient populations, or enriching the data for specific individuals through demographic location, is possible); c) explore the ability of these datasets (from different original cohorts) to complement or strengthen certain data quality areas; and d) define possible AI/ML pipelines and run through the data preparation phase for at least one pipeline to highlight benefits of the data aggregation and enrichment strategies outlined above. 
Our team will present our findings and next steps as part of the challenge, sharing the example scripts that could be reused by the community. We plan to assess and analyze datasets through the use of Python and R. The computing environment needed to carry out our project is TBD but may entail use of an NCI-funded cloud resource, local compute, or commercial cloud environments (e.g., AWS, Azure). Our project team from Booz Allen Hamilton includes Anna Fernandez, Abdullah Awaysheh, James Galbraith, Abegail Gill, Lucy Han, and Brandon Konkel.		View
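Two of the data quality checks named in Submission #31, completeness and collinearity, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the team's scripts: the column names, the toy records, the use of `None` for missing values, and the 0.9 correlation threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def completeness(rows, columns):
    """Fraction of non-missing (not None) values for each column."""
    n = len(rows)
    return {c: sum(r.get(c) is not None for r in rows) / n for c in columns}


def pearson(x, y):
    """Pearson correlation; returns None if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def collinear_pairs(rows, columns, threshold=0.9):
    """Column pairs whose absolute correlation meets the threshold.

    Only records where both columns are present are used for each pair.
    """
    flagged = []
    for a, b in combinations(columns, 2):
        pairs = [
            (r[a], r[b])
            for r in rows
            if r.get(a) is not None and r.get(b) is not None
        ]
        if len(pairs) < 3:  # too few complete records to judge
            continue
        xs, ys = zip(*pairs)
        r_value = pearson(xs, ys)
        if r_value is not None and abs(r_value) >= threshold:
            flagged.append((a, b, round(r_value, 3)))
    return flagged


# Hypothetical toy records; None marks a missing value.
records = [
    {"age_years": 2, "age_months": 24, "wbc": 5.1},
    {"age_years": 5, "age_months": 60, "wbc": None},
    {"age_years": 9, "age_months": 108, "wbc": 4.0},
    {"age_years": 13, "age_months": 156, "wbc": 7.3},
]
cols = ["age_years", "age_months", "wbc"]
coverage = completeness(records, cols)
redundant = collinear_pairs(records, cols)
print(coverage)
print(redundant)
```

Here the two age columns are flagged as redundant because one is a rescaling of the other, while `wbc` is reported as 75% complete. Consistency checks (for example, agreement of coded values across datasets) would need dataset-specific rules and are not covered by this sketch.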