NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)

50 submissions

#	Starred	Locked	Notes	Created	User	IP address	First Name	Middle Initial	Last Name	Degree(s)	Position/Title/Career Status	Organization	Organization Address	Email	Abstract Category	Abstract Keywords	Abstract Title	Abstract Summary	Upload Abstract	Operations
10	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #10	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #10	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #10	Mon, 06/16/2025 - 16:03	Anonymous	10.208.28.250	John		Poulos	MS - Kinesiology		N/A	Rockville	jpoulos@confluent.io	Methods to enable data interoperability	Data Streaming	Enabling Seamless Data Interoperability for Cancer Research Across NIH with Confluent Platform	The National Cancer Institute’s Office of Data Sharing has identified data interoperability as essential to advancing cancer research and improving outcomes. Confluent Federal proposes the Confluent Platform, built on Apache Kafka®, as a real-time data streaming solution to unify fragmented cancer research data across the NIH ecosystem. Confluent enables event-driven interoperability, allowing data from systems such as The Cancer Genome Atlas (TCGA), SEER, the Cancer Imaging Archive (TCIA), and ClinicalTrials.gov to be integrated and shared in real time. Instead of relying on batch ETL pipelines or manual data reconciliation, institutes can publish and subscribe to continuous data streams, enabling faster, more collaborative research. For example, as genomic data is sequenced at an NCI center, it can be immediately streamed to researchers at NHGRI for analysis using AI-driven pipelines. Similarly, real-time monitoring of adverse event data from NCI’s clinical trials supports earlier detection of safety signals. Confluent’s architecture also supports federated learning, allowing NIH institutes to collaborate on machine learning models without centralizing sensitive data. The platform includes over 120 pre-built connectors to databases, APIs (including HL7 FHIR), cloud storage, and analytic tools such as Snowflake, SAS, and Databricks. It provides centralized schema management, access control, and end-to-end observability, ensuring compliance with FISMA, HIPAA, and NIH security standards. By providing a scalable, cloud-native backbone for streaming data, Confluent supports FAIR principles (Findable, Accessible, Interoperable, Reusable) and aligns with NIH’s strategic goals for data science. This approach breaks down silos, enhances collaboration across institutes, and enables a real-time data fabric that empowers the next generation of cancer research.	Accelerating Cancer Research Through Real-Time Data Interoperability Across NIH with the Confluent Platform (1).pdf (212.8 KB)	View
9	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #9	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #9	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #9	Mon, 06/16/2025 - 14:58	Anonymous	10.208.28.250	Rakesh		Khanna	B.A.		CBIIT Computational Genomics and Bioinformatics Branch	Rockville, MD	rakesh.khanna@nih.gov	Employment of statistical methods or existing computational, mathematical, or informatics tools	Deep Learning, Foundation Model, Whole Slide Images, Brain, Feature Extraction	Extracting Deep Learning Features from Childhood Brain Tumor Whole Slide Images for Multi-Modal Analysis	Background: Whole slide images (WSIs) contain rich morphological information used to understand childhood brain tumors. However, accessing and processing this data requires significant computational overhead, including specialized hardware and expertise. Methods: This project aims to facilitate a number of downstream analyses utilizing histopathological features by generating pre-computed embeddings from WSIs of 433 childhood brain tumor patients from the Childhood Cancer Data Initiative-Molecular Characterization Initiative (CCDI-MCI), a heterogeneous collection of CNS tumors including gliomas, astrocytomas, and medulloblastomas. Using TRIDENT, a toolkit for large-scale whole-slide image processing, we will generate both patch-level (UNI2-h, CONCHv1.5, and Prov-Gigapath) and slide-level representations (Threads, Titan, and CHIEF). Purpose: By providing pre-extracted features, we enable scientists to immediately incorporate histopathological information into their own analyses. We will demonstrate the potential utility of these feature sets through proof-of-concept downstream analyses using the clinical and omics data made available at the jamboree. All extracted features will be publicly available, enabling the broader pediatric cancer community to leverage imaging data in their investigations without the typical computational overhead.		View
8	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #8	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #8	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #8	Fri, 06/13/2025 - 10:18	Anonymous	10.208.28.229	Lorena		Lazo de la Vega	Ph.D		Dana-Farber Cancer Institute	Boston, MA	lorena_lazodelavega@dfci.harvard.edu	Methods to enable data interoperability		Real World Data of Large Pediatric and AYA Cancers	At DFCI, we have clinical and genomic data we'd like to share with the larger research community and would like to explore ways to do that. While I do not have extensive computational experience, I am familiar with R. My main area of expertise is in the genomic space.		View
7	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #7	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #7	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #7	Mon, 06/09/2025 - 10:56	Anonymous	10.208.24.225	Erik		Storrs	Ph.D.		Washington University	Saint Louis	estorrs@wustl.edu	Methods to enable data interoperability		Looking to join a team.	I'm looking to join a team. Flexible in terms of interest and happy to fit in wherever the greatest need is. Below are my areas of expertise. Data analysis/munging/formatting experience with a variety of different data types (mostly single-cell, spatial transcriptomics, and imaging-based datasets). Would say I'm most well versed in spatial/imaging datasets. Web application development, mostly with Javascript and SvelteKit. Computational pipeline development and deployment (including cloud-based). Experience with machine learning (mostly deep learning for imaging applications, but also some genomics applications).		View
6	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #6	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #6	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #6	Thu, 06/05/2025 - 09:58	Anonymous	10.208.24.39	Emily		Pearce	Ph.D.	Postdoc Fellow	NCI	Rockville	emily.pearce@nih.gov	Development of tutorial and educational tools, data storytelling, infographics, and other creative uses of data		Looking to join a project team	I am looking for a project team to join. I am a mixed methods researcher experienced in qualitative and quantitative methodologies. My research is focused on the psychosocial experiences of adults and young adults with rare cancer predisposition syndromes. Common areas of concern in the populations I have studied include young adult transition to adult healthcare, cancer screening management, and life course navigation with likely life-limiting disease. I have an interest in the use of AI for qualitative data analysis involving open-ended survey questions and interview transcripts. I am also interested in data visualizations and creative ways to disseminate data to lay audiences.		View
5	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #5	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #5	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #5	Sun, 06/01/2025 - 23:20	Anonymous	10.208.28.181	Zifan		Gu	M.S.	Graduate student	UT Southwestern Medical Center	Dallas, TX	zifan.gu@utsouthwestern.edu	Development or refinement of analysis pipelines or AI/ML algorithms	deep learning, predictive modeling, large language models, translational medicine	Looking for a project team to join!	My expertise is in developing machine learning pipelines for multimodal data, including both structured and imaging (H&E, multiplex) data. One of my key strengths is being able to communicate with both data scientists and clinicians, supported by my formal training in health data science and bioinformatics. I’m looking to join an interdisciplinary team that not only focuses on algorithm development but also considers how these models can be deployed into the real world. I’m particularly interested in projects that incorporate large language models as part of the predictive modeling pipeline, addressing outcomes such as remission, mortality, or disease progression. I’m well-versed in HPC using PyTorch and Hugging Face, and I’d be excited to join a team that plans to use AWS or GCP as their computing platforms.		View
4	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #4	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #4	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #4	Thu, 05/22/2025 - 13:19	Anonymous	10.208.28.130	David		Higgins	Ph.D.	Informatics Program Manager	Children's Hospital of Philadelphia	Philadelphia, PA	higginsd@chop.edu	Development of tutorial and educational tools, data storytelling, infographics, and other creative uses of data	variant discovery, jupyter notebook, pyspark sql, genomics, training	Tutorial and Example Notebooks for the Kids First Variant Workbench	The goal of the Gabriella Miller Kids First Pediatric Research Program is to find common underlying genetic causes of pediatric cancer and structural birth defects. The Gabriella Miller Kids First Data Resource Center (Kids First DRC) produces high quality clinical and genomic datasets to support this goal, accessible and analyzable via our interoperable cloud platforms. The Kids First Variant Workbench powered by CAVATICA accelerates breakthroughs in pediatric medicine by combining Kids First participant conditions, genomic variants, and variant annotations. With these resources combined, the Variant Workbench gives researchers one workspace for making discoveries using the Kids First datasets they have received access to, accelerating variant discovery. On a technical level, the Variant Workbench is a series of tables containing information about Kids First participants, their genomic variants, and annotations such as ClinVar and CADD to put these in context. These tables can be joined together using PySpark SQL to isolate the specific fields of interest in the more than 400 million unique variants in the Kids First cohorts. The latest release contains germline and somatic variants from 9 cohorts, a total of ~3,900 participants. It is possible to query, analyze, and display both types of variants at the same time for these studies. The goal of this project is to develop tutorial and educational tools to increase usage of the Variant Workbench. We have a specific interest in the development of written directions for integrating other datasets, such as TARGET, into the Variant Workbench, as well as example Jupyter notebooks that users can execute to see the platform’s capabilities. Overall, though, these tools could take many formats, and we look forward to working with members of the research community to hear their feedback and ideas.		View
3	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #3	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #3	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #3	Mon, 05/12/2025 - 15:46	Anonymous	10.208.28.143	Stephanie		Spielman	Ph.D.	Data Scientist	Alex's Lemonade Stand Foundation	Bala Cynwyd, Pennsylvania	stephanie.spielman@ccdatalab.org	Methods to enable data interoperability		N/A	I am looking for a project team to join for this event. My background is in evolutionary computational biology, but I have worked in the pediatric cancer research space (either full-time or collaboratively) for 4-5 years. I primarily work broadly on pediatric cancer transcriptomics in a purely computational environment. These days I am working primarily with R and UNIX, but I have also worked in Python and with workflow tools like Nextflow. I am enthusiastic about open-source and reproducible coding practices (including GitHub), approaches that improve researchers' ability to obtain, manage, clean, and organize their research projects including data, and educating researchers about these frameworks. In addition, I have excellent written and oral communication skills with an extensive background in writing user-friendly documentation to support my software projects. My goals in joining this "data interoperability" group at the jamboree are to learn more about the limitations researchers experience when attempting to integrate different data sources and identify/begin implementing approaches (which might include technical software, documentation, or developing guidelines/recommendations) to reduce the barriers researchers experience when working with data from disparate sources.		View
2	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #2	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #2	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #2	Fri, 05/09/2025 - 15:21	Anonymous	10.208.24.48	Maggie		Cam	Ph.D.	Staff Scientist	NCI	Bethesda	maggie.cam@nih.gov	Methods to enable data interoperability	CRDC API, Pediatric Cancer, Immune Profiling, Reproducible Workflows, Immuno-Oncology Data Commons (IODC)	Reproducible Pediatric Immune Profiling Using CRDC APIs and Local HPC Analysis	This project will develop a reproducible workflow for immune profiling of pediatric solid tumors by combining CRDC API-based cohort selection with local analysis on NIH’s Biowulf cluster. The workflow will serve as a pilot for improving data reuse and supporting future methods development for the Immuno-Oncology Data Commons (IODC).	NCI-ODS Data Jamboree Abstract.docx (17.15 KB)	View
1	Star/flag NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #1	Lock NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #1	Add notes to NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #1	Fri, 05/09/2025 - 08:50	Anonymous	10.208.24.48	Jaclyn	N	Taroni	Ph.D.	Director of the Childhood Cancer Data Lab	Alex's Lemonade Stand Founda	Bala Cynwyd, PA	j.taroni@alexslemonade.org	Development or refinement of analysis pipelines or AI/ML algorithms		Seeking a project team to join	I am looking for a project team to join. I have a background in computational biology, machine learning, and data visualization. I additionally have experience in product management, short-format training, and tutorial/documentation writing as part of my current position, where I supervise scientists, UX designers, and software engineers. Given this range of experience, I would be happy to be placed on a team in many of the abstract areas. I can potentially contribute to the experimental design of AI/ML projects, programmatic visualization, or developing tutorials.		View