NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #31

Submission information
Submission Number: 31
Submission ID: 145185
Submission UUID: bd7513b1-b636-465e-ad1e-b03b984ddb79

Created: Mon, 06/23/2025 - 17:34
Completed: Mon, 06/23/2025 - 17:34
Changed: Mon, 06/23/2025 - 17:34

Remote IP address: 10.208.28.103
Submitted by: Anonymous
Language: English

Is draft: No
Presenter Information
Anna
T.
Fernandez
Ph.D.
Principal/Director
Booz Allen Hamilton
Bethesda, MD
gill_abegail@bah.com
Abstract Information
Building specific disease cohorts, development/refinement of AI/ML algorithms, and methods to enable interoperability
Practical Data Quality Assessment and Enrichment of Childhood Cancer Datasets
Childhood cancer data are steadily increasing in availability and diversity, enabling the development of artificial intelligence (AI) solutions with improved diagnostic accuracy and greater impact on pediatric patients. Our team will document the process, challenges, and opportunities in connecting disparate childhood cancer datasets (TBD) for a high-level use case, working through the following objectives: a) assess the AI readiness of at least two individual datasets that may be combined to solve a research problem more effectively, reviewing data quality and readiness in terms of consistency, completeness, and collinearity within each dataset; b) investigate and present how the data can be supplemented with external datasets and/or integrated with other sources (e.g., explore whether it is possible to access summary statistics from NCCR for similar patient populations or to enrich the data for specific individuals through demographics, location, etc.); c) explore the ability of these datasets (from different original cohorts) to complement or strengthen one another in specific data quality areas; and d) define possible AI/ML pipelines and run through the data preparation phase for at least one pipeline to highlight the benefits of the data aggregation and enrichment strategies outlined above. Our team will present our findings and next steps as part of the challenge and share example scripts that the community can reuse. We plan to assess and analyze the datasets using Python and R. The computing environment for the project is TBD but may entail an NCI-funded cloud resource, local compute, or a commercial cloud environment (e.g., AWS, Azure). Our project team from Booz Allen Hamilton includes Anna Fernandez, Abdullah Awaysheh, James Galbraith, Abegail Gill, Lucy Han, and Brandon Konkel.
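As an illustration of the per-dataset readiness review described in objective (a), a minimal Python sketch might look like the following. It is not the team's actual method: the field names, the toy records, the set of missing-value markers, and the 0.9 correlation threshold are all assumptions made for illustration, and only the Python standard library is used.

```python
import math

# Assumed markers for a missing value; real datasets may use others.
MISSING = (None, "", "NA")


def completeness(records, fields):
    """Return the fraction of records with a non-missing value, per field."""
    total = len(records)
    return {
        f: sum(1 for r in records if r.get(f) not in MISSING) / total
        for f in fields
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(records, numeric_fields, threshold=0.9):
    """Flag pairs of numeric fields whose |correlation| meets the threshold."""
    flagged = []
    for i, a in enumerate(numeric_fields):
        for b in numeric_fields[i + 1:]:
            # Use only records where both fields are present.
            pairs = [
                (r[a], r[b])
                for r in records
                if r.get(a) not in MISSING and r.get(b) not in MISSING
            ]
            if len(pairs) < 3:
                continue  # too few observations to say anything
            r_ab = pearson([p[0] for p in pairs], [p[1] for p in pairs])
            if abs(r_ab) >= threshold:
                flagged.append((a, b, round(r_ab, 3)))
    return flagged


def inconsistent_values(records, field, allowed):
    """Return the sorted non-missing values of a field outside the allowed set."""
    return sorted(
        {
            r[field]
            for r in records
            if r.get(field) not in MISSING and r[field] not in allowed
        }
    )


# Made-up toy records, not real patient data.
toy_records = [
    {"age_years": 4, "age_months": 48, "sex": "F", "wbc": 5.1},
    {"age_years": 10, "age_months": 120, "sex": "M", "wbc": None},
    {"age_years": 7, "age_months": 84, "sex": "female", "wbc": 7.9},
    {"age_years": 15, "age_months": 180, "sex": "", "wbc": 6.0},
]

print(completeness(toy_records, ["age_years", "sex", "wbc"]))
print(collinear_pairs(toy_records, ["age_years", "age_months", "wbc"]))
print(inconsistent_values(toy_records, "sex", {"F", "M"}))
```

Running the same three checks on each candidate dataset before and after combining them would give a simple way to see whether aggregation improves completeness or introduces new inconsistencies, in the spirit of objective (c).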