NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #31

Submission information

Submission Number: 31

Submission ID: 145185

Submission UUID: bd7513b1-b636-465e-ad1e-b03b984ddb79

Submission URI: /nci/ods-data-jamboree/abstractsubmissions

Submission Update: /nci/ods-data-jamboree/abstractsubmissions?token=3DD25M0L7RXfcD7wDNG4_VVnvmT7M2aRicE5FnWwKf8

Created: Mon, 06/23/2025 - 17:34

Completed: Mon, 06/23/2025 - 17:34

Changed: Mon, 06/23/2025 - 17:34

Remote IP address: 10.208.28.103

Submitted by: Anonymous

Language: English

Is draft: No

Webform: NCI Office of Data Sharing (ODS) Data Jamboree-Abstract Submissions

Submitted to: NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)

Presenter Information

First Name Anna

Middle Initial T.

Last Name Fernandez

Degree(s) Ph.D.

Position/Title/Career Status Principal/Director

Organization Booz Allen Hamilton

Organization Address Bethesda, MD

Email fernandez_anna@bah.com

Other (Please Specify) gill_abegail@bah.com

Abstract Information

Abstract Category Building specific disease cohorts, development/refinement of AI/ML algorithms, and methods to enable interoperability

Abstract Keywords {Empty}

Abstract Title Practical Data Quality Assessment and Enrichment of Childhood Cancer Datasets

Abstract Summary Childhood cancer data is steadily increasing in availability and diversity, enabling the development of artificial intelligence (AI) solutions with improved diagnostic accuracy and greater impact on pediatric patients. Our team will document the process, challenges, and opportunities in connecting disparate childhood cancer datasets (TBD) for a high-level use case, working through the following objectives: a) assess the AI readiness of at least two datasets individually that may be combined to solve a research problem more effectively, reviewing data quality and readiness in terms of consistency, completeness, and collinearity within each dataset; b) investigate and present how the data can be supplemented with external datasets and/or integrated with other sources (e.g., explore whether accessing summary statistics from NCCR for similar patient populations, or enriching the data for specific individuals, is possible through demographic location); c) explore whether these datasets (from different original cohorts) can complement or strengthen one another in specific data quality areas; and d) define possible AI/ML pipelines and run through the data preparation phase for at least one pipeline to highlight the benefits of the data aggregation and enrichment strategies outlined above. Our team will present our findings and next steps as part of the challenge and share example scripts that the community can reuse. We plan to assess and analyze the datasets using Python and R. The computing environment for the project is TBD but may include an NCI-funded cloud resource, local compute, or commercial cloud environments (e.g., AWS, Azure). Our project team from Booz Allen Hamilton includes Anna Fernandez, Abdullah Awaysheh, James Galbraith, Abegail Gill, Lucy Han, and Brandon Konkel.
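
The three per-dataset checks named in objective (a), completeness, consistency, and collinearity, could be sketched in Python roughly as follows. This is a minimal illustration and not the team's actual scripts: the records, the field names (age_years, weight_kg, sex), the allowed vocabulary for sex, and the helper function names are all hypothetical, and only the standard library is used.

```python
from math import sqrt

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"age_years": 4, "weight_kg": 16.0, "sex": "F"},
    {"age_years": 9, "weight_kg": 28.0, "sex": "M"},
    {"age_years": None, "weight_kg": 40.0, "sex": "female"},
    {"age_years": 15, "weight_kg": 52.0, "sex": "M"},
]


def completeness(rows, field):
    """Fraction of rows in which `field` is present and not None."""
    return sum(r.get(field) is not None for r in rows) / len(rows)


def inconsistent_values(rows, field, allowed):
    """Distinct non-missing values of `field` outside the allowed vocabulary."""
    return sorted(
        {r[field] for r in rows
         if r.get(field) is not None and r[field] not in allowed}
    )


def pearson(rows, x, y):
    """Pearson correlation of two numeric fields over rows where both exist.

    Returns None when fewer than two complete pairs exist or when either
    field has zero variance, since the coefficient is undefined there.
    """
    pairs = [(r[x], r[y]) for r in rows
             if r.get(x) is not None and r.get(y) is not None]
    if len(pairs) < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print("completeness(age_years):", completeness(records, "age_years"))
    print("inconsistent sex codes:", inconsistent_values(records, "sex", {"F", "M"}))
    print("corr(age_years, weight_kg):", pearson(records, "age_years", "weight_kg"))
```

In this sketch, a low completeness fraction flags a field that may need imputation or supplementation from a second dataset, values outside the allowed vocabulary flag coding differences to harmonize before merging, and a correlation close to 1 or -1 flags a pair of fields that may be collinear for downstream modeling.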