NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #29

Submission information
Submission Number: 29
Submission ID: 145182
Submission UUID: 6db1c14f-1843-455d-8af4-68fb36628733

Created: Mon, 06/23/2025 - 17:04
Completed: Mon, 06/23/2025 - 17:05
Changed: Mon, 06/23/2025 - 17:05

Remote IP address: 10.208.24.253
Submitted by: Anonymous
Language: English

Is draft: No
Presenter Information
Yin
Lu
Ph.D.
Lead Bioinformatics Analyst
ICF
Rockville
Abstract Information
Methods to enable data interoperability
Pediatric cancer, AI-readiness, Cohort refinement, Biomedical data quality
CC-CARE-AI: A Framework for Assessing and Refining AI-Readiness of Childhood Cancer Cohorts from Kids First, TARGET, and CCDI
While large-scale childhood cancer datasets are increasingly available, researchers often struggle to determine which cohorts are suitable for AI and machine learning applications because of inconsistencies in data quality, completeness, and standardization. This project proposes a modular framework, CC-CARE-AI (Childhood Cancer Cohort Assessment and REfinement for AI), to assess and refine the AI-readiness of childhood cancer cohorts from the Gabriella Miller Kids First Program (Kids First), TARGET, and the Childhood Cancer Data Initiative (CCDI). CC-CARE-AI generates domain-specific readiness scores across clinical, genomic, and imaging data using a transparent, multi-criteria evaluation system. To complement the scoring, the framework also incorporates tools for cohort refinement and decision support through interactive visualizations and dashboards, enabling researchers to identify high-quality subsets and enhance data usability. By aligning data quality with specific research and machine learning needs, the framework facilitates more effective and responsible use of AI in pediatric oncology.
The project will use Python, R, and tools such as pandas and Streamlit on the Seven Bridges Cancer Genomics Cloud for secure, scalable, and reproducible analysis.
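As an illustration of how domain-specific readiness scoring and cohort refinement might fit together, the following is a minimal Python sketch using only the standard library. It is not the CC-CARE-AI implementation: the single criterion used here (field completeness), the field names, the domain weights, and the function names are all hypothetical assumptions chosen for the example.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical required fields per data domain (illustrative only).
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "clinical": ["diagnosis", "age_at_diagnosis"],
    "genomic": ["wgs_file"],
    "imaging": ["mri_file"],
}

# Hypothetical weights expressing how much each domain matters to a use case.
WEIGHTS: Dict[str, float] = {"clinical": 0.5, "genomic": 0.3, "imaging": 0.2}


def is_complete(record: Record, fields: List[str]) -> bool:
    """True if every listed field is present and non-empty in the record."""
    return all(record.get(f) not in (None, "") for f in fields)


def domain_scores(cohort: List[Record]) -> Dict[str, float]:
    """Fraction of participants with complete data, computed per domain."""
    if not cohort:
        return {domain: 0.0 for domain in REQUIRED_FIELDS}
    return {
        domain: sum(is_complete(r, fields) for r in cohort) / len(cohort)
        for domain, fields in REQUIRED_FIELDS.items()
    }


def overall_score(scores: Dict[str, float]) -> float:
    """Weighted sum of the per-domain scores."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)


def refine(cohort: List[Record], domains: List[str]) -> List[Record]:
    """Keep only participants whose data are complete in all chosen domains."""
    return [
        r for r in cohort
        if all(is_complete(r, REQUIRED_FIELDS[d]) for d in domains)
    ]


# Toy cohort with made-up records, for demonstration only.
cohort: List[Record] = [
    {"id": "P1", "diagnosis": "neuroblastoma", "age_at_diagnosis": "3",
     "wgs_file": "p1.bam", "mri_file": None},
    {"id": "P2", "diagnosis": "AML", "age_at_diagnosis": "",
     "wgs_file": "p2.bam", "mri_file": "p2.dcm"},
    {"id": "P3", "diagnosis": "osteosarcoma", "age_at_diagnosis": "14",
     "wgs_file": None, "mri_file": None},
    {"id": "P4", "diagnosis": "Wilms tumor", "age_at_diagnosis": "4",
     "wgs_file": "p4.bam", "mri_file": "p4.dcm"},
]

scores = domain_scores(cohort)
subset = refine(cohort, ["clinical", "genomic"])
print(scores, round(overall_score(scores), 3), [r["id"] for r in subset])
```

Keeping the per-domain scores separate from the weighted total is one way to keep the evaluation transparent: a researcher can see which domain lowers a cohort's score and then refine the cohort against only the domains a given model requires.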

The project team will be led by Dr. Yin Lu (Lead Bioinformatics Analyst) and includes Mr. Alexander Pilozzi (Bioinformatics Analyst) and Dr. Alejandro M. Sevillano (Bioinformatics Analyst), all from the Health Analytics and Research Technologies division at ICF, with expertise in cancer data management, AI-readiness assessment, and cloud-based analysis. The team brings relevant experience from the CPTAC program, NIDDK Data Centric Challenge, ARPA-H Biomedical Data Fabric, and CRDC integration efforts.