NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #38

Submission information

Submission Number: 38

Submission ID: 145199

Submission UUID: c38fef55-03be-4fee-ac98-0be577fccbff

Submission URI: /nci/ods-data-jamboree/abstractsubmissions

Created: Mon, 06/23/2025 - 20:26

Completed: Mon, 06/23/2025 - 20:28

Changed: Mon, 06/23/2025 - 20:28

Submitted by: Anonymous

Language: English

Is draft: No

Webform: NCI Office of Data Sharing (ODS) Data Jamboree-Abstract Submissions

Submitted to: NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)

Presenter Information

First Name Ariana

Middle Initial M

Last Name Familiar

Degree(s) PhD

Position/Title/Career Status Supervisory Data Scientist, Center for Data-Driven Discovery in Biomedicine

Organization Children's Hospital of Philadelphia

Organization Address PHILADELPHIA

Email familiara@chop.edu

Abstract Information

Abstract Category Development or refinement of analysis pipelines or AI/ML algorithms

Abstract Title Automated Quality Control and Stain Classification of Whole-Slide Images in Pediatric Brain Tumors: Developing Scalable Harmonization Methods for the Children’s Brain Tumor Network Dataset

Abstract Summary High-resolution whole-slide images (WSIs) are increasingly central to pediatric brain tumor research, yet large-scale quality control (QC) and metadata curation remain persistent bottlenecks. As part of the Kids First program, the Children's Brain Tumor Network (CBTN) repository provides a large dataset of WSIs across pediatric brain tumor histopathologies (2,277 patients, 2,620 tumor samples, 19,176 WSIs). Because these WSIs were acquired through clinical protocols, the stain types available for a given sample depend on its diagnostic cohort and therefore on which stain markers were clinically relevant. WSIs can also vary considerably in tissue quality and digitization artifacts, often without reliable annotations. These inconsistencies limit downstream applications in computational pathology and multi-modal integration. We propose testing unsupervised and supervised machine learning methods to address two critical challenges: (1) automated detection of poor-quality or outlier WSIs, and (2) classification of stain type (e.g., H&E, Ki-67, GFAP). Using patch-level feature extraction with pretrained image encoders (e.g., ResNet, CLIP, or digital pathology foundation models) followed by dimensionality reduction, we will cluster WSIs or tile patches into quality- and stain-coherent groups. Clustering results will be validated against available metadata, expert annotation, and slide-level inspection. Our approach provides a scalable, annotation-light method to improve data hygiene in CBTN’s extensive pathology archive. By identifying low-quality or mislabeled images and surfacing underrepresented stain types, this project supports more reliable use of CBTN pathology data in downstream machine learning pipelines and biomarker discovery.
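
The two tasks named in the summary can be illustrated with a toy sketch. It is not the authors' pipeline. It assumes hand-crafted mean-color features in place of pretrained-encoder embeddings, a median/MAD outlier rule in place of clustering, and a nearest-reference-color rule in place of a trained stain classifier. All function names, the synthetic slides, and the reference colors (`REFERENCE_COLORS`) are hypothetical and chosen only for illustration.

```python
import math
import statistics


def tile_features(tile):
    """Return the mean (R, G, B) of one tile, given as a list of 0-255 pixel tuples."""
    n = len(tile)
    return tuple(sum(px[c] for px in tile) / n for c in range(3))


def slide_features(tiles):
    """Average tile-level features into a single slide-level vector."""
    feats = [tile_features(t) for t in tiles]
    return tuple(statistics.fmean([f[c] for f in feats]) for c in range(3))


def flag_outliers(vectors, threshold=3.5):
    """Flag slides that lie unusually far from the cohort's per-channel median.

    Uses a one-sided modified z-score (median and MAD) on the distances,
    which stays stable even when the outliers themselves are extreme.
    """
    dims = len(vectors[0])
    center = tuple(statistics.median([v[c] for v in vectors]) for c in range(dims))
    dists = [math.dist(v, center) for v in vectors]
    med = statistics.median(dists)
    mad = statistics.median([abs(d - med) for d in dists])
    if mad == 0:  # no spread in the cohort, so nothing can be ranked as an outlier
        return [False] * len(vectors)
    return [0.6745 * (d - med) / mad > threshold for d in dists]


def nearest_label(vector, references):
    """Assign the label whose reference vector is closest (nearest-centroid rule)."""
    return min(references, key=lambda name: math.dist(vector, references[name]))


def flat_slide(color, n_tiles=4, tile_px=16):
    """Build a synthetic slide: n_tiles tiles, each a flat patch of one color."""
    return [[color] * tile_px for _ in range(n_tiles)]


# Synthetic cohort (assumption): nine pinkish slides with small brightness
# shifts, plus one near-black slide standing in for a failed scan.
slides = [flat_slide((200 + o, 120 + o, 180 + o)) for o in range(-4, 5)]
slides.append(flat_slide((30, 30, 30)))

vectors = [slide_features(s) for s in slides]
flags = flag_outliers(vectors)

# Hypothetical reference colors; real stain references would be learned from data.
REFERENCE_COLORS = {"H&E-like": (200, 120, 180), "IHC-like": (150, 110, 80)}
labels = [
    nearest_label(v, REFERENCE_COLORS)
    for v, is_outlier in zip(vectors, flags)
    if not is_outlier
]
```

In this toy cohort only the final near-black slide is flagged, and the remaining nine slides are assigned the "H&E-like" label. With real WSIs, the feature step would be replaced by embeddings from a pretrained model and the rules above by clustering or a trained classifier, as the summary proposes.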

Upload Abstract

Pathology harmonization abstract.docx (15.13 KB)