NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #13
Submission information
Submission Number: 13
Submission ID: 144982
Submission UUID: 0e58dbb7-da4e-4de6-9a1e-f632ba432d75
Submission URI: /nci/ods-data-jamboree/abstractsubmissions
Submission Update: /nci/ods-data-jamboree/abstractsubmissions?token=w85KOybCTK-qaO5_pFjKGe56doi1wpaoMBNlDEshfT4
Created: Wed, 06/18/2025 - 14:02
Completed: Wed, 06/18/2025 - 14:03
Changed: Wed, 06/18/2025 - 14:03
Remote IP address: 10.208.28.250
Submitted by: Anonymous
Language: English
Is draft: No
serial: '13'
sid: '144982'
uuid: 0e58dbb7-da4e-4de6-9a1e-f632ba432d75
uri: /nci/ods-data-jamboree/abstractsubmissions
created: '1750269770'
completed: '1750269788'
changed: '1750269788'
in_draft: '0'
current_page: ''
remote_addr: 10.208.28.250
uid: '0'
langcode: en
webform_id: nci_office_of_data_sharing_abstr
entity_type: node
entity_id: '2107'
locked: '0'
sticky: '0'
notes: ''
metatag: meta
data:
  category: 'Development or refinement of analysis pipelines or AI/ML algorithms'
  degree_s_: Ph.D.
  email: yhu@mail.nih.gov
  first_name: Ying
  keywords_abstracts: 'tumor subtype, machine learning, gene interaction'
  last_name: Hu
  middle_initial: ''
  organization: CGBB/CBIIT/NCI/NIH/HHS
  organization_address:
    address: ''
    address_2: ''
    city: 'Rockville, MD'
    country: ''
    postal_code: ''
    state_province: ''
  other_please_specify_: ''
  summary: |
    Background
    Accurate classification of tumor subtypes is pivotal for precision oncology, yet existing approaches rarely integrate multiple heterogeneous data types or explicitly model gene–gene interactions. Recent advances in machine learning (ML) now enable joint analysis of multi-omics profiles and image-derived features to uncover robust, biologically interpretable biomarkers.

    Objective
    This project will develop an integrated ML framework that:
    1. Predicts tumor subtypes from combined RNA-seq (transcriptomic) and whole-slide image (WSI) data.
    2. Identifies key genes and gene–gene interactions that drive distinctions between subtypes.
    3. Elucidates the functional roles of these genes and interactions through gene-set enrichment and network-diffusion analyses.

    Methods
    Data Harmonization
    RNA-seq counts will be normalized and batch-corrected, while WSIs will be pre-processed and summarized into quantitative radiomic features.

    Predictive Modeling
    Six complementary classifiers (glmnet, k-nearest neighbors, naïve Bayes, random forest, linear SVM, and XGBoost) will be trained with cross-validation, and their ensemble performance will be evaluated for subtype prediction.

    Feature Selection & Interaction Mining
    Feature selection coupled with stability selection will identify candidate subtype-associated genes. The vivid R package will detect synergistic gene pairs whose interactions significantly improve classification.

    Functional Interpretation
    Candidate genes and gene pairs will undergo gene-set enrichment analysis (GSEA) using Gene Ontology, KEGG, Hallmark, and Reactome gene sets. Protein–protein interaction (PPI) network diffusion on the Reactome and NeST networks will then reveal higher-order functional modules.

    Network Construction
    Significant genes and interactions will be integrated into a directed, interpretable gene network that highlights putative regulatory cascades distinguishing tumor subtypes.

    Expected Outcomes
    • A rigorously benchmarked multimodal ML pipeline for accurate tumor-subtype prediction.
    • A ranked list of subtype-defining genes and gene–gene interactions with robust statistical support.
    • An interactive gene-network visualization to guide experimental validation and therapeutic-target discovery.
  title: ''
  ttile: 'Translating Multi-Modal Machine-Learning Insights into Clinically Actionable Subtype-Specific Biomarkers with an Emphasis on Biomarker Interactions'
  upload_abstract: '65537'
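The PPI network-diffusion step named under Functional Interpretation can be illustrated with a random walk with restart, one common diffusion scheme. The abstract does not say which diffusion algorithm will be used, so the following is only a minimal sketch under that assumption. The gene names, the edge list, and the restart value of 0.5 are hypothetical placeholders, not Reactome or NeST data.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffuse seed scores over an undirected graph and return {node: score}.

    Assumption: random walk with restart stands in for the unspecified
    network-diffusion method mentioned in the abstract.
    """
    # Build an undirected adjacency structure from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # Keep only seeds that actually occur in the network.
    seeds_in_graph = [s for s in seeds if s in neighbors]
    if not seeds_in_graph:
        raise ValueError("no seed gene is present in the network")

    # Initial distribution: uniform mass on the seed genes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds_in_graph:
        p0[s] = 1.0 / len(seeds_in_graph)

    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed fraction of the mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node splits its remaining mass evenly among neighbors.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a simple chain of four placeholder genes.
toy_edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]
scores = random_walk_with_restart(toy_edges, seeds=["GENE_A"])

# Rank genes by diffused score, highest first.
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{gene}\t{score:.4f}")
```

In this toy chain the score decays with distance from the seed, which is the property that lets diffusion highlight network neighborhoods around candidate genes. A real analysis would replace the edge list with the chosen PPI network and seed the walk with the selected genes.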