NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #39

Submission information

Submission Number: 39

Submission ID: 145205

Submission UUID: d2ea99fe-89d5-4b26-97a9-8124e6b09e8a

Submission URI: /nci/ods-data-jamboree/abstractsubmissions

Created: Mon, 06/23/2025 - 23:31

Completed: Mon, 06/23/2025 - 23:31

Changed: Mon, 06/23/2025 - 23:31

Remote IP address: 10.208.28.57

Submitted by: Anonymous

Language: English

Is draft: No

Webform: NCI Office of Data Sharing (ODS) Data Jamboree-Abstract Submissions

Submitted to: NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)

Presenter Information

First Name Weiping

Middle Initial {Empty}

Last Name Ma

Degree(s) Ph.D.

Position/Title/Career Status Data Scientist

Organization Icahn School of Medicine at Mount Sinai

Organization Address New York

Email weiping.ma@mssm.edu

Other (Please Specify) {Empty}

Abstract Information

Abstract Category Development or refinement of analysis pipelines or AI/ML algorithms

Abstract Keywords missing value, imputation, DIA, TMT

Abstract Title Missing Data Imputation for Proteomics Data from DIA Experiments

Abstract Summary Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics and has recently gained significant attention in proteomics research. Compared with another popular MS-based technique, labeled proteomics experiments such as TMT (Tandem Mass Tag), DIA offers advantages in measuring low-abundance proteins and in quantitative accuracy. However, owing to the dynamic nature of discovery mass spectrometry, the generated data contain a substantial fraction of missing values.
In this project, we propose to benchmark methods for imputing missing (NA) values in DIA data sets. We will use the DIA and TMT global proteomics data of 441 pediatric brain tumor samples spanning 9 histologies from the Kids First pediatric brain tumor study. Specifically, we will evaluate imputation performance on the DIA data, using the matched TMT data sets as the gold standard. The evaluation will include multiple commonly used imputation methods, such as k-nearest neighbors (KNN), machine-learning-based imputation, low-rank matrix completion, and deep learning models. We will also benchmark our previous work, DreamAI, an ensemble-based imputation method that has been successfully applied in numerous CPTAC studies. Additionally, we will evaluate how imputation accuracy affects downstream statistical analyses, such as association and pathway enrichment analysis. Furthermore, we will investigate novel approaches to jointly impute matched DIA and DDA (TMT) data from the same samples to enhance protein coverage.
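To make the benchmark design concrete, here is a minimal sketch. It is not the study's pipeline and not DreamAI. It assumes protein x sample matrices of log-scale abundances whose proteins and samples are already aligned between DIA and TMT, uses a simple row-wise KNN as one example of the listed methods, and scores imputations by the Pearson correlation between imputed DIA values and matched TMT values at entries that are missing in DIA but observed in TMT. The function names (knn_impute, score_against_reference), the correlation metric, the mean fallback, and the toy data are all illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple

# Rows are proteins, columns are samples; None marks a missing value.
Matrix = List[List[Optional[float]]]


def row_distance(a: List[Optional[float]], b: List[Optional[float]]) -> Optional[float]:
    """Root-mean-square difference over samples observed in both rows."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        return None  # no co-observed samples, so similarity is undefined
    return math.sqrt(sum(diffs) / len(diffs))


def knn_impute(data: Matrix, k: int = 3) -> Matrix:
    """Row-wise KNN: fill each missing entry with the mean of that sample's
    observed values in the k most similar proteins."""
    imputed = [row[:] for row in data]
    for i, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank every other protein by its distance to protein i.
        neighbors = []
        for r, other in enumerate(data):
            if r != i:
                d = row_distance(row, other)
                if d is not None:
                    neighbors.append((d, r))
        neighbors.sort()
        for j in missing:
            # Donors: the k nearest proteins that are observed in sample j.
            donors = [data[r][j] for _, r in neighbors if data[r][j] is not None][:k]
            if donors:
                imputed[i][j] = sum(donors) / len(donors)
            else:
                # Assumption: fall back to the protein's own observed mean.
                observed = [v for v in row if v is not None]
                imputed[i][j] = sum(observed) / len(observed) if observed else None
    return imputed


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if fewer than two points or zero variance."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def score_against_reference(dia: Matrix, dia_imputed: Matrix, tmt: Matrix) -> Tuple[float, int]:
    """Assumed metric: correlate imputed DIA values with matched TMT values,
    using only entries missing in DIA but observed in TMT."""
    xs, ys = [], []
    for i, row in enumerate(dia):
        for j, v in enumerate(row):
            if v is None and tmt[i][j] is not None and dia_imputed[i][j] is not None:
                xs.append(dia_imputed[i][j])
                ys.append(tmt[i][j])
    return pearson(xs, ys), len(xs)


# Toy, made-up log-scale matrices (4 proteins x 4 samples); not study data.
dia = [
    [1.0, 2.0, 3.0, None],
    [1.1, 2.1, 3.1, 4.1],
    [0.8, 1.8, 2.8, 3.8],
    [1.05, None, 3.05, 4.05],
]
tmt = [
    [1.2, 2.2, 3.2, 4.2],
    [1.3, 2.3, 3.3, 4.3],
    [0.9, 1.9, 2.9, None],
    [1.1, 2.1, 3.1, 4.1],
]

imputed = knn_impute(dia, k=2)
r, n = score_against_reference(dia, imputed, tmt)
print(f"scored {n} entries, Pearson r = {r:.3f}")
```

Correlation is used in this sketch because DIA intensities and TMT ratios sit on different scales; an error metric such as RMSE would first require mapping both onto a common scale. A real benchmark would also score each method listed above on the same held-out entries.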

Upload Abstract {Empty}