NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #39

Submission information
Submission Number: 39
Submission ID: 145205
Submission UUID: d2ea99fe-89d5-4b26-97a9-8124e6b09e8a

Created: Mon, 06/23/2025 - 23:31
Completed: Mon, 06/23/2025 - 23:31
Changed: Mon, 06/23/2025 - 23:31

Remote IP address: 10.208.28.57
Submitted by: Anonymous
Language: English

Is draft: No
Presenter Information
Weiping
Ma
Ph.D.
Data Scientist
Icahn School of Medicine at Mount Sinai
New York
Abstract Information
Development or refinement of analysis pipelines or AI/ML algorithms
missing value, imputation, DIA, TMT
Missing Data Imputation for Proteomics Data from DIA Experiments
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics, and it has recently gained significant attention in proteomics research. Compared with another popular MS-based technique, labeled proteomics experiments such as TMT (Tandem Mass Tag), DIA has advantages in measuring low-abundance proteins effectively and in quantitative accuracy. On the other hand, due to the dynamic nature of discovery mass spectrometry, the generated data contain a substantial fraction of missing values.
In this project, we propose to benchmark methods for imputing missing (NA) values in DIA data sets. We will use the DIA and TMT global proteomics data of 441 pediatric brain tumor samples across 9 histologies from the Kids First pediatric brain tumor study. Specifically, we will evaluate imputation performance on the DIA data, using the matched TMT data sets as the gold standard. We will include multiple commonly used imputation methods in the evaluation, such as KNN, machine learning-based imputation methods, low-rank matrix completion techniques, and deep learning models. We will also benchmark our previous work, DreamAI, an ensemble-based imputation method that has been successfully applied in numerous CPTAC studies. Additionally, we will evaluate the impact of imputation accuracy on downstream statistical analyses, such as association and pathway enrichment analysis. Furthermore, we will investigate novel approaches to jointly impute matched DIA and DDA (TMT) data from the same sample to enhance protein coverage.
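The evaluation idea described above (impute the DIA matrix, then compare the imputed entries with matched reference measurements) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the project's pipeline: the data are synthetic, the missingness is completely at random, the two imputers (row mean and a simple nearest-neighbor fill) are simplified stand-ins for the methods listed, and the function names and the Pearson-correlation score are hypothetical choices. Real DIA and TMT values may sit on different scales and would likely need harmonizing before such a comparison.

```python
import numpy as np

def impute_row_mean(x):
    """Replace each NaN with the mean of the observed values in its row (protein)."""
    filled = x.copy()
    mask = np.isnan(x)
    row_means = np.nanmean(x, axis=1, keepdims=True)
    filled[mask] = np.broadcast_to(row_means, x.shape)[mask]
    return filled

def impute_knn(x, k=5):
    """Fill each NaN with the mean of the k nearest rows that are observed in that column."""
    filled = x.copy()
    mask = np.isnan(x)
    for i in np.where(mask.any(axis=1))[0]:
        # Mean squared difference over the samples observed in both rows.
        shared = ~mask[i] & ~mask
        diff = np.where(shared, x - x[i], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.where(n_shared > 0,
                        (diff ** 2).sum(axis=1) / np.maximum(n_shared, 1),
                        np.inf)
        dist[i] = np.inf  # a row cannot be its own neighbor
        order = np.argsort(dist)
        for j in np.where(mask[i])[0]:
            donors = [r for r in order
                      if not mask[r, j] and np.isfinite(dist[r])][:k]
            # Fall back to the row mean if no donor row is available.
            filled[i, j] = x[donors, j].mean() if donors else np.nanmean(x[i])
    return filled

def score_against_reference(imputed, original, reference):
    """Pearson correlation between imputed and reference values at originally missing entries."""
    mask = np.isnan(original) & ~np.isnan(reference)
    return float(np.corrcoef(imputed[mask], reference[mask])[0, 1])

# Synthetic stand-in data (assumption): 200 proteins x 40 samples with low-rank structure.
rng = np.random.default_rng(0)
baseline = rng.normal(20.0, 2.0, size=(200, 1))            # per-protein abundance level
truth = baseline + rng.normal(size=(200, 3)) @ rng.normal(size=(3, 40))
tmt = truth + rng.normal(scale=0.2, size=truth.shape)       # plays the role of the reference
dia = truth + rng.normal(scale=0.2, size=truth.shape)
dia[rng.random(dia.shape) < 0.2] = np.nan                   # 20% missing completely at random

filled_mean = impute_row_mean(dia)
filled_knn = impute_knn(dia, k=5)
scores = {
    "row_mean": score_against_reference(filled_mean, dia, tmt),
    "knn": score_against_reference(filled_knn, dia, tmt),
}
for name, value in scores.items():
    print(f"{name}: correlation with reference at imputed entries = {value:.3f}")
```

The same scoring function could be applied to any imputer that returns a completed matrix, which is one way the listed methods might be compared on a common footing.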