NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #41

National Cancer Institute

Submission information

Submission Number: 41

Submission ID: 145273

Submission UUID: e3c93020-4ccd-4e9c-87e2-7106df727322

Submission URI: /nci/ods-data-jamboree/abstractsubmissions

Created: Tue, 06/24/2025 - 11:46

Completed: Tue, 06/24/2025 - 11:47

Changed: Wed, 06/25/2025 - 10:36

Remote IP address: 10.208.28.5

Submitted by: Anonymous

Language: English

Is draft: No

Webform: NCI Office of Data Sharing (ODS) Data Jamboree-Abstract Submissions

Submitted to: NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)

Presenter Information

First Name Minghong

Middle Initial {Empty}

Last Name Ward

Degree(s) MS, Electrical and Computer Engineering

Position/Title/Career Status Product Owner for dbGaP FHIR Product

Organization NLM/NCBI

Organization Address BETHESDA

Email wardming@nih.gov

Other (Please Specify) {Empty}

Abstract Information

Abstract Category Methods to enable data interoperability

Abstract Keywords dbGaP, FHIR, DRS, Cloud-computing, Interoperability

Abstract Title Making dbGaP data interoperable and analysis-ready with FHIR and DRS API

Abstract Summary The NIH’s database of Genotypes and Phenotypes (dbGaP) includes data from 3,000+ studies across 800 diseases/focuses, involving 4 million participants and 450,000+ phenotype variables. dbGaP has supported over 8,000 publications. As researchers increasingly rely on cloud platforms to conduct cross-study analysis, both interoperability and ease of access are urgently needed. We developed a FHIR (Fast Healthcare Interoperability Resources) API for dbGaP that delivers both open-access and controlled-access dbGaP data. The open-access API provides programmatic access to study metadata, enabling researchers to discover relevant datasets. The controlled-access API delivers over 1.1 billion phenotypic observations and molecular sequence files through persistent URLs using the GA4GH Data Repository Service (DRS), another global standard. We will show how a simple Python script in a Jupyter notebook can perform phenotype-driven statistical analysis across multiple datasets and repositories using the FHIR API. This approach enhances data reuse, facilitates cohort building, and helps accelerate reproducible research at scale.
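
As a rough illustration of the access pattern the summary describes, the sketch below builds a FHIR search URL, reads resources out of a FHIR searchset Bundle, and forms a GA4GH DRS object URL. It is a minimal sketch, not the presenter's notebook: the base URLs, the helper names, and the sample Bundle are all hypothetical and chosen for illustration. Only the Bundle layout ("entry", "link" with a "next" relation) and the /ga4gh/drs/v1/objects/{object_id} path come from the FHIR and DRS specifications.

```python
import json
from urllib.parse import quote, urlencode

# Assumption: placeholder base URLs, not endpoints named in the abstract.
FHIR_BASE = "https://example.org/fhir"
DRS_BASE = "https://example.org"


def fhir_search_url(base, resource_type, params):
    """Build a FHIR search URL such as <base>/ResearchStudy?_count=2."""
    url = f"{base.rstrip('/')}/{resource_type}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def bundle_resources(bundle):
    """Return the resources carried in a FHIR Bundle's entry list."""
    return [e["resource"] for e in bundle.get("entry", []) if "resource" in e]


def next_page_url(bundle):
    """Return the URL of the Bundle's 'next' link, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def drs_object_url(base, object_id):
    """Build the GA4GH DRS v1 URL that describes one data object."""
    return f"{base.rstrip('/')}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


# Hypothetical searchset Bundle; the ids and titles are made up, not dbGaP data.
SAMPLE_BUNDLE = json.loads("""
{
  "resourceType": "Bundle",
  "type": "searchset",
  "link": [
    {"relation": "self", "url": "https://example.org/fhir/ResearchStudy?_count=2"},
    {"relation": "next", "url": "https://example.org/fhir/ResearchStudy?_count=2&page=2"}
  ],
  "entry": [
    {"resource": {"resourceType": "ResearchStudy", "id": "study-a", "title": "Example study A"}},
    {"resource": {"resourceType": "ResearchStudy", "id": "study-b", "title": "Example study B"}}
  ]
}
""")

if __name__ == "__main__":
    print(fhir_search_url(FHIR_BASE, "ResearchStudy", {"title": "cancer", "_count": 2}))
    for study in bundle_resources(SAMPLE_BUNDLE):
        print(study["id"], study["title"])
    print(next_page_url(SAMPLE_BUNDLE))
    print(drs_object_url(DRS_BASE, "abc123"))
```

In a notebook, the search URL would be fetched with an HTTP client, and for controlled-access data an authorization token would be attached; following the "next" link until it is absent pages through the full result set.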

Upload Abstract

KidsFirst_ODS_poster_June2025.docx (14.38 KB)