NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #41

National Cancer Institute

Submission information

Submission Number: 41

Submission ID: 145273

Submission UUID: e3c93020-4ccd-4e9c-87e2-7106df727322

Submission URI: /nci/ods-data-jamboree/abstractsubmissions

Submission Update: /nci/ods-data-jamboree/abstractsubmissions?token=xE-rkhWZsA-7HzxgA78uRlhOCHheJQjOtgL9eouAwgg

Created: Tue, 06/24/2025 - 11:46

Completed: Tue, 06/24/2025 - 11:47

Changed: Wed, 06/25/2025 - 10:36

Remote IP address: 10.208.28.5

Submitted by: Anonymous

Language: English

Is draft: No

Webform: NCI Office of Data Sharing (ODS) Data Jamboree-Abstract Submissions

Submitted to: NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)






Presenter Information
---------------------






First Name: Minghong









Middle Initial: {Empty}









Last Name: Ward









Degree(s): MS, Electrical and Computer Engineering









Position/Title/Career Status: Product Owner for dbGaP FHIR Product









Organization: NLM/NCBI









Organization Address:
BETHESDA










Email: wardming@nih.gov









Other (Please Specify): {Empty}













Abstract Information
--------------------






Abstract Category: Methods to enable data interoperability









Abstract Keywords: dbGaP, FHIR, DRS, Cloud-computing, Interoperability









Abstract Title: Making dbGaP data interoperable and analysis-ready with FHIR and DRS API









Abstract Summary:
The NIH’s database of Genotypes and Phenotypes (dbGaP) includes data from 3,000+ studies across 800 diseases/focuses, involving 4 million participants and 450,000+ phenotype variables. dbGaP has supported over 8,000 publications. As researchers increasingly rely on cloud platforms to conduct cross-study analysis, both interoperability and ease of access are urgently needed. We developed a FHIR (Fast Healthcare Interoperability Resources) API for dbGaP that delivers both open-access and controlled-access dbGaP data. The open-access API provides programmatic access to study metadata, enabling researchers to discover relevant datasets. The controlled-access API delivers over 1.1 billion phenotypic observations and molecular sequence files through persistent URLs using the GA4GH Data Repository Service (DRS), another global standard. We will show how a simple Python script in a Jupyter notebook can perform phenotype-driven statistical analysis across multiple datasets and repositories using the FHIR API. This approach enhances data reuse, facilitates cohort building, and helps accelerate reproducible research at scale.
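
As a rough illustration of the access pattern described in the summary, the following Python sketch builds a FHIR search URL, reads study titles out of a FHIR Bundle, follows the Bundle's "next" link for paging, and forms a GA4GH DRS v1 object URL. It is not the presenter's script. The base URL (https://example.org/fhir), the DRS host, the helper names, and the sample Bundle contents are all hypothetical placeholders; only the generic FHIR conventions (Bundle.entry, Bundle.link with relation "next", ResearchStudy.title, the _count search parameter) and the DRS path /ga4gh/drs/v1/objects/{object_id} come from the published standards.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the real FHIR base URL of the server.
FHIR_BASE = "https://example.org/fhir"


def build_search_url(base, resource_type, params):
    """Build a FHIR search URL such as <base>/ResearchStudy?_count=5."""
    query = urlencode(params)
    url = f"{base.rstrip('/')}/{resource_type}"
    return f"{url}?{query}" if query else url


def study_titles(bundle):
    """Return (id, title) pairs for ResearchStudy resources in a Bundle."""
    pairs = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "ResearchStudy":
            pairs.append((resource.get("id"), resource.get("title")))
    return pairs


def next_page_url(bundle):
    """Return the URL of the Bundle's 'next' link, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def drs_object_url(drs_host, object_id):
    """Build the GA4GH DRS v1 metadata URL for one object."""
    return f"https://{drs_host}/ga4gh/drs/v1/objects/{object_id}"


# Made-up Bundle used only to exercise the helpers offline.
SAMPLE_BUNDLE = {
    "resourceType": "Bundle",
    "link": [
        {"relation": "self", "url": "https://example.org/fhir/ResearchStudy"},
        {"relation": "next", "url": "https://example.org/fhir/ResearchStudy?page=2"},
    ],
    "entry": [
        {"resource": {"resourceType": "ResearchStudy",
                      "id": "phs-demo-1",
                      "title": "Hypothetical study A"}},
        {"resource": {"resourceType": "Observation", "id": "obs-1"}},
    ],
}

if __name__ == "__main__":
    print(build_search_url(FHIR_BASE, "ResearchStudy", {"_count": 5}))
    print(study_titles(SAMPLE_BUNDLE))
    print(next_page_url(SAMPLE_BUNDLE))
    print(drs_object_url("drs.example.org", "abc123"))
```

In a notebook, the URL from build_search_url would be fetched with an HTTP client, with an access token supplied for controlled-access data, and the parsed JSON passed to study_titles. The loop would continue while next_page_url returns a URL.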










Upload Abstract: https://events.cancer.gov/sites/default/files/webform/nci_office_of_data_sharing_abstr/145273/KidsFirst_ODS_poster_June2025.docx