NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #4

Submission information

Submission Number: 4

Submission ID: 143654

Submission UUID: a60f5558-e450-482f-9bc7-a23557155cd4

Submission URI: /nci/ods-data-jamboree/abstractsubmissions

Created: Thu, 05/22/2025 - 13:19

Completed: Thu, 05/22/2025 - 13:19

Changed: Thu, 05/22/2025 - 13:19

Remote IP address: 10.208.28.130

Submitted by: Anonymous

Language: English

Is draft: No

Webform: NCI Office of Data Sharing (ODS) Data Jamboree-Abstract Submissions

Submitted to: NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions)

Presenter Information
---------------------

First Name: David

Middle Initial: {Empty}

Last Name: Higgins

Degree(s): Ph.D.

Position/Title/Career Status: Informatics Program Manager

Organization: Children's Hospital of Philadelphia

Organization Address:
Philadelphia, PA

Email: higginsd@chop.edu

Other (Please Specify): {Empty}

Abstract Information
--------------------

Abstract Category: Development of tutorial and educational tools, data storytelling, infographics, and other creative uses of data

Abstract Keywords: variant discovery, jupyter notebook, pyspark sql, genomics, training

Abstract Title: Tutorial and Example Notebooks for the Kids First Variant Workbench

Abstract Summary:
The goal of the Gabriella Miller Kids First Pediatric Research Program is to find common underlying genetic causes of pediatric cancer and structural birth defects. To support this goal, the Gabriella Miller Kids First Data Resource Center (Kids First DRC) produces high-quality clinical and genomic datasets, which are accessible and analyzable via our interoperable cloud platforms.

The Kids First Variant Workbench, powered by CAVATICA, accelerates breakthroughs in pediatric medicine by combining Kids First participant conditions, genomic variants, and variant annotations. By bringing these resources together, the Variant Workbench gives researchers a single workspace for analyzing the Kids First datasets they have been granted access to, which speeds variant discovery.

On a technical level, the Variant Workbench is a series of tables containing information about Kids First participants, their genomic variants, and annotations such as ClinVar and CADD that put those variants in context. These tables can be joined using PySpark SQL to isolate specific fields of interest among the more than 400 million unique variants in the Kids First cohorts. The latest release contains germline and somatic variants from 9 cohorts, totaling ~3,900 participants. For these studies, both types of variants can be queried, analyzed, and displayed at the same time.
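
As a rough illustration of the kind of join described above, the following minimal sketch links a participant table, a variant-occurrence table, and an annotation table, then filters on an annotation score. It is not the Workbench's actual schema: every table name, column name, and data row here is a hypothetical placeholder, and Python's built-in sqlite3 module stands in for a Spark session so the example runs anywhere.

```python
import sqlite3

# Hypothetical table and column names, with toy rows, for illustration only;
# the real Variant Workbench schema is not described in this abstract.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participants (participant_id TEXT, study TEXT);
CREATE TABLE occurrences (participant_id TEXT, variant_id TEXT);
CREATE TABLE annotations (variant_id TEXT, clinvar_significance TEXT, cadd_phred REAL);

INSERT INTO participants VALUES ('P1', 'StudyA'), ('P2', 'StudyB');
INSERT INTO occurrences VALUES ('P1', 'V1'), ('P1', 'V2'), ('P2', 'V2');
INSERT INTO annotations VALUES ('V1', 'Pathogenic', 28.5), ('V2', 'Benign', 3.1);
""")

# Join participants to their variants and to each variant's annotations,
# keeping only variants at or above an (arbitrary) CADD score threshold.
QUERY = """
SELECT p.participant_id, p.study, o.variant_id,
       a.clinvar_significance, a.cadd_phred
FROM participants AS p
JOIN occurrences AS o ON o.participant_id = p.participant_id
JOIN annotations AS a ON a.variant_id = o.variant_id
WHERE a.cadd_phred >= 20
ORDER BY p.participant_id, o.variant_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In a PySpark notebook, a query string of this shape would typically be passed to `spark.sql(...)` against the registered tables, assuming a session object named `spark` is available; the table and column names would need to be replaced with the real ones.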

The goal of this project is to develop tutorial and educational tools that increase usage of the Variant Workbench. We are specifically interested in developing written directions for integrating other datasets, such as TARGET, into the Variant Workbench, as well as example Jupyter notebooks that users can execute to see the platform’s capabilities. These tools could take many formats, however, and we look forward to working with members of the research community and hearing their feedback and ideas.

Upload Abstract: {Empty}