NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #45

Submission information
Submission Number: 45
Submission ID: 145552
Submission UUID: f5ced573-5f22-461d-82c5-cf8bfb94eae6

Created: Thu, 06/26/2025 - 14:02
Completed: Thu, 06/26/2025 - 14:03
Changed: Thu, 06/26/2025 - 14:03

Remote IP address: 10.208.24.68
Submitted by: Anonymous
Language: English

Is draft: No
First Name Michael
Middle Initial
Last Name Watkins
Degree(s) Ph.D.
Position/Title/Career Status Manager of Data Standards and Modeling
Organization Data for the Common Good, University of Chicago
Organization Address Chicago
Email michaelwatkins@bsd.uchicago.edu
Other (Please Specify)
Abstract Category Methods to enable data interoperability
Abstract Keywords terminologies, semantics, knowledge graph, rdf, sparql
Abstract Title Developing an Oncology Knowledge Graph
Abstract Summary As data interoperability has risen to the forefront of clinical trial design and real-world data (RWD) capture, community awareness of clinical data standards has never been higher. Clinicians and data scientists alike understand that bespoke data modeling leads to complex, manual harmonization downstream. However, the resulting proliferation of clinical data standards does not by itself achieve data interoperability. There is a “last mile” need for computational approaches to data mapping and semantic reasoning that can leverage these standards to semi-automate the task of data harmonization.

Perhaps the most difficult aspect of interoperating over data bound to different terminological standards is that concepts are rarely exact matches; more often they are partially equivalent in an ill-defined way. Knowledge graphs, which consist of concepts (nodes) and relations (edges), are a mainstay for semantic reasoning in many other industries. When these concepts and relations are encoded in a graph representation such as the Resource Description Framework (RDF), a query language like SPARQL can query the knowledge graph and return a precise, computable relationship between two concepts.
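
To make the idea of concepts, relations, and pattern queries concrete, here is a minimal sketch in plain Python, with no RDF library. It stores triples as tuples and matches them against a pattern with wildcards, which is roughly what a SPARQL basic graph pattern does. Everything in it is an assumption for illustration: the `ex:` identifiers are placeholders rather than real terminology codes, the helpers `match` and `relation_between` are hypothetical, and the use of SKOS mapping properties (`skos:exactMatch`, `skos:broadMatch`) to express partial equivalence is one possible choice, not something the abstract specifies.

```python
# Each triple is (subject, predicate, object).
# All ex: identifiers are hypothetical placeholders, not real codes.
TRIPLES = {
    ("ex:TermA", "skos:exactMatch", "ex:TermB"),
    ("ex:TermB", "skos:broadMatch", "ex:TermC"),
    ("ex:TermC", "ex:hasAnatomicSite", "ex:SiteX"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern, sorted.

    None acts as a wildcard, playing the role of a SPARQL variable.
    """
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def relation_between(triples, a, b):
    """Return the predicates that directly link a and b.

    Roughly equivalent to: SELECT ?p WHERE { a ?p b }
    plus the reverse direction, marked with '^' (SPARQL's
    inverse-path notation).
    """
    forward = [p for _, p, _ in match(triples, s=a, o=b)]
    backward = ["^" + p for _, p, _ in match(triples, s=b, o=a)]
    return forward + backward


print(relation_between(TRIPLES, "ex:TermA", "ex:TermB"))
print(relation_between(TRIPLES, "ex:TermC", "ex:TermB"))
```

A real implementation would presumably load the triples into an RDF store and run SPARQL directly; the sketch only shows the shape of the question being asked of the graph.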

Aims:
1. Curate a set of RDF triples that encode the relationships between oncology-related concepts from NCIt, SNOMED CT, ICD-O, Disease Ontology, and Uberon (scoped by specific use cases).
2. Combine those sets into a small but linked proof-of-concept knowledge graph.
3. Develop SPARQL queries that can access the knowledge graph for given clinical terms.
4. Instantiate those queries within a data mapping demo that takes in C3DC data and annotates it with additional concept bindings from the knowledge graph.
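
The four aims can be sketched end to end as a small plain-Python pipeline. This is an illustration under stated assumptions, not the proposed implementation: the identifiers (`ncit:X1`, `snomed:Y1`, and so on) are placeholders rather than real codes, the record field `diagnosis_code` is a hypothetical name and not taken from the C3DC data model, and the functions `build_graph`, `concept_bindings`, and `annotate` are invented for this example. It also assumes mappings are expressed with `skos:exactMatch` and `skos:closeMatch`, and labels anything reached through a `skos:closeMatch` link as approximate, since close matches should not be chained as if they were exact.

```python
from collections import deque

# Aim 1: curated mapping sets, one per source (placeholder identifiers).
NCIT_SET = {("ncit:X1", "skos:exactMatch", "snomed:Y1")}
SNOMED_SET = {("snomed:Y1", "skos:closeMatch", "icdo:Z1")}
DO_SET = {("doid:D1", "skos:exactMatch", "ncit:X1")}

MAPPING_PREDICATES = {"skos:exactMatch", "skos:closeMatch"}


def build_graph(*triple_sets):
    """Aim 2: merge the sets into one adjacency map.

    Both predicates are symmetric in SKOS, so each edge is stored
    in both directions.
    """
    graph = {}
    for triples in triple_sets:
        for s, p, o in triples:
            if p in MAPPING_PREDICATES:
                graph.setdefault(s, set()).add((p, o))
                graph.setdefault(o, set()).add((p, s))
    return graph


def reachable(graph, start, allowed):
    """Breadth-first search over edges whose predicate is in `allowed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for pred, other in graph.get(node, ()):
            if pred in allowed and other not in seen:
                seen.add(other)
                queue.append(other)
    seen.discard(start)
    return seen


def concept_bindings(graph, term):
    """Aim 3: query the graph for concepts bound to a clinical term."""
    exact = reachable(graph, term, {"skos:exactMatch"})
    any_match = reachable(graph, term, MAPPING_PREDICATES)
    result = {c: "exact" for c in exact}
    result.update({c: "approximate" for c in any_match - exact})
    return result


def annotate(records, graph, field="diagnosis_code"):
    """Aim 4: attach the bindings to each input record."""
    return [{**r, "bindings": concept_bindings(graph, r[field])} for r in records]


graph = build_graph(NCIT_SET, SNOMED_SET, DO_SET)
records = [{"participant_id": "P-001", "diagnosis_code": "ncit:X1"}]
for row in annotate(records, graph):
    print(row)
```

Keeping the exact and approximate labels separate lets a downstream reviewer accept exact bindings automatically while routing approximate ones to manual curation, which matches the semi-automated harmonization described above.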
Upload Abstract
Proposal.pdf (42.12 KB)