NCI Office of Data Sharing (ODS) Data Jamboree (Abstract Submissions): Submission #10
Submission information
Submission Number: 10
Submission ID: 144838
Submission UUID: d5ec5fcf-d282-4e8f-9160-55445413d085
Submission URI: /nci/ods-data-jamboree/abstractsubmissions
Submission Update: /nci/ods-data-jamboree/abstractsubmissions?token=_m3MzVhdNYLcU8-p1Idri4spUbIo4HYXn7bcwPIwkeE
Created: Mon, 06/16/2025 - 16:03
Completed: Mon, 06/16/2025 - 16:07
Changed: Mon, 06/16/2025 - 16:07
Remote IP address: 10.208.28.250
Submitted by: Anonymous
Language: English
Is draft: No
serial: '10'
sid: '144838'
uuid: d5ec5fcf-d282-4e8f-9160-55445413d085
uri: /nci/ods-data-jamboree/abstractsubmissions
created: '1750104218'
completed: '1750104452'
changed: '1750104452'
in_draft: '0'
current_page: ''
remote_addr: 10.208.28.250
uid: '0'
langcode: en
webform_id: nci_office_of_data_sharing_abstr
entity_type: node
entity_id: '2107'
locked: '0'
sticky: '0'
notes: ''
metatag: meta
data:
  category: 'Methods to enable data interoperability'
  degree_s_: 'MS - Kinesiology '
  email: jpoulos@confluent.io
  first_name: John
  keywords_abstracts: 'Data Streaming'
  last_name: Poulos
  middle_initial: ''
  organization: N/A
  organization_address:
    address: ''
    address_2: ''
    city: Rockville
    country: ''
    postal_code: ''
    state_province: ''
  other_please_specify_: ''
  summary: |-
    The National Cancer Institute’s Office of Data Sharing has identified data interoperability as essential to advancing cancer research and improving outcomes. Confluent Federal proposes the Confluent Platform, built on Apache Kafka®, as a real-time data streaming solution to unify fragmented cancer research data across the NIH ecosystem. Confluent enables event-driven interoperability, allowing data from systems such as The Cancer Genome Atlas (TCGA), SEER, the Cancer Imaging Archive (TCIA), and ClinicalTrials.gov to be integrated and shared in real time. Instead of relying on batch ETL pipelines or manual data reconciliation, institutes can publish and subscribe to continuous data streams, enabling faster, more collaborative research. For example, as genomic data is sequenced at an NCI center, it can be immediately streamed to researchers at NHGRI for analysis using AI-driven pipelines. Similarly, real-time monitoring of adverse event data from NCI’s clinical trials supports earlier detection of safety signals. Confluent’s architecture also supports federated learning, allowing NIH institutes to collaborate on machine learning models without centralizing sensitive data.
    The platform includes over 120 pre-built connectors to databases, APIs (including HL7 FHIR), cloud storage, and analytic tools such as Snowflake, SAS, and Databricks. It provides centralized schema management, access control, and end-to-end observability, ensuring compliance with FISMA, HIPAA, and NIH security standards. By providing a scalable, cloud-native backbone for streaming data, Confluent supports FAIR principles (Findable, Accessible, Interoperable, Reusable) and aligns with NIH’s strategic goals for data science. This approach breaks down silos, enhances collaboration across institutes, and enables a real-time data fabric that empowers the next generation of cancer research.
  title: ''
  ttile: 'Enabling Seamless Data Interoperability for Cancer Research Across NIH with Confluent Platform'
  upload_abstract: '65510'