NCI Data Jamboree (Project Abstract Submission)

64 submissions

#	Starred	Locked	Notes	Created	User	IP address	First Name	Middle Initial	Last Name	Degree(s)	Position/Title/Career Status	Organization	Organization Address	Email	List of Additional Authors	Abstract Category	Abstract Keywords	Abstract Title	Abstract	Operations
67	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #67	Unlock NCI Data Jamboree (Project Abstract Submission): Submission #67	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #67	Tue, 07/28/2026 - 12:20	bojae	10.208.24.192	Yuanwei Bay		Xu	PhD, MSE	Postdoc Researcher	Johns Hopkins University	Baltimore, Maryland	yxu105@jh.edu		Developing, refining, or validating tools, methods, algorithms, and pipelines	Spatial proteomics, CRDC, imaging	An Automated Computational Workflow for Co-Registering Spatially-resolved Proteomics with Public Reference Atlases	Spatial proteomics technologies generate rich microenvironmental data, yet critical barriers prevent their integration into public multi-omic repositories. This project will develop an open-source computational workflow to harmonize custom grid-based spatial proteomic datasets with standardized reference atlases and multi-modal imaging archives, enabling seamless data sharing within the NCI Cancer Research Data Commons (CRDC) ecosystem. Specific Aim 1: Implement Image Registration Algorithms for Spatial Coordinate Harmonization. We will develop computational methods to transform SPOTTER-derived spatial proteomic data from horizontal mouse brain sections into standardized coordinate systems, specifically the Allen Mouse Brain Atlas Common Coordinate Framework (CCFv3) and H&E-stained histology references. Using image registration libraries (Valis, SimpleITK), we will generate validated transformation matrices, establish annotation transfer protocols from atlas regions to proteomic measurements, and define quality control metrics for alignment accuracy. This aim addresses the fundamental technical challenge of bridging experimental coordinate spaces with canonical anatomical frameworks. 
Specific Aim 2: We will package the registration tools into a cloud-based, containerized workflow that merges spatial proteomics with complementary data modalities and prepares CRDC-compatible submissions. The pipeline will accept custom proteomic datasets, execute automated coordinate alignment, overlay spatial transcriptomics from 10x Visium/HTAN repositories, incorporate whole-slide DICOM pathology from NCI Imaging Data Commons, and output standardized data objects using community frameworks (such as SpatialExperiment, Scanpy, Seurat v5). Comprehensive documentation will enable plug-and-play adoption across platforms. Expected Outcomes & Impact: This collaborative project will deliver an end-to-end workflow democratizing spatial proteomics data sharing, accelerating multi-omic cancer research through standardized harmonization procedures. All code and example datasets will be released as open source.	View
63	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #63	Lock NCI Data Jamboree (Project Abstract Submission): Submission #63	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #63	Mon, 07/27/2026 - 23:11	Anonymous	10.208.24.192	Avani		Patel	N.A.		Animal Genome Institute	Palmyra	info@animalgenomeinstitute.org	First Name: Yuka Last Name: Imamura Affiliation: Animal Genome Institute	Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data		Cross-Species Signature Scoring and Risk Assessment Modeling for Human and Canine Osteosarcoma	Osteosarcoma is a genetically complex malignancy. While human and canine osteosarcomas share significant molecular similarities, integrating datasets across species and sequencing platforms remains a technical challenge. Overcoming these hurdles is crucial for identifying conserved oncological targets and improving the AI-readiness of comparative genomic data for the broader research community. We aim to develop and validate a robust computational framework for cross-species data integration, specifically focusing on continuous signature scoring models for human and canine osteosarcoma. By utilizing publicly available human osteosarcoma datasets (such as TARGET-OS) and public canine cohorts (such as NCI's DOG² cohort), we will employ expression data scaling techniques, cross-platform normalization, and continuous signature scoring algorithms to harmonize the disparate matrices. Ultimately, we aim to leverage these validated models to develop a cross-species risk assessment application to support clinical decision-making.	View
62	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #62	Lock NCI Data Jamboree (Project Abstract Submission): Submission #62	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #62	Mon, 07/27/2026 - 22:56	Anonymous	10.208.24.192	Yi		Hsiao	Ph.D.	Research Fellow	University of Michigan	Ann Arbor	yihsiao@umich.edu	First Name: Alexey Last Name: Nesvizhskii Post-nominal letters: Ph.D. Affiliation: University of Michigan First Name: Marcin Last Name: Cieslik Post-nominal letters: Ph.D. Affiliation: University of Michigan	Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data	Leukemia; AML; multi-omics; biomarker discovery; drug target nomination	Building a Harmonized Multi-Omics Resource and Web Application for Biomarker Discovery and Drug Target Nomination in Leukemia	Leukemia is highly heterogeneous across genomic, epigenomic, proteomic, metabolic, and clinical dimensions, creating challenges and opportunities for biomarker discovery and therapeutic target nomination. This project aims to develop a harmonized multi-omics resource and web application integrating publicly available leukemia cohorts, using acute myeloid leukemia (AML) as the primary use case. During the jamboree, we will identify and prioritize relevant leukemia datasets from resources such as CPTAC, TCGA, TARGET, Beat AML and related public repositories. We will initially focus on datasets identified before the data jamboree (more than 10) including CPTAC adult AML datasets and recent CPTAC–Kids First collaboration datasets on pediatric AML and T-cell acute lymphoblastic leukemia. Data modalities of interest include clinical annotations, bulk genomics, transcriptomics, epigenomics (including DNA methylation and ATAC-seq), proteomics, post-translational modifications such as phosphorylation, metabolomics, lipidomics, drug screening, and gene dependencies. 
We will use AI-assisted extraction to define unified study- and sample-level annotations and prepare standardized datasets using controlled vocabularies, including OncoTree/NCI Thesaurus for disease classification and HGNC for molecular identifiers. In parallel, we will create a user-friendly web application for exploring cohorts, querying molecular features and candidate targets across omics layers, visualizing feature distributions, and assessing associations with clinical variables such as mutation, subtype, treatment response, relapse, and survival. Because the jamboree phase will focus on data inventory, harmonization, lightweight analysis, and implementing analyses, standard personal computers with internet access, AI-enabled tools, and open-source software will be sufficient. Key scientific questions include: Which molecular features are associated with leukemia subtype, treatment response, or survival? Can integrated multi-omics profiles nominate actionable biomarkers, dysregulated pathways, or drug targets not apparent from single-omic analyses? What harmonization barriers limit cross-cohort leukemia biomarker and target discovery? The expected outcome is a reusable harmonized resource, documentation, and web application for community-driven leukemia translational discovery.	View
64	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #64	Lock NCI Data Jamboree (Project Abstract Submission): Submission #64	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #64	Mon, 07/27/2026 - 21:57	Anonymous	10.208.24.192	Jennifer	M	Torres Del Valle	PhD, MPH, MA	Postdoctoral NIH Scholar in Clinical and Translational Research	Postdoctoral Master of Science in Clinical and Translational Research Program, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico	San Juan, Puerto Rico	jennifer.torres@upr.edu	First Name: Claudia Last Name: Amaya Ardila Post-nominal letters: EdD, MPH, MS Affiliation: Department of Biostatistics and Epidemiology, Graduate School of Public Health; and Master of Science in Clinical and Translational Research Program, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico First Name: Souhail Last Name: Malave Rivera Post-nominal letters: PhD, MSc Affiliation: Department of Social Sciences, Graduate School of Public Health; and Master of Science in Clinical and Translational Research Program, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico First Name: Maritere Last Name: Meléndez Rosario Post-nominal letters: DPT, MPH, MS Affiliation: Department of Biostatistics and Epidemiology, Graduate School of Public Health, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico First Name: Andrea Last Name: AngaritaValderrama Post-nominal letters: MPH Affiliation: Department of Biostatistics and Epidemiology, Graduate School of Public Health, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico	Evaluating data quality for reproducibility and AI-readiness	HPV-associated cancers, adolescents and young adults, conditional relative survival, cancer survivorship, SEER Research Plus	Survival After HPV-Associated Cancer in Adolescents and Young Adults: Changes With Time Since Diagnosis	Background: Adolescents and 
young adults (AYAs) have distinct cancer survivorship needs. Estimates measured from diagnosis do not reveal whether survival differences across human papillomavirus (HPV)-associated cancer sites persist as patients survive additional years. These patterns could inform survivorship research, risk communication, and care planning. Objective: To determine whether differences in five-year conditional relative survival across HPV-associated cancer sites and stages persist, narrow, or widen after patients survive one, three, and five years. Approach: The team will conduct a population-based retrospective cohort study using SEER Research Plus Data, 21 Registries. The cohort will include patients aged 15–39 years with a first primary invasive, registry-defined HPV-associated cancer diagnosed during 2000–2013 and followed through December 31, 2023. Established SEER site and histology classifications will define cervical carcinoma, noncervical anogenital squamous cell carcinoma, and oropharyngeal squamous cell carcinoma. Follow-up will extend from diagnosis to death, last contact, or study closure. SEER*Stat will estimate five- and ten-year relative survival and five-year conditional relative survival at diagnosis and among patients surviving one, three, and five years. Absolute differences and relative contrasts will be evaluated across cancer-site groups, harmonized SEER Summary Stage, diagnostic periods, and survival landmarks. Using institutional computing resources, R will support data management and observed-survival analyses with Kaplan–Meier curves, log-rank tests, and multivariable Cox models adjusted for site, stage, age, sex, and diagnostic period. Diagnostics and sensitivity analyses will address proportional hazards, unknown stage, data completeness, and unstable estimates. Before the Jamboree, the team will finalize cohort specifications, confirm data access and variables, and assess cell sizes and events. 
During the Jamboree, the team will implement a reproducible workflow, generate preliminary estimates and visualizations, evaluate limitations, and produce reusable cohort specifications and analysis code. A multidisciplinary team with expertise in epidemiology, biostatistics, HPV-related cancer prevention, and clinical and translational research has been assembled.	View
61	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #61	Lock NCI Data Jamboree (Project Abstract Submission): Submission #61	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #61	Mon, 07/27/2026 - 21:28	Anonymous	10.208.24.192	Candice Francheska		Tambaoan	MPH	Early Career Researcher	N/A	Malden, MA	cfbt@bu.edu		Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data	Real World Data, Genomics, Biomarkers, Disease Monitoring	Applying as a Project Seeker	Experience and Expertise: - My background is in molecular biology, genetics, epidemiology, and biostatistics. - I can code in R, SAS, and Python, and can perform longitudinal data analysis, survival analysis, and regression modeling, among others. - I can wrangle and analyze real-world data (e.g., Flatiron, SEER, EHR, TCGA) and clinical trial data. - I worked as a data scientist for an NGS cancer diagnostic company, mainly focusing on the applications of liquid biopsy in treatment response monitoring. I also worked on the application of comprehensive genomic profiling data in patient selection, resistance monitoring, and biomarker development. Why I want to participate in the jamboree: - I would like to join this event because it offers a unique chance to work directly on cancer datasets alongside experienced researchers. It is also a good opportunity to learn from my fellow researchers, especially those coming from different backgrounds. Lastly, I believe that this event can lead to projects that make a substantial impact on patients' lives. What I hope to achieve: - I hope to join a team where I can contribute my analytical skills and expertise meaningfully. - I hope to expand my exposure to other data types and repositories. - I hope to build connections within the research community and explore whether the work can lead to a publication or extend into an ongoing collaboration beyond the event itself.	View
60	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #60	Lock NCI Data Jamboree (Project Abstract Submission): Submission #60	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #60	Mon, 07/27/2026 - 20:59	Anonymous	10.208.24.192	Trevor	R	Michelson	M.S.	Data Scientist	BVARI	Boston	Trevor.Michelson@va.gov	First Name: Alice Last Name: Kwak Post-nominal letters: PhD Affiliation: Department of Veterans Affairs First Name: Nicholas Last Name: Lemmer Post-nominal letters: MS Affiliation: Department of Veterans Affairs First Name: Nithin Last Name: Weerasinghe Post-nominal letters: MS Affiliation: Department of Veterans Affairs First Name: Stephan Last Name: Foianini Post-nominal letters: PhD Affiliation: Department of Veterans Affairs	Enhancing data interoperability (e.g., data harmonization, data federation)	Precision Oncology, Federated Data Integration, VA EHR, Multi-Modal Oncology Data	Integrating VA Precision Oncology Data with NCI Public Resources for Large-Scale Federated Cancer Research	A crucial challenge facing oncological data analysis is the lack of available data sets with patient cohorts large enough to yield insights into specific cancer types. To meet this challenge, we propose to leverage the US Department of Veterans Affairs (VA) large-scale electronic patient health record, which contains records for millions of patients spanning several decades. We plan to develop solutions for integrating VA data with other public data sets available through the NCI, resulting in a large-scale longitudinal multi-modal oncology data set accessible to the larger research community. The Precision Oncology Data Repository (PODR) is a VA data set that aggregates, curates, and shares clinical, imaging, and genomic data from various VA and external partner sources. 
Specifically, the VA has aggregated a cohort of over 140k decedent oncology patients spanning various cancer types, with regulatory approval to de-identify the data and share it with trusted external partners. Our goal is to develop various methods for utilizing PODR with other resources to demonstrate data analysis within a federated infrastructure. We plan to use the following NCI public data sets, among others: - Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial Data - Lung - Colorectal - Questionnaires - Cancer Epidemiology Descriptive Cohort Database (CEDCD) - Cancer Prevention Study II Nutrition Cohort (CPS-II Nutrition) - Millennium Cohort Study Panels 1 – 5 - SEER Research - Prostate Cancer with Decipher Prostate Genomic Classifier Database - All of Us Research (AoU) Program The project will require members with experience in ETL processes, healthcare data and analysis, and version control systems to help facilitate collaborative development. We may also explore NLP-derived concept code extraction from clinical notes and surveys. We will make use of VA internal workspaces (e.g., VINCI, ARCHES) as well as test out the Bridges Cancer Genomics Cloud.	View
59	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #59	Lock NCI Data Jamboree (Project Abstract Submission): Submission #59	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #59	Mon, 07/27/2026 - 18:57	Anonymous	10.208.28.116	Rawan		Elshobaky	B.S. Molecular and Cellular Biology	PhD student	University of Colorado Anschutz	Aurora, Colorado	rawan.elshobaky@cuanschutz.edu	First Name: . Last Name: . Affiliation: .	Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data	multi-omics, risk prediction, precision medicine, computational genomics	Integrating Multimodal Data for Improving Risk Prediction	I am a Ph.D. student in Human Medical Genomics at the University of Colorado Anschutz. I earned my B.S. in Molecular and Cellular Biology from Johns Hopkins with minors in Computer Science and Bioethics. My doctoral research, under the mentorship of Drs. David Conti and Milton Pividori, focuses on developing computational methods to improve the performance, interpretability, and portability of polygenic risk scores (PRS) across populations. Specifically, I use data from the UK Biobank to develop and evaluate risk scores based on predicted gene expression data and utilize functional information with Bayesian fine-mapping methods to prioritize likely causal variants and interpret genetic associations. I am excited to participate in the NCI Data Jamboree because its team-based, interdisciplinary format provides a unique opportunity to work alongside researchers with complementary expertise to address challenges in cancer data integration and analysis. My research is motivated by the need to develop computational methods that perform reliably across populations, recognizing that robust genetic prediction depends on developing and evaluating methods using datasets that are representative of the populations they are intended to serve. 
Therefore, I am particularly interested in projects involving the All of Us Research Program, because its large, broadly representative participant cohort, combined with genomic, multi-omic, environmental, and longitudinal clinical data, provides a unique opportunity to evaluate methods that generalize robustly across populations. Through the jamboree, I hope to gain practical experience in effectively accessing and leveraging large-scale cancer resources, such as All of Us, and contribute to incorporating multimodal data into reproducible computational workflows. I also hope to bring this expertise back to the University of Colorado Anschutz Department of Biomedical Informatics and help facilitate the adoption of these resources in future collaborative research. Finally, I look forward to building lasting collaborations with other participants and continuing these interactions beyond the jamboree.	View
58	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #58	Lock NCI Data Jamboree (Project Abstract Submission): Submission #58	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #58	Mon, 07/27/2026 - 17:28	Anonymous	10.208.28.116	Calvin		Tribby	PhD, MPH		City of Hope Comprehensive Cancer Center	Duarte, CA	ctribby@coh.org	First Name: Marta Last Name: Jankowska Post-nominal letters: PhD Affiliation: City of Hope Comprehensive Cancer Center First Name: Jiue-An Last Name: Yang Post-nominal letters: PhD Affiliation: City of Hope Comprehensive Cancer Center	Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data	geospatial decision support; interactive visualization; spatial epidemiology; community engagement	Cancer Screening Priority Mapper: A Flexible Multi-Scale Geospatial Decision Support Tool for Identifying High-Priority Cancer Screening Areas	Cancer centers, public health agencies, and community organizations often use different criteria to identify areas with the greatest need for cancer screening interventions. However, there is currently no standardized, flexible tool that allows users to compare how priority areas change when different screening, cancer burden, healthcare access, and sociodemographic indicators are emphasized. This project will develop a publicly accessible web mapping application that integrates county-level measures of cancer burden with tract-level measures of screening barriers and access to care to visualize high-priority areas for screen-detectable cancers across the United States. The project aligns with the NCI Data Jamboree goal of promoting data harmonization, integration, visualization, and analysis by creating a reproducible framework for combining multiple cancer-related datasets into an interactive decision-support tool. 
Datasets will include county-level cancer incidence and late-stage incidence measures from State Cancer Profiles (NCI and CDC) for breast, cervical, colorectal, lung, and prostate cancers. Cancer screening estimates will be from CDC PLACES. Measures of access will be derived from the ACR Lung Cancer Screening Registry, National Provider Identifier (NPI/NPPES) data, and FDA-certified mammography facility locations. Additional indicators describing sociodemographic characteristics, transportation access, health insurance coverage, and economic conditions will be from U.S. Census data. The resulting application will allow users to select indicators, assign priorities, and visualize how geographic targeting recommendations change under alternative decision-making frameworks. This functionality will support resource allocation, outreach planning, community engagement, and cancer screening program development. During the Jamboree, the team will develop automated data pipelines using APIs, integrate datasets into a unified geospatial database, design the application's prioritization framework and user interface, and optimize analytical workflows for rapid data processing and visualization. Team members with expertise in APIs, dashboard or mapping applications, R/Python, cancer risk factors, cancer screening, small area estimates, or cancer statistics are welcome to join.	View
56	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #56	Lock NCI Data Jamboree (Project Abstract Submission): Submission #56	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #56	Mon, 07/27/2026 - 16:49	Anonymous	10.208.28.116	Nora	L.	Nock	Ph.D., MS, BS, PE, FSBM	Professor	Case Western Reserve University	Cleveland	nln@case.edu	First Name: Mireya Last Name: Diaz-Insua Post-nominal letters: PhD Affiliation: Case Western Reserve University First Name: Siran Last Name: Koroukian Post-nominal letters: PhD Affiliation: Case Western Reserve University First Name: Johnie Last Name: Rose Post-nominal letters: MD, PhD Affiliation: Case Western Reserve University	Evaluating data quality for reproducibility and AI-readiness	AI, qualitative data, lived experience, interviews, focus groups	Leveraging AI to Augment Human Expertise and Enhance Efficiency in Synthesizing and Analyzing Lived Experience Qualitative Data in Cancer Research	Lived experience obtained from cancer patients, caregivers, families, and providers can help integrate cultural context, optimize intervention design, improve clinical outcomes, and better understand attitudes, barriers, and facilitators to widespread dissemination and implementation efforts. The increased use of qualitative methods in cancer research has created an imminent need to develop more standardized workflows and methods to comprehensively and efficiently synthesize and analyze qualitative data. Leveraging AI to synthesize lived experience qualitative data may help improve efficiency in the arduous and time-consuming process of coding and thematic analyses and potentially uncover new insights not initially observed. 
We propose to synthesize existing guidebooks and raw transcript text data from interviews and focus groups in cancer patients obtained from multiple sources such as published manuscript supplemental files, National Cancer Institute (NCI)-Designated Comprehensive Cancer Centers patient “story” webpages, and other open access platforms with relevant transcript and guidebook data. We propose to use widely available AI tools (e.g., ChatGPT, Claude) as well as more sophisticated “deep” AI tools (e.g., BERT/BERT-Like) to synthesize transcript data, interview and focus group guides, create codebooks and conduct thematic analyses. We propose to evaluate performance using several measures including but not limited to accuracy and reliability/agreement in revealing codes, subcategories and themes compared to those derived from human experts, reflecting readability, lexical diversity, and coherence. We may also explore utilizing different command prompts from prompt engineering frameworks (e.g., RISEN: Role, Instructions, Steps, End Goal, and Narrowing) if time permits. We hypothesize that human interaction and expert oversight will be needed to add cultural context, judgement, and ethical responsibility. We have assembled a team with expertise in qualitative and mixed methods in cancer research, biostatistics, general AI and large databases but would be interested in having additional member(s), particularly those with advanced/deep AI tool expertise, join our team.	View
57	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #57	Lock NCI Data Jamboree (Project Abstract Submission): Submission #57	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #57	Mon, 07/27/2026 - 16:47	Anonymous	10.208.28.116	Alec		Koppel	MS, PhD	Senior Scientist/Research Professor	Johns Hopkins University Applied Physics Labs / Dept. of Applied Math & Stat.	Laurel, MD	alec.koppel@jhuapl.edu	First Name: Mark Last Name: Yarchoan Post-nominal letters: MD Affiliation: Johns Hopkins University School of Medicine First Name: Mari Last Name: Nakazawa Post-nominal letters: MD Affiliation: Johns Hopkins University School of Medicine	Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data	instrumental variables, longitudinal studies, nonlinear function approximators, feature construction	Foundations for Causal Modeling of Immunotherapy Outcomes	Immune checkpoint inhibitors can produce durable cancer responses, yet current predictors of benefit remain limited, tumor-centric, and poorly suited to capture the dynamic host–tumor interactions that govern response, resistance, and toxicity. The Johns Hopkins prospective immunotherapy biobank provides a uniquely rich opportunity to address this gap, with longitudinal, multimodal data from more than 400 real-world patients and over 1,100 biospecimens spanning clinical outcomes, ctDNA, CyTOF immune profiling, cytokines, germline genetics, BCR/TCR sequencing, and antibody analyses. However, the complexity, temporal structure, and nonrandom missingness of these data require a new analytic foundation before advanced causal or adaptive modeling can be reliably deployed. We showcase a new analysis pipeline that will transform this heterogeneous biobank into a confounder-aware, temporally validated modeling resource for immunotherapy outcome prediction. 
The pipeline will integrate biologically informed feature construction with proximal causal inference concepts to capture latent drivers of treatment response, including tumor burden, immune competence, disease severity, treatment selection effects, and assay ascertainment patterns. Rather than treating missing data as a nuisance, the framework will explicitly model missingness (including missing-not-at-random mechanisms arising from progression, toxicity, dropout, or clinical decision-making) as informative structure relevant to patient trajectories. The resulting feature maps and missingness-aware representations will be evaluated through multiple temporal train/test splits designed to mimic prospective clinical deployment, with initial emphasis on survival prediction and related immunotherapy outcomes. This approach is novel in combining longitudinal multimodal immune-oncology data integration, proxy-based confounder mitigation, missingness-aware modeling, and temporally robust validation within a single scalable pipeline. By establishing this foundation, the project will enable reliable downstream development of causal recovery and intervention design informed by models of host–tumor dynamics under immune checkpoint blockade. Ultimately, this work will convert a deeply phenotyped real-world biobank into an actionable computational capability for discovering clinically meaningful predictors of immunotherapy benefit and harm.	View
53	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #53	Lock NCI Data Jamboree (Project Abstract Submission): Submission #53	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #53	Mon, 07/27/2026 - 15:46	Anonymous	10.208.28.116	Maya		Zuhl	MSc	Lead Software Engineer	ICF	Rockville Maryland	maya.zuhl@icf.com	First Name: Ratna Last Name: Thangudu Post-nominal letters: PhD Affiliation: ICF First Name: Alexander Last Name: Pilozzi Post-nominal letters: MSc Affiliation: ICF First Name: Yin Last Name: Lu Post-nominal letters: PhD Affiliation: ICF	Developing, refining, or validating tools, methods, algorithms, and pipelines	Biomedical Multimodal Data Integration, Explainable AI, Agentic AI, Multimodal Reasoning, Translational Cancer Research	AI-Assisted Translational Exploration: Connecting Scientific Publications, Multi-omic Data, and Clinical Trials	Cancer researchers routinely move between publications, genomic and proteomic datasets, imaging repositories, and clinical trial resources to interpret findings and assess translational relevance. Although these resources are increasingly rich and accessible, they remain largely disconnected, requiring researchers to manually synthesize evidence across multiple portals before identifying clinically actionable insights. This fragmentation limits the effective reuse of publicly available cancer data and slows translation of research findings into potential clinical applications. This project proposes an AI-assisted translational exploration workflow that seamlessly connects scientific publications with underlying multimodal datasets and relevant clinical trials. Building on our existing BioInsight platform, the workflow extends publication-centered exploration by integrating ClinicalTrials.gov into an explainable, agentic framework. 
Starting from a publication or biomarker, the system retrieves associated multi-omic evidence, summarizes key biological findings, identifies relevant biomarkers and pathways, and recommends related clinical trials with evidence-based explanations. Rather than serving as a search interface, the workflow demonstrates transparent AI-assisted reasoning across heterogeneous biomedical resources through citations, supporting evidence, and provenance. The project integrates publicly available scientific publications with cancer datasets from the Cancer Research Data Commons (CRDC), including the Genomic Data Commons (GDC), Proteomic Data Commons (PDC), and Imaging Data Commons (IDC), together with ClinicalTrials.gov. The initial demonstration focuses on publication-centered exploration across genomics, proteomics, imaging, and clinical trials, with an extensible architecture that supports additional modalities. The prototype leverages hybrid Retrieval-Augmented Generation (RAG), agentic AI workflows, biomedical knowledge integration, and cloud-native computing to enable seamless navigation from scientific evidence to clinically relevant studies within a conversational interface. Emphasizing explainable multimodal reasoning over information retrieval alone, the project aligns with the Data Jamboree goals of advancing data discovery, integration, and reuse across cancer data resources. Building on an existing platform and publicly available data, a functional end-to-end prototype can be completed during the three-day Jamboree.	View
54	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #54	Lock NCI Data Jamboree (Project Abstract Submission): Submission #54	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #54	Mon, 07/27/2026 - 15:45	Anonymous	10.208.24.192	Mayanka		Chandra Shekar	Ph.D.	Research Scientist	Oak Ridge National Laboratory	Knoxville	chandrashekm@ornl.gov	First Name: Xi Last Name: Zhang Post-nominal letters: Ph.D. Affiliation: Oak Ridge National Laboratory First Name: Ankita Last Name: Paul Post-nominal letters: Ph.D. Affiliation: Oak Ridge National Laboratory First Name: Ethan Last Name: Seefried Post-nominal letters: Ph.D. Affiliation: Oak Ridge National Laboratory	Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data	whole-slide imaging; foundation-model embeddings;	Hierarchical, Spatially Aware Whole-Slide Image Classification for Pediatric Cancer Using RADIANCE	Pediatric cancers are rare and morphologically heterogeneous, making it difficult to develop robust whole-slide image classification models. RADIANCE is a pathology data project that creates reusable foundation-model embeddings and links them to patch coordinates, slide and specimen identifiers, model provenance, and available clinical metadata. During the three-day Jamboree, we will evaluate whether these embeddings contain sufficient information for pediatric cancer classification. Using whole-slide images from the CCDI MCI, we will select a cohort with suitable diagnostic labels and train a model to classify cancer or histologic subtypes. We will first establish a baseline that treats patch embeddings as an unordered collection. We will then evaluate a hierarchical approach that incorporates patch locations and relationships among neighboring tissue regions. The primary technical question is whether spatially organized aggregation improves classification compared with conventional non-spatial aggregation. 
This project is relevant to the broader cancer data community because generating foundation-model embeddings from whole-slide images is computationally expensive. Evaluating reusable embeddings for a concrete downstream task will help determine whether they can support multiple studies without repeatedly processing the source images. We also intend to share the resulting vector database as a resource for the community to use in downstream whole-slide image tasks. The workflow may also provide a template for classification, cohort discovery, case retrieval, and other pathology applications. The work will require expertise in computational pathology, pediatric cancer, machine learning, and data engineering. Expected tools include Python, PyTorch, pathology foundation-model embeddings, the Milvus vector database for embedding storage and retrieval, and GPU-enabled computing on OLCF’s Frontier supercomputing platform for model training and evaluation. We have assembled a ready-to-go team from Oak Ridge National Laboratory: Mayanka Chandra Shekar (in person), Ethan Seefried (in person), Xi Zhang (online), and Ankita Paul (online).	View
52	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #52	Lock NCI Data Jamboree (Project Abstract Submission): Submission #52	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #52	Mon, 07/27/2026 - 15:36	Anonymous	10.208.24.192	Saransh		Singh	M.S in Data Science	Research Data Analyst	University of Illinois Cancer Center	Chicago	sarasing@uic.edu		Evaluating data quality for reproducibility and AI-readiness	AI-readiness; data quality; Cancer Research Data Commons; multimodal oncology data; data leakage	Project Seeker - joining "AI-Readiness Scorecard: A Task-Relative Profiler for Multimodal Cancer Research Data Commons Datasets"	I am a confirmed member of the ready-to-go team for "AI-Readiness Scorecard: A Task-Relative Profiler for Multimodal Cancer Research Data Commons Datasets" (Lead: Nikita Thakur, UI Cancer Center) and am not seeking assignment to another project. My contribution will be the oncology domain pack: building the oncology.yaml required-variable inventory per AI task, biomarker panel, and ICD-O/mCODE/AJCC staging mappings, then validating that TCGA-BRCA clinical fields actually populate them, the layer where clinical judgment determines whether a dataset is truly usable for a given modeling task. My background aligns directly. As a Data Scientist in Oncology Informatics at the University of Illinois Cancer Center, I own real-world evidence analytics for CancerLinQ/RWD360, including a 31,200-patient Stage I–III lung cancer cohort and a 60-patient ALK+ cohort, where I standardized fragmented biomarker semantics and caught a 225-to-60 patient misclassification before manuscript submission. I also spearheaded our Precision Oncology Dashboard, reconciling Tempus NGS data and cohort counts across disjoint oncology databases. 
This gives me hands-on fluency in the exact failure modes this project targets (inconsistent staging fields, biomarker misclassification, and cross-repository semantic drift), and I'm looking forward to building this out with the team.	View
51	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #51	Lock NCI Data Jamboree (Project Abstract Submission): Submission #51	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #51	Mon, 07/27/2026 - 15:35	Anonymous	10.208.24.192	Juhi		Anand	Master of Science Business Analytics (University of Illinois Chicago - USA), Master of Science Computer Science (University College Dublin - Ireland)	Research Specialist	University Of Illinois Cancer Center	Aurora	janan@uic.edu		Evaluating data quality for reproducibility and AI-readiness	AI-readiness; data quality; Cancer Research Data Commons; multimodal oncology data; data leakage	Project Seeker: joining "AI-Readiness Scorecard: A Task-Relative Profiler for Multimodal Cancer Research Data Commons Datasets".	I am a confirmed member of the ready-to-go team for "AI-Readiness Scorecard: A Task-Relative Profiler for Multimodal Cancer Research Data Commons Datasets" (Lead: Nikita, University of Illinois Cancer Center) and am not seeking assignment to another project. My primary contribution will be the development and validation of the statistical integrity engine, including batch and site-effect analysis, informative missingness detection, class imbalance assessment, data leakage detection, and robust cross-validation strategies to ensure reproducible AI-readiness evaluation. In addition, I will collaborate with other team members on integrating repository adapters with the common intermediate representation, validating oncology-specific quality metrics, refining the scoring framework, and testing the end-to-end pipeline to ensure all components work cohesively. My background includes a Master of Science in Business Analytics from the University of Illinois Chicago and a Master of Science in Computer Science from University College Dublin. 
I currently work as a Research Specialist at the University of Illinois Cancer Center, where I develop data-driven solutions for oncology research using Python, machine learning, statistical analysis, SQL, and healthcare data. Through this project, I hope to contribute to an open, reproducible framework for evaluating AI-readiness of multimodal cancer datasets while expanding my expertise in biomedical AI, multimodal data integration, and research software development. I also look forward to working closely with the team across repository integration, clinical data validation, and the final demonstration to help deliver a cohesive and impactful solution.	View
47	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #47	Lock NCI Data Jamboree (Project Abstract Submission): Submission #47	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #47	Mon, 07/27/2026 - 15:30	Anonymous	10.208.28.116	SANDHYA		Prabhakaran	PhD	Research Scientist/ESI	Moffitt Cancer center/Integrated Math Onco department	Tampa/Florida	sandhyaprabhakaran@gmail.com		Project seeker	computational biology, math modeling, machine learning	--	My training is in building computational and mathematical models to better understand 'why' we see the patterns in data. I am interested to learn from others in this cohort on 1) building models to analyze multimodal data, and increase the clinical interpretability; and 2) building the relevant study cohorts to enable (1).	View
46	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #46	Lock NCI Data Jamboree (Project Abstract Submission): Submission #46	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #46	Mon, 07/27/2026 - 15:26	Anonymous	10.208.28.116	Katina	H.	Singh	B.S.	Research Assistant	Memorial Sloan Kettering Cancer Center	New York	singhk8@mskcc.org		Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data		At the Intersection of Immune Monitoring and Statistical Modeling	As a Research Assistant for the Bone Marrow Transplant Service at Memorial Sloan Kettering Cancer Center, I perform novel immunoassays and link high throughput biomarker data to clinical outcomes. These patients are aggregated to form cohorts in institution led initiatives and in multi-center trials for later evaluation of immunotherapies and to determine impact of conditioning regimen on success of hematopoietic stem cell transplants. Having the opportunity to work on projects across different oncology services here influenced my pursuit of an MPH in Epidemiology and Biostatistics, where I was one of three Master's students permitted to take PhD-level coursework in causal inference. My introduction to using observational data analysis to isolate treatment effects in complex care plans this past semester motivated me to investigate ways I could apply these frameworks to our patient cohorts. I plan to bridge my computational and biological expertise through upcoming classes in Bayesian statistics and pathophysiology, allowing me to better isolate the complex factors that modulate cancer treatments. The Jamboree is the ideal venue to meld these methods with my insights gained from working in a premier cancer center, translating institutional experience into solutions for national oncology challenges. 
Furthermore, this event offers a chance to collaborate directly with leading experts and gain hands-on experience analyzing large, diverse datasets. I am particularly excited to use the All of Us dataset to identify ways to make personalized healthcare accessible to everyone and to examine the role of environmental factors in health for specific communities. Beyond the technical work, I look forward to building lasting connections with other researchers and gaining exclusive mentoring opportunities. It would be amazing to create something that would facilitate greater discoveries in the field and, in some small way, to contribute meaningfully to the public effort in the fight against cancer.	View
55	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #55	Lock NCI Data Jamboree (Project Abstract Submission): Submission #55	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #55	Mon, 07/27/2026 - 15:24	Anonymous	10.208.28.116	Giuseppe		Tarantino	Ph.D.	Instructor in Medicine	Dana-Farber Cancer Institute	Boston	giuseppe_tarantino@dfci.harvard.edu	First Name: Tyler Last Name: Aprati Affiliation: Dana-Farber Cancer Institute First Name: Bojan Last Name: Karlas Affiliation: Dana-Farber Cancer Institute First Name: Paulina Last Name: Koehler Affiliation: Dana-Farber Cancer Institute First Name: Hyeon-Tae Last Name: Hwang Affiliation: Dana-Farber Cancer Institute First Name: Cathering Last Name: Feng Affiliation: Dana-Farber Cancer Institute First Name: Xuelu Last Name: Liu Affiliation: Dana-Farber Cancer Institute	Developing, refining, or validating tools, methods, algorithms, and pipelines	computational pathology; whole-slide imaging; unsupervised tile clustering; tumor cell states;	A pipeline and interactive tool for identifying histomorphological tile clusters associated with tumor transcriptional states	Motivation: Transcriptomics has uncovered different biological tumor states that are often associated with disease outcomes and response to therapy; however, transcriptomic sequencing is rarely available in clinical practice. On the other hand, whole-slide images (WSIs) such as H&E are routinely performed, but are not currently used to derive tumor state information. Linking morphological patterns from WSIs with existing transcriptional programs would allow these tumor states to be identified in routine clinical practice, integrating transcriptomic findings into the precision medicine pipeline. We aim to develop a general, end-to-end pipeline and an interactive application that quantifies and tests associations between morphological features and features including any transcriptional signature or patient survival. 
Approach: For WSIs, the pipeline will embed image tiles and group them into morphological clusters, summarizing each slide as a vector of cluster features. For transcriptomic samples, gene expression signature scores are computed directly. We will then develop a predictive model (adjusted for clinical and genomic confounders) associating these morphological features with signature scores from the matched H&E and bulk RNA-seq samples available in TCGA. Downstream, cluster presence will be linked to outcomes such as overall survival and therapy response. The approach is cohort- and tumor-type agnostic, providing a pipeline from histomorphology of clinical samples to associations with tumor states and survival for hypothesis generation or publication. Implementation: The goal is to deliver this application as a simple web app. Users paste a gene list or upload a signature file. The tool computes expression scores and returns per-cluster results (survival statistics, visual summaries, etc.). Interchangeable configurations (slide type, cluster number, alignment, threshold) are supported, and a precomputed library of signatures is browsable alongside user-defined ones.	View
49	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #49	Lock NCI Data Jamboree (Project Abstract Submission): Submission #49	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #49	Mon, 07/27/2026 - 15:22	Anonymous	10.208.28.116	Nikita		Thakur	Master of Science, Computer Science	Research Data Analyst (Analytics and AI lead)	University of Illinois Cancer Center	Chicago	nthaku3@uic.edu	First Name: Saransh Last Name: Singh Affiliation: University of Illinois Cancer Center First Name: Juhi Last Name: Anand Affiliation: University of Illinois Cancer Center First Name: Lakshmi Sravya Last Name: Rachakonda Affiliation: University of Illinois Cancer Center	Evaluating data quality for reproducibility and AI-readiness	AI-readiness; data quality; Cancer Research Data Commons; multimodal oncology data; data leakage	AI-Readiness Scorecard: A Task-Relative Profiler for Multimodal Cancer Research Data Commons Datasets	Rationale: NIH Bridge2AI criteria establish that FAIR conformance alone does not make a dataset AI-ready, yet the criterion most likely to invalidate a downstream model, Characterization, remains largely unautomated. Researchers assembling oncology AI cohorts evaluate batch effects, informative missingness, leakage risk, and clinical-variable completeness ad hoc and per repository, because CRDC Commons were developed independently with distinct data models. Objective: Prototype an installable Python profiler answering one question: Does this oncology dataset support this AI task, and if not, what is missing? Approach: Repository adapters reduce heterogeneous inputs to a shared representation of feature matrix, sample metadata, and resolved site label. Two check families run against it. 
Modality-agnostic integrity checks cover missingness and its correlation with site, class imbalance, batch and site effects (via cross-validated site-predictability under a permutation null), and leakage from patient-straddle or identifier columns. Oncology completeness checks cover staging, histology, treatment, outcomes, actionable biomarkers, and ICD-O/mCODE conformance. Domain and task knowledge live in declarative YAML packs rather than code, keeping the engine small and extensible. Output is an evidence-linked data card plus Croissant metadata. Datasets: GDC TCGA-BRCA (clinical, expression, mutations) and one IDC imaging collection; PDC CPTAC proteomics and SEER as stretch targets. Open access tiers throughout. Jamboree deliverables: A public repository containing the intermediate representation and adapter interface, working batch-effect and leakage checks validated on TCGA-BRCA, a first oncology completeness pack, and a per-patient multimodal linkage matrix spanning GDC and IDC, with a rendered data card. Tools and environment: Python (pandas, scikit-learn, pydicom), open-tier repository APIs, and modest compute; the design is metadata-first and requires no pixel-level processing. Team: Assembled and ready (see additional authors), with four members spanning computer science and healthcare data analytics and roles mapped to architecture layers: representation, integrity checks, oncology pack, and adapters.	View
50	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #50	Lock NCI Data Jamboree (Project Abstract Submission): Submission #50	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #50	Mon, 07/27/2026 - 15:16	Anonymous	10.208.28.116	Lakshmi Sravya		Rachakonda	Masters of Science in Computer Science		University of Illinois - Cancer Center	Chicago	lrach@uic.edu		Evaluating data quality for reproducibility and AI-readiness	AI-readiness; data quality; Cancer Research Data Commons; multimodal oncology data; data leakage	Project Seeker : joining "AI-Readiness Scorecard: A Task-Relative Profiler for Multimodal Cancer Research Data Commons Datasets".	I am a confirmed member of the ready-to-go team for "AI-Readiness Scorecard: A Task-Relative Profiler for Multimodal Cancer Research Data Commons Datasets" (Lead: Nikita Thakur, UI Cancer Center) and am not seeking assignment to another project. My contribution will be the data ingestion layer and the linkage matrix. I will connect the profiler to two Cancer Research Data Commons repositories, GDC and IDC, using Python and pydicom, pull the clinical and imaging metadata each one provides, and build the per-patient table showing which types of data exist for which patients across both. Everything the rest of the team builds runs on top of this, so my priority is getting it working early and keeping it reliable. Once it is stable, I will assist the other members with their pieces as needed, particularly the integrity checks, since I have experience building machine learning and AI models on healthcare data and understand what those checks are guarding against. My background is in clinical data engineering at the University of Illinois Cancer Center, where I work as a data analyst on the Data Integration and Statistical Reporting core. 
I build cancer patient cohorts from our clinical data warehouse and tumor registry, which means combining data from several systems that each store patient information differently. Getting those sources to line up correctly, and verifying that they have, is the central part of my work, and it is the same task this project needs at a national scale. What I hope to get out of the Jamboree is working adapters for both repositories and a finished coverage matrix by the end of the event, plus a better understanding of how the national commons are structured, which I expect to bring back to the cohort work I do at the Cancer Center.	View
48	Star/flag NCI Data Jamboree (Project Abstract Submission): Submission #48	Lock NCI Data Jamboree (Project Abstract Submission): Submission #48	Add notes to NCI Data Jamboree (Project Abstract Submission): Submission #48	Mon, 07/27/2026 - 15:11	Anonymous	10.208.28.116	Anthony		Cristillo	Ph.D., MBA	Senior Vice President and Federal Health Lead	Revolutional, LLC	McLean	anthony.cristillo@revolutional.com	First Name: Diana Last Name: Castiblanco Post-nominal letters: Ph.D., FMACP, FNC Affiliation: Revolutional, LLC First Name: Rod Last Name: Fontecilla Post-nominal letters: Ph.D. Affiliation: Revolutional, LLC First Name: Naga Last Name: Nandivelugu Post-nominal letters: B.E., MBA Affiliation: Revolutional, LLC	Enhancing data interoperability (e.g., data harmonization, data federation)	Data interoperability, Data harmonization, AI readiness, Business rule extraction, Semantic data model	AI-Assisted Business Rule Extraction To Improve Interoperability across Oncology Datasets, and Enable Accurate, Reproducible, and Trustworthy Analyses	Background: Interpretation of oncology treatment, response, and safety data, including line of therapy, adverse events, treatment response, disease progression, censoring, and evidence status, depends on business rules embedded within study protocols, data dictionaries, case report forms, and statistical analysis plans rather than explicit, computable data elements. These implicit rules limit interoperability across datasets and force AI systems to infer clinical meaning that was never formally represented, reducing reproducibility and trustworthiness. The value of explicit rule representation has precedent in oncology; for example, iRECIST formalized immune-related response criteria beyond RECIST, improving consistency in clinical trial interpretation. 
Objective: Evaluate whether AI-assisted extraction and explicit representation of oncology business rules improve cross-source interpretability, mapping accuracy, and reproducibility of downstream analyses. Methods: Using datasets from the NCI Clinical and Translational Data Commons (CTDC), we will leverage our FedRAMP/NIST 800-53, AI-powered, provisionally patented platform (RISE) to identify business rules from heterogeneous clinical and research artifacts prior to data ingestion and integration. Extracted rules will be mapped to a lightweight canonical model consisting of nine core entities (Patient, Diagnosis, Treatment Episode, Exposure, Adverse Event, Response Assessment, Disease Progression, Outcome, and Evidence Source) and nine categories of governing business rules. Each rule will retain provenance, source version, extraction confidence, and expert review status. Cancer domain experts will validate the extracted rules to ensure factual grounding and clinical accuracy. Expected Outcomes: This project does not seek to replace existing oncology standards or create a universal cancer data model. Instead, it evaluates whether explicit, computable representation of business rules provides a semantic layer that complements existing standards, improves interoperability across oncology datasets, and enables more accurate, reproducible, and trustworthy AI-enabled analyses. These findings will inform scalable approaches for semantic harmonization within the cancer research ecosystem.	View