NCI Data Jamboree (Project Abstract Submission): Submission #2
Submission information
Submission Number: 2
Submission ID: 183015
Submission UUID: 83b183a5-8018-47ee-9eb4-fb7c78f571cb
Submission URI: /nci/datajamboree/abstractsubmission
Submission Update: /nci/datajamboree/abstractsubmission?token=i4p8qaLnDOOf53RZqquzZMni3hiSLvyq04N8Qvtv-po
Created: Mon, 06/08/2026 - 20:30
Completed: Mon, 06/08/2026 - 20:30
Changed: Mon, 06/08/2026 - 20:30
Remote IP address: 10.208.24.28
Submitted by: Anonymous
Language: English
Is draft: No
Webform: NCI Data Jamboree (Abstracts)
Submitted to: NCI Data Jamboree (Project Abstract Submission)
| First Name | Megha |
|---|---|
| Middle Initial | B. |
| Last Name | Srivastava |
| Degree(s) | B.S./M.S. in Computer Science |
| Position/Title/Career Status | PhD Student in Computer Science |
| Organization | Stanford University |
| Organization Address | Stanford |
| Email | megha@cs.stanford.edu |
| List of Additional Authors | |
| Abstract Category | Employing statistical, computational, and informatics tools, algorithms, and methods to integrate or analyze data |
| Abstract Keywords | machine learning, AI-readiness, distribution shift, causal inference, confounding variables, language models, LLMs |
| Abstract Title | Project Seeker |
| Abstract | I am a PhD student in Computer Science with significant experience in machine learning, large language modeling, and human-AI interaction. I have recently been transitioning my research towards applications of AI in medicine, healthcare, and drug discovery, and I hope to understand what challenges exist at the dataset level and which datasets would best help push different problems forward. One research area I am particularly interested in is distribution shift, e.g. a mismatch between the training dataset and the data seen at test-time inference, and how to address it. I am also curious about methods for identifying potential confounding variables that are unmeasured in the current dataset. My hope is to join a project that can help improve the quality and availability of oncology datasets for machine learning research. |