Annual Meeting of the NCI Cohort Consortium (Abstract Submission): Submission #13
Submission information
Created: Fri, 09/13/2024 - 16:50
Completed: Fri, 09/13/2024 - 16:55
Changed: Mon, 09/16/2024 - 16:40
Webform: Cohort 2024 (Abstracts Submission)
Lightning Talks Abstract
Martin
Lajous
Faculty-Researcher
MD, ScD
Instituto Nacional de Salud Publica
An Efficient Pipeline-Based Geocoding Approach to Handle Self-Reported Addresses in a Large Population-based Cancer Cohort in Mexico
Background. Geocoding participants’ addresses in epidemiologic cohorts is now highly accurate in high-income countries. In resource-limited settings, non-standardized address notation, the lack of address registries, and limited geocoding resources remain important challenges. We aimed to develop an efficient pipeline-based geocoding approach to handle self-reported addresses from participants in a cancer cohort in Mexico, assess the validity of coordinate assignment, and maximize geocoding success.
Methods. We obtained self-reported addresses at baseline in 2006-2008 from 104,003 participants in the Mexican Teachers’ Cohort (n=115,275). After cleaning and standardization, we optimized processing times by splitting the data (651,668 candidate coordinates) and creating 105 Amazon Web Services (AWS) virtual machines to submit queries asynchronously to the ArcGIS REST API. We conducted geospatial verification by projecting candidate coordinates onto Mexico’s official neighborhood vector shapefile through a spatial join operation. We compared similarities between the self-reported and API-derived addresses using string alignment scoring metrics. To assess the accuracy of the procedure, we compared address coordinates to residential block-centroid coordinates available in the 2006 national voting registry database.
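The similarity-scoring step can be illustrated with a minimal sketch. It is not the pipeline used in the study: Python's standard-library `difflib.SequenceMatcher` stands in here for the string alignment scoring metrics, and the function names, the candidate tuple layout, and the 0.8 threshold are hypothetical choices made only for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase, strip accents, and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", address)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def similarity(reported: str, candidate: str) -> float:
    """Score in [0, 1]; SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, normalize(reported), normalize(candidate)).ratio()


def best_candidate(reported, candidates, threshold=0.8):
    """Pick the API-derived candidate most similar to the self-reported address.

    candidates: list of (address, latitude, longitude) tuples (hypothetical layout).
    Returns the best tuple, or None if nothing reaches the assumed threshold.
    """
    if not candidates:
        return None
    scored = [(similarity(reported, cand[0]), cand) for cand in candidates]
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score >= threshold else None
```

In a sketch like this, a participant whose candidates all fall below the threshold would be left without an assigned coordinate and could be handled by a fallback source.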
Results. After discarding non-valid coordinates and conducting geospatial verification and similarity scoring, we assigned coordinates to 101,704 study participants. When we compared assigned coordinates to voting registry block-centroid coordinates for 81,270 participants, the median distance between coordinates was 0.17 km (interquartile range, 0.06-0.77 km). We maximized geocoding to 111,299 (97%) study participants by assigning voting registry-defined coordinates to 9,595 participants without a valid address.
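A comparison of this kind, distances between assigned coordinates and block centroids summarized as a median and interquartile range, could be computed along the following lines. This is a sketch under stated assumptions: the abstract does not say how distances were calculated, so the haversine great-circle formula, the mean Earth radius constant, and the function names are all illustrative choices.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize(distances):
    """Return (median, first quartile, third quartile) of a list of distances."""
    q1, median, q3 = statistics.quantiles(distances, n=4)
    return median, q1, q3
```

Here `statistics.quantiles` uses its default "exclusive" method; other quartile conventions give slightly different values on small samples.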
Conclusions. Address-level geocoding based on self-reported addresses can be efficiently achieved in large-scale epidemiological studies in Mexico.