Annual Meeting of the NCI Cohort Consortium (Abstract Submission): Submission #13

Submission information
Submission Number: 13
Submission ID: 127583
Submission UUID: 370c5d24-f379-4b85-9d20-69afbb707912

Created: Fri, 09/13/2024 - 16:50
Completed: Fri, 09/13/2024 - 16:55
Changed: Mon, 09/16/2024 - 16:40

Remote IP address: 10.208.28.69
Submitted by: Anonymous
Language: English

Is draft: No

Lightning Talks Abstract
------------------------

Presenter's First Name: Martin
Presenter's Last Name: Lajous
Title (e.g., professor, assistant professor, chair, etc.): Faculty-Researcher
Degree(s): MD, ScD
Contact Email: mlajous@insp.mx
Organization: Instituto Nacional de Salud Publica
Project Title: An Efficient Pipeline-Based Geocoding Approach to Handle Self-Reported Addresses in a Large Population-based Cancer Cohort in Mexico

Additional Authors:
1. First Name: Alejandro
   Last Name: Molina-Villegas 
   Degree(s): PhD
   Organization: CONAHCyT-CentroGeo
2. First Name: Karla
   Last Name: Valdez-Trejo
   Degree(s): MS
   Organization: Instituto Nacional de Salud Publica
3. First Name: Pablo
   Last Name: Lopez-Ramires
   Degree(s): PhD
   Organization: CentroGeo
4. First Name: Alberto
   Last Name: Simpser
   Degree(s): PhD
   Organization: ITAM
5. First Name: Adrian
   Last Name: Cortes-Valencia
   Degree(s): MS
   Organization: Instituto Nacional de Salud Publica
6. First Name: Dalia
   Last Name: Stern
   Degree(s): PhD
   Organization: CONAHCyT-Instituto Nacional de Salud Publica
7. First Name: Karla
   Last Name: Cervantes-Martinez
   Degree(s): PhD
   Organization: Instituto Nacional de Salud Publica
8. First Name: Liliana
   Last Name: Gomez-Flores-Ramos
   Degree(s): PhD
   Organization: Instituto Nacional de Salud Publica

Abstract:
Background. Geocoding participants’ addresses in epidemiologic cohorts is now highly accurate in high-income countries. In resource-limited settings, however, non-standardized address notation, a lack of address registries, and limited geocoding resources remain important challenges. We aimed to develop an efficient pipeline-based geocoding approach to handle self-reported addresses from participants in a cancer cohort in Mexico, assess the validity of coordinate assignment, and maximize geocoding success.

Methods. We obtained self-reported addresses at baseline in 2006-2008 from 104,003 participants in the Mexican Teachers’ Cohort (n=115,275). After cleaning and standardization, we optimized processing times by splitting the data (651,668 candidate coordinates) and creating 105 Amazon Web Services (AWS) virtual machines to submit queries asynchronously to the ArcGIS REST API. We conducted geospatial verification by projecting candidate coordinates, through a spatial join operation, onto Mexico’s official neighborhood vector shapefile. We compared the self-reported and API-derived addresses using string alignment scoring metrics. To assess the accuracy of the procedure, we compared the assigned address coordinates to residential block-centroid coordinates available in the 2006 national voting registry database.
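
The verification and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the string alignment metric or the spatial join tooling, so this sketch assumes Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity score and a simple ray-casting point-in-polygon test in place of a shapefile-based spatial join. The function names, the candidate dictionary layout, and the `min_score` threshold are all hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(address):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", address)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def similarity(reported, derived):
    """Similarity in [0, 1] between two normalized address strings."""
    return SequenceMatcher(None, normalize(reported), normalize(derived)).ratio()


def best_candidate(reported, candidates, polygon, min_score=0.8):
    """Pick the best-scoring candidate that falls inside the neighborhood.

    candidates: list of dicts with 'address', 'lon', and 'lat' keys
    (a hypothetical layout, not the ArcGIS response format).
    min_score: assumed cutoff; the abstract does not report one.
    """
    verified = [c for c in candidates
                if point_in_polygon(c["lon"], c["lat"], polygon)]
    scored = [(similarity(reported, c["address"]), c) for c in verified]
    scored = [pair for pair in scored if pair[0] >= min_score]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

In this sketch a candidate is kept only if it both lies inside the self-reported neighborhood polygon and resembles the self-reported address closely enough; a participant with no surviving candidate would return `None` and could fall back to another coordinate source.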

Results. After discarding invalid coordinates and conducting geospatial verification and similarity scoring, we assigned coordinates to 101,704 study participants. When we compared assigned coordinates to voting registry block-centroid coordinates for 81,270 participants, the median distance between coordinates was 0.17 km (interquartile range, 0.06-0.77 km). We maximized geocoding to 111,299 (97%) study participants by assigning voting registry-defined coordinates to 9,595 participants without a valid address.
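
As a rough illustration of how the distance comparison and the coverage figure above could be computed, the following sketch uses the haversine great-circle formula. The abstract does not state which distance formula was used, so haversine is an assumption here, and the function names are hypothetical. The coverage check uses only the counts reported above: 101,704 + 9,595 = 111,299 of 115,275 participants, or about 97%.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize(distances_km):
    """Return the median and the (Q1, Q3) quartiles of a list of distances."""
    q1, _, q3 = statistics.quantiles(distances_km, n=4)
    return statistics.median(distances_km), (q1, q3)


def coverage(geocoded, fallback, cohort_size):
    """Fraction of the cohort with coordinates after the registry fallback."""
    return (geocoded + fallback) / cohort_size
```

Applied to pairs of assigned and block-centroid coordinates, `haversine_km` would yield the per-participant distances that `summarize` reduces to a median and interquartile range.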

Conclusions. Address-level geocoding based on self-reported addresses can be efficiently achieved in large-scale epidemiological studies in Mexico.