Development and Validation of a Claims-Based Algorithm for Identifying Incident Colorectal Cancer and Determining Progression Phases. 2025
OBJECTIVE: Health insurance claims, which contain diagnosis and treatment information, offer insights into clinical practice and medical care costs. However, inaccurate diagnosis codes in claims and the absence of staging information limit the understanding of clinical practice for colorectal cancer (CRC). We developed and validated an algorithm to accurately identify incident CRC cases and determine their progression phases from claims data.

METHODS: We conducted a retrospective study using claims data from three Japanese institutions, including two designated cancer care hospitals (DCCHs), covering April 2016 to August 2022. We developed an algorithm that combines CRC-associated diagnosis codes with claim codes for CRC-specific treatments to identify incident CRC cases and classify patients into three progression phases, defined as treatment-sequenced groups: endoscopic, surgical, and noncurative. The algorithm was refined to improve its performance metrics using cohorts from the two DCCHs in April–September 2017 and April–September 2019, and its validity was tested at the same hospitals in different periods and at the third institution. Performance was assessed by the positive predictive value (PPV) and sensitivity for identifying incident CRC, and by the accuracy of progression-phase determination.

RESULTS: Performance improved when the algorithm filtered out prevalent cases, restricted treatments to those specific to CRC, and targeted invasive CRC. In the two validation cohorts, the algorithm for identifying incident invasive CRC achieved high PPVs (91.2% [95% CI, 89.5 to 92.7] and 94.4% [95% CI, 87.6 to 97.6]), sensitivities (94.6% [95% CI, 93.1 to 95.7] and 100.0% [95% CI, 95.7 to 100.0]), and progression-phase accuracies (91.5% [95% CI, 89.7 to 93.0] and 97.6% [95% CI, 91.8 to 99.4]).

CONCLUSIONS: The developed algorithm accurately identified incident invasive CRC cases and determined their progression phases from claims data. Applying this algorithm could support research on real-world practice patterns and medical care costs associated with CRC.
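The identification and phase-classification steps described in METHODS can be sketched as below. This is a minimal Python illustration under stated assumptions, not the authors' algorithm: the abstract does not list the codes or rules used, so the treatment codes (`ENDO_RESECTION`, `COLECTOMY`, `CHEMO_REGIMEN`), the 180-day washout window that filters prevalent cases, and the phase precedence (any surgery gives surgical; endoscopic resection alone gives endoscopic; anything else gives noncurative) are all hypothetical. Restricting diagnoses to ICD-10 C18–C20, which excludes carcinoma in situ (D01.x), stands in for "targeting invasive CRC".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Dict, Iterable, List, Optional

# ICD-10 categories for malignant neoplasm of the colon (C18), rectosigmoid
# junction (C19), and rectum (C20). Carcinoma in situ (D01.x) is left out,
# a simple stand-in for restricting to invasive CRC.
INVASIVE_CRC_DX = {"C18", "C19", "C20"}

# HYPOTHETICAL treatment claim codes; real Japanese claim codes differ.
TREATMENT_CATEGORY = {
    "ENDO_RESECTION": "endoscopic",
    "COLECTOMY": "surgical",
    "CHEMO_REGIMEN": "systemic",
}


@dataclass(frozen=True)
class Claim:
    patient_id: str
    service_date: date
    code: str  # an ICD-10 diagnosis code or a hypothetical treatment code


class Phase(Enum):
    ENDOSCOPIC = "endoscopic"
    SURGICAL = "surgical"
    NONCURATIVE = "noncurative"


def is_invasive_crc_dx(code: str) -> bool:
    return code[:3] in INVASIVE_CRC_DX


def classify_patient(
    claims: List[Claim], data_start: date, washout_days: int = 180
) -> Optional[Phase]:
    """Return a progression phase for an incident case, or None otherwise."""
    dx_dates = sorted(c.service_date for c in claims if is_invasive_crc_dx(c.code))
    if not dx_dates:
        return None
    first_dx = dx_dates[0]
    # Prevalent-case filter (hypothetical window): a diagnosis inside the
    # washout period suggests the cancer predates the observation window.
    if first_dx < data_start + timedelta(days=washout_days):
        return None
    # Require at least one CRC-specific treatment on or after the first diagnosis.
    categories = {
        TREATMENT_CATEGORY[c.code]
        for c in claims
        if c.code in TREATMENT_CATEGORY and c.service_date >= first_dx
    }
    if not categories:
        return None  # a diagnosis code alone is not accepted as an incident case
    if "surgical" in categories:
        return Phase.SURGICAL
    if categories == {"endoscopic"}:
        return Phase.ENDOSCOPIC
    return Phase.NONCURATIVE  # e.g. systemic therapy without surgical resection


def classify_cohort(claims: Iterable[Claim], data_start: date) -> Dict[str, Phase]:
    """Group claims by patient and keep only patients classified as incident cases."""
    by_patient: Dict[str, List[Claim]] = defaultdict(list)
    for claim in claims:
        by_patient[claim.patient_id].append(claim)
    phases: Dict[str, Phase] = {}
    for patient_id, patient_claims in by_patient.items():
        phase = classify_patient(patient_claims, data_start)
        if phase is not None:
            phases[patient_id] = phase
    return phases
```

Requiring both a diagnosis code and a treatment claim mirrors the abstract's point that diagnosis codes alone are unreliable; the washout length and phase precedence are the parameters one would expect to tune during refinement.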
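Each reported performance metric is a binomial proportion with a 95% CI. The Python sketch below computes point estimates with a Wilson score interval. The abstract does not state which interval method was used, so Wilson is an assumption; it is a common choice because a Wald interval collapses to zero width at 100.0%, whereas the reported 100.0% sensitivity has a nondegenerate interval (95.7 to 100.0). The function names, the counts in the usage example, and the phase-accuracy denominator (taken here as correctly identified cases) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, Tuple


def wilson_ci(successes: int, total: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion successes/total."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def format_metric(name: str, successes: int, total: int) -> str:
    """Render a proportion in the abstract's '% [95% CI, lo to hi]' style."""
    lo, hi = wilson_ci(successes, total)
    return f"{name}: {100 * successes / total:.1f}% [95% CI, {100 * lo:.1f} to {100 * hi:.1f}]"


def validation_metrics(tp: int, fp: int, fn: int, phase_correct: int) -> Dict[str, Tuple[int, int]]:
    """Map each metric to (numerator, denominator).

    tp: algorithm-positive cases confirmed as incident CRC
    fp: algorithm-positive cases not confirmed
    fn: confirmed incident cases the algorithm missed
    phase_correct: true positives assigned the correct phase (assumed denominator: tp)
    """
    return {
        "PPV": (tp, tp + fp),
        "Sensitivity": (tp, tp + fn),
        "Phase accuracy": (phase_correct, tp),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, (k, n) in validation_metrics(tp=180, fp=20, fn=10, phase_correct=170).items():
        print(format_metric(name, k, n))
```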