Extracting cancer concepts from clinical notes using natural language processing: a systematic review. 2023

Maryam Gholipour, Reza Khajouei, Parastoo Amiri, Sadrieh Hajesmaeel Gohari, and Leila Ahmadian
Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran.

BACKGROUND: Extracting information from free text using natural language processing (NLP) can save time and reduce the burden of manually extracting large quantities of data from the highly complex clinical notes of cancer patients. This study aimed to systematically review studies that used NLP methods to automatically identify cancer concepts in clinical notes.

METHODS: PubMed, Scopus, Web of Science, and Embase were searched up to June 29, 2021, for English-language papers using combinations of terms related to "Cancer", "NLP", "Coding", and "Registries". Two reviewers independently assessed the eligibility of papers for inclusion in the review.

RESULTS: Most of the concept-extraction software reported in the included studies was developed by the researchers themselves (n = 7). Rule-based algorithms were the algorithms most frequently used to develop these programs. Accuracy (n = 14) and sensitivity (n = 12) were the criteria most often used to evaluate the algorithms. In addition, Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) and Unified Medical Language System (UMLS) were the terminologies most commonly used to identify concepts. The cancers studied most often were breast cancer (n = 4, 19%) and lung cancer (n = 4, 19%).

CONCLUSIONS: The use of NLP for extracting the concepts and symptoms of cancer has increased in recent years. Rule-based algorithms are popular among developers. Given the high accuracy and sensitivity of these algorithms in identifying and extracting cancer concepts, we suggest that future studies use them to extract the concepts of other diseases as well.
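As a rough illustration of the kind of rule-based concept extraction the review describes, and of how accuracy and sensitivity are computed against manually labeled notes, a minimal Python sketch follows. It is not taken from any of the reviewed studies: the lexicon, the concept labels, the negation cues, and the sample notes are all hypothetical, and a real system would map phrases to actual SNOMED-CT or UMLS identifiers rather than the placeholder labels used here.

```python
import re

# Hypothetical mini-lexicon: surface phrase -> placeholder concept label.
# These labels are illustrative only, not real SNOMED-CT or UMLS codes.
LEXICON = {
    "breast cancer": "BREAST_CANCER",
    "carcinoma of the breast": "BREAST_CANCER",
    "lung cancer": "LUNG_CANCER",
}

# Assumed negation cues; a mention is treated as negated when one of these
# appears earlier in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b")


def extract_concepts(note):
    """Return the set of concept labels asserted (not negated) in a note."""
    found = set()
    lowered = note.lower()
    for phrase, label in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            # Text from the start of the current sentence up to the match.
            sentence_start = lowered.rfind(".", 0, match.start()) + 1
            prefix = lowered[sentence_start:match.start()]
            if not NEGATION.search(prefix):
                found.add(label)
    return found


def evaluate(predictions, gold):
    """Accuracy and sensitivity for binary labels (True = concept present)."""
    pairs = list(zip(predictions, gold))
    tp = sum(p and g for p, g in pairs)
    tn = sum((not p) and (not g) for p, g in pairs)
    fn = sum((not p) and g for p, g in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical notes with manual labels for the breast cancer concept.
notes = [
    ("Patient diagnosed with breast cancer in 2019.", True),
    ("No evidence of breast cancer on imaging.", False),
    ("History of carcinoma of the breast, treated surgically.", True),
    ("Family history is unremarkable.", False),
    ("Biopsy confirmed malignant tumor of breast.", True),  # phrase not in lexicon
]

predictions = ["BREAST_CANCER" in extract_concepts(text) for text, _ in notes]
gold = [label for _, label in notes]
accuracy, sensitivity = evaluate(predictions, gold)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f}")
```

The last sample note shows a typical weakness of a fixed lexicon: a synonym that is missing from the rules becomes a false negative, which lowers sensitivity even when the overall accuracy still looks reasonable.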

UI | MeSH Term | Description | Entries
D009323 | Natural Language Processing | Computer processing of a language with rules that reflect and describe current usage rather than prescribed usage. | Language Processing, Natural,Language Processings, Natural,Natural Language Processings,Processing, Natural Language,Processings, Natural Language
D001943 | Breast Neoplasms | Tumors or cancer of the human BREAST. | Breast Cancer,Breast Tumors,Cancer of Breast,Breast Carcinoma,Cancer of the Breast,Human Mammary Carcinoma,Malignant Neoplasm of Breast,Malignant Tumor of Breast,Mammary Cancer,Mammary Carcinoma, Human,Mammary Neoplasm, Human,Mammary Neoplasms, Human,Neoplasms, Breast,Tumors, Breast,Breast Carcinomas,Breast Malignant Neoplasm,Breast Malignant Neoplasms,Breast Malignant Tumor,Breast Malignant Tumors,Breast Neoplasm,Breast Tumor,Cancer, Breast,Cancer, Mammary,Cancers, Mammary,Carcinoma, Breast,Carcinoma, Human Mammary,Carcinomas, Breast,Carcinomas, Human Mammary,Human Mammary Carcinomas,Human Mammary Neoplasm,Human Mammary Neoplasms,Mammary Cancers,Mammary Carcinomas, Human,Neoplasm, Breast,Neoplasm, Human Mammary,Neoplasms, Human Mammary,Tumor, Breast
D005260 | Female | | Females
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D017432 | Unified Medical Language System | A research and development program initiated by the NATIONAL LIBRARY OF MEDICINE to build knowledge sources for the purpose of aiding the development of systems that help health professionals retrieve and integrate biomedical information. The knowledge sources can be used to link disparate information systems to overcome retrieval problems caused by differences in terminology and the scattering of relevant information across many databases. The three knowledge sources are the Metathesaurus, the Semantic Network, and the Specialist Lexicon. | Metathesaurus,UMLS
