Knowledge discovery and data mining to assist natural language understanding. 1998

A Wilcox and G Hripcsak
Department of Medical Informatics, Columbia University, New York, NY, USA.

As natural language processing systems become more frequent in clinical use, methods for interpreting the output of these programs become increasingly important. These methods require the effort of a domain expert, who must build specific queries and rules for interpreting the processor output. Knowledge discovery and data mining tools can be used instead of a domain expert to automatically generate these queries and rules. C5.0, a decision tree generator, was used to create a rule base for a natural language understanding system. A general-purpose natural language processor using this rule base was tested on a set of 200 chest radiograph reports. When a small set of reports, classified by physicians, was used as the training set, the generated rule base performed as well as lay persons, but worse than physicians. When a larger set of reports, using ICD9 coding to classify the set, was used for training the system, the rule base performed worse than the physicians and lay persons. It appears that a larger, more accurate training set is needed to increase performance of the method.
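The abstract does not describe how C5.0 builds its trees, so the following is only a minimal sketch of the general idea: inducing a decision tree from classified reports and reading interpretation rules off it. It uses a simplified ID3-style information-gain tree as a stand-in, not C5.0 itself. The finding names, the toy reports, and every function name are hypothetical and invented for illustration; they are not data or code from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def build_tree(rows, labels, features):
    """ID3-style induction; each row is the set of findings present in a report."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: majority class

    def gain(feature):
        split = {True: [], False: []}
        for row, y in zip(rows, labels):
            split[feature in row].append(y)
        remainder = sum(len(part) / len(labels) * entropy(part)
                        for part in split.values() if part)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 1e-12:
        return majority  # no feature separates the classes any further
    rest = [f for f in features if f != best]
    branches = {}
    for present in (True, False):
        subset = [(r, y) for r, y in zip(rows, labels) if (best in r) == present]
        branches[present] = build_tree([r for r, _ in subset],
                                       [y for _, y in subset], rest)
    return (best, branches)

def rules(tree, conditions=()):
    """Yield (conditions, predicted_label) pairs, one per leaf of the tree."""
    if not isinstance(tree, tuple):
        yield conditions, tree
        return
    feature, branches = tree
    yield from rules(branches[True], conditions + ((feature, True),))
    yield from rules(branches[False], conditions + ((feature, False),))

def classify(tree, row):
    """Follow the branches matching the findings in one report."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[feature in row]
    return tree

def sensitivity_specificity(tree, rows, labels):
    """Sensitivity and specificity; assumes both classes occur in labels."""
    tp = sum(1 for r, y in zip(rows, labels) if y and classify(tree, r))
    tn = sum(1 for r, y in zip(rows, labels) if not y and not classify(tree, r))
    positives = sum(1 for y in labels if y)
    return tp / positives, tn / (len(labels) - positives)

# Hypothetical toy data: findings an NLP system might extract per report,
# with a made-up reference label (True = condition of interest present).
FEATURES = ["infiltrate", "effusion", "cardiomegaly", "consolidation"]
train_rows = [{"infiltrate", "consolidation"}, {"infiltrate"}, {"cardiomegaly"},
              {"effusion", "cardiomegaly"}, set(), {"infiltrate", "effusion"}]
train_labels = [True, True, False, False, False, True]
test_rows = [{"infiltrate"}, {"cardiomegaly"}, {"effusion"}, {"consolidation"}]
test_labels = [True, False, False, True]

tree = build_tree(train_rows, train_labels, FEATURES)
for conditions, label in rules(tree):
    text = " and ".join(f"{f} {'present' if p else 'absent'}" for f, p in conditions)
    print(f"if {text or 'always'} then positive={label}")
print(sensitivity_specificity(tree, test_rows, test_labels))
```

In this toy setting the held-out report that mentions only "consolidation" is missed, because no training report showed that finding without "infiltrate". This is a small-scale illustration of why the size and accuracy of the training set limit the generated rule base.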

UI | MeSH Term | Description | Entries
D008171 | Lung Diseases | Pathological processes involving any part of the LUNG. | Pulmonary Diseases; Disease, Pulmonary; Diseases, Pulmonary; Pulmonary Disease; Disease, Lung; Diseases, Lung; Lung Disease
D009323 | Natural Language Processing | Computer processing of a language with rules that reflect and describe current usage rather than prescribed usage. | Language Processing, Natural; Language Processings, Natural; Natural Language Processings; Processing, Natural Language; Processings, Natural Language
D003663 | Decision Trees | A graphic device used in decision analysis, series of decision options are represented as branches (hierarchical). | Decision Tree; Tree, Decision; Trees, Decision
D006331 | Heart Diseases | Pathological conditions involving the HEART including its structural and functional abnormalities. | Cardiac Disorders; Heart Disorders; Cardiac Diseases; Cardiac Disease; Cardiac Disorder; Heart Disease; Heart Disorder
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D001185 | Artificial Intelligence | Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. | AI (Artificial Intelligence); Computer Reasoning; Computer Vision Systems; Knowledge Acquisition (Computer); Knowledge Representation (Computer); Machine Intelligence; Computational Intelligence; Acquisition, Knowledge (Computer); Computer Vision System; Intelligence, Artificial; Intelligence, Computational; Intelligence, Machine; Knowledge Representations (Computer); Reasoning, Computer; Representation, Knowledge (Computer); System, Computer Vision; Systems, Computer Vision; Vision System, Computer; Vision Systems, Computer
D012680 | Sensitivity and Specificity | Binary classification measures to assess test results. Sensitivity or recall rate is the proportion of true positives. Specificity is the probability of correctly determining the absence of a condition. (From Last, Dictionary of Epidemiology, 2d ed) | Specificity; Sensitivity; Specificity and Sensitivity
D013902 | Radiography, Thoracic | X-ray visualization of the chest and organs of the thoracic cavity. It is not restricted to visualization of the lungs. | Thoracic Radiography; Radiographies, Thoracic; Thoracic Radiographies
D016247 | Information Storage and Retrieval | Organized activities related to the storage, location, search, and retrieval of information. | Information Retrieval; Data Files; Data Linkage; Data Retrieval; Data Storage; Data Storage and Retrieval; Information Extraction; Information Storage; Machine-Readable Data Files; Data File; Data File, Machine-Readable; Data Files, Machine-Readable; Extraction, Information; Files, Machine-Readable Data; Information Extractions; Machine Readable Data Files; Machine-Readable Data File; Retrieval, Data; Storage, Data
