Enhancing thoracic disease detection using chest X-rays from PubMed Central Open Access. 2023

Mingquan Lin, Bojian Hou, Swati Mishra, Tianyuan Yao, Yuankai Huo, Qian Yang, Fei Wang, George Shih, and Yifan Peng
Department of Population Health Sciences, Weill Cornell Medicine, New York, USA.

Large chest X-ray (CXR) datasets have been collected to train deep learning models to detect thoracic pathology on CXR. However, most CXR datasets come from single-center studies, and the collected pathologies are often imbalanced. The aim of this study was to automatically construct a public, weakly labeled CXR database from articles in PubMed Central Open Access (PMC-OA) and to assess model performance on CXR pathology classification when this database is used as additional training data. Our framework includes text extraction, CXR pathology verification, subfigure separation, and image modality classification. We extensively validated the utility of the automatically generated image database on thoracic disease detection tasks, including Hernia, Lung Lesion, Pneumonia, and Pneumothorax. We chose these diseases because of their historically poor performance in existing datasets: the NIH-CXR dataset (112,120 CXR) and the MIMIC-CXR dataset (243,324 CXR). Classifiers fine-tuned with additional PMC-CXR extracted by the proposed framework consistently and significantly outperformed those without it (e.g., Hernia: 0.9335 vs. 0.9154; Lung Lesion: 0.7394 vs. 0.7207; Pneumonia: 0.7074 vs. 0.6709; Pneumothorax: 0.8185 vs. 0.7517; all in AUC with p < 0.0001) for CXR pathology detection. In contrast to previous approaches, in which medical images are manually submitted to the repository, our framework automatically collects figures and their accompanying figure legends. Compared with previous studies, the proposed framework improves subfigure segmentation and incorporates our self-developed NLP technique for CXR pathology verification. We hope it complements existing resources and improves our ability to make biomedical image data findable, accessible, interoperable, and reusable.
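
As a small worked example of the comparison reported above, the absolute AUC gain for each pathology can be computed directly from the figures in the abstract. This is only an arithmetic sketch: the dictionary and the helper name `auc_gains` are illustrative and not part of the authors' framework, and the snippet does not reproduce the study's significance testing.

```python
# AUC values copied from the abstract, as (with PMC-CXR, without PMC-CXR).
reported_auc = {
    "Hernia": (0.9335, 0.9154),
    "Lung Lesion": (0.7394, 0.7207),
    "Pneumonia": (0.7074, 0.6709),
    "Pneumothorax": (0.8185, 0.7517),
}


def auc_gains(table):
    """Return the absolute AUC gain per pathology, rounded to 4 decimals.

    Hypothetical helper for illustration only.
    """
    return {
        name: round(with_pmc - without_pmc, 4)
        for name, (with_pmc, without_pmc) in table.items()
    }


gains = auc_gains(reported_auc)
for name, gain in gains.items():
    print(f"{name}: +{gain:.4f}")
```

Computed this way, the gains range from +0.0181 (Hernia) to +0.0668 (Pneumothorax), with Lung Lesion at +0.0187 and Pneumonia at +0.0365.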

UI | MeSH Term | Description | Entries
D011014 | Pneumonia | Infection of the lung often accompanied by inflammation. | Experimental Lung Inflammation,Lobar Pneumonia,Lung Inflammation,Pneumonia, Lobar,Pneumonitis,Pulmonary Inflammation,Experimental Lung Inflammations,Inflammation, Experimental Lung,Inflammation, Lung,Inflammation, Pulmonary,Inflammations, Lung,Inflammations, Pulmonary,Lobar Pneumonias,Lung Inflammation, Experimental,Lung Inflammations,Lung Inflammations, Experimental,Pneumonias,Pneumonias, Lobar,Pneumonitides,Pulmonary Inflammations
D011030 | Pneumothorax | An accumulation of air or gas in the PLEURAL CAVITY, which may occur spontaneously or as a result of trauma or a pathological process. The gas may also be introduced deliberately during PNEUMOTHORAX, ARTIFICIAL. | Pneumothorax, Primary Spontaneous,Pressure Pneumothorax,Primary Spontaneous Pneumothorax,Spontaneous Pneumothorax,Tension Pneumothorax,Pneumothorax, Pressure,Pneumothorax, Spontaneous,Pneumothorax, Tension,Spontaneous Pneumothorax, Primary
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D013896 | Thoracic Diseases | Disorders affecting the organs of the thorax. | Disease, Thoracic,Diseases, Thoracic,Thoracic Disease
D013902 | Radiography, Thoracic | X-ray visualization of the chest and organs of the thoracic cavity. It is not restricted to visualization of the lungs. | Thoracic Radiography,Radiographies, Thoracic,Thoracic Radiographies
D014965 | X-Rays | Penetrating electromagnetic radiation emitted when the inner orbital electrons of an atom are excited and release radiant energy. X-ray wavelengths range from 1 pm to 10 nm. Hard X-rays are the higher energy, shorter wavelength X-rays. Soft x-rays or Grenz rays are less energetic and longer in wavelength. The short wavelength end of the X-ray spectrum overlaps the GAMMA RAYS wavelength range. The distinction between gamma rays and X-rays is based on their radiation source. | Grenz Ray,Grenz Rays,Roentgen Ray,Roentgen Rays,X Ray,X-Ray,Xray,Radiation, X,X-Radiation,Xrays,Ray, Grenz,Ray, Roentgen,Ray, X,Rays, Grenz,Rays, Roentgen,Rays, X,X Radiation,X Rays,X-Radiations
D022126 | Access to Information | Individual's rights to obtain and use information collected or generated by others. | FOIA Requests,Freedom of Information Act Requests,Open Access to Information,Public Access to Information,FOIA Request,Information, Access to,Request, FOIA,Requests, FOIA
