License: CC BY-NC-SA 4.0


Project Summary: To build a public open dataset of chest X-ray and CT images of patients who are positive for or suspected of having COVID-19 or other viral and bacterial pneumonias (MERS, SARS, and ARDS). Data will be collected from public sources as well as through indirect collection from hospitals and physicians. All images and data will be released publicly in this GitHub repo.

Data Annotation

Lung Bounding Boxes (license: CC BY-SA) contributed by Andrew Gough at General Blockchain, Inc.

Pneumonia severity scores for 94 images (license: CC BY-SA) from the paper Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning

Generated Lung Segmentations (license: CC BY-SA) from the paper Lung Segmentation from Chest X-rays using Variational Data Imputation

Brixia score for 192 images (license: CC BY-NC-SA) from the paper End-to-end learning for semiquantitative rating of COVID-19 severity on Chest X-rays

Lung and other segmentations for 517 images (license: CC BY) in COCO and raster formats by v7labs
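
For readers working with the COCO-format annotations, the following is a minimal sketch of grouping annotations by image using only the Python standard library. It assumes the standard COCO layout (top-level "images", "annotations", and "categories" lists); the helper name group_coco_annotations and the inline sample values are hypothetical and are not taken from the dataset, so the actual files should be checked for their exact fields.

```python
import json
from collections import defaultdict


def group_coco_annotations(coco):
    """Map each image file name to a list of (category name, bbox) pairs.

    Assumes the standard COCO layout; `bbox` is [x, y, width, height]
    and is returned as None when an annotation carries no box.
    """
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    categories = {cat["id"]: cat["name"] for cat in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        name = file_names[ann["image_id"]]
        grouped[name].append((categories[ann["category_id"]], ann.get("bbox")))
    return dict(grouped)


# Tiny inline sample in COCO layout (made-up values, not dataset content).
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "example.png", "width": 512, "height": 512}],
  "categories": [{"id": 7, "name": "lung"}],
  "annotations": [
    {"id": 100, "image_id": 1, "category_id": 7, "bbox": [10, 20, 100, 200]}
  ]
}
""")

result = group_coco_annotations(sample)
print(result)
```

To apply the same helper to a real annotation file, replace the inline sample with json.load on the opened file.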


Please use the following citation when referencing the dataset:

  title={COVID-19 Image Data Collection: Prospective Predictions Are the Future},
  author={Joseph Paul Cohen and Paul Morrison and Lan Dao and Karsten Roth and Tim Q Duong and Marzyeh Ghassemi},
  journal={arXiv 2006.11988},



Data Summary
Provided by
Joseph Paul Cohen.
Joseph is the director of the Machine Learning and Medicine Lab, a group dedicated to improving healthcare with machine learning. Joseph has created a platform that lets researchers publish and read summaries of research papers like an online journal club, as well as Academic Torrents, a system designed to move large datasets and become the library of the future. He is also the creator of BlindTool, a mobile application that gives a sense of vision to the blind by using an artificial neural network that speaks the names of objects as they are identified, and of Blucat, a cross-platform Bluetooth debugging tool.