Autonomous Driving
License: MIT


We construct a large-scale stereo dataset named DrivingStereo. It contains over 180k images covering a diverse set of driving scenarios, which is hundreds of times larger than the KITTI stereo dataset. High-quality disparity labels are produced from multi-frame LiDAR points by a model-guided filtering strategy. Compared with models trained on other datasets, deep-learning models trained on DrivingStereo achieve higher generalization accuracy in real-world driving scenes. The details of the dataset are described in our paper.


Data Collection

The dataset contains 182188 stereo pairs in total: 174437 pairs in the training set and 7751 pairs in the testing set. For convenience, the images are compressed by sequence. Unlike the original resolution reported in the paper, all images and maps are downsampled to half resolution. The average image size is 881x400. In addition to the sequential training data, we also select 2000 frames covering 4 weather conditions (sunny, cloudy, foggy, rainy) for specific requests.
As in the KITTI stereo dataset, both the disparity maps and the depth maps are saved as uint16 PNG images. The disparity or depth value of each pixel is obtained by converting the uint16 value to float and dividing it by 256. Zero values indicate invalid pixels.
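The decoding rule above can be sketched in a few lines of Python. This is a minimal illustration rather than an official loader: the function name `decode_map` and the constant `SCALE_HALF_RES` are hypothetical, and the input is assumed to be a uint16 array that has already been read from one of the PNG files.

```python
import numpy as np

# Divisor stated above for the half-resolution disparity and depth maps.
SCALE_HALF_RES = 256.0


def decode_map(raw, scale=SCALE_HALF_RES):
    """Convert a uint16 disparity/depth array to float values and a validity mask.

    Hypothetical helper: `raw` is assumed to be the unmodified uint16 content
    of a PNG map.
    """
    raw = np.asarray(raw)
    if raw.dtype != np.uint16:
        raise ValueError(f"expected uint16 input, got {raw.dtype}")
    valid = raw > 0                          # zero marks an invalid pixel
    values = raw.astype(np.float32) / scale  # uint16 -> float, then divide
    return values, valid


# Synthetic 2x2 array standing in for a decoded PNG map.
raw = np.array([[0, 256], [640, 25600]], dtype=np.uint16)
values, valid = decode_map(raw)
print(values)  # [[0, 1], [2.5, 100]] as float32
print(valid)   # only the top-left pixel is invalid
```

When reading the files, the 16-bit depth has to be preserved; with OpenCV, for example, `cv2.imread(path, cv2.IMREAD_UNCHANGED)` returns the PNG as a uint16 array. Keeping the validity mask separate from the values avoids treating invalid pixels as a true disparity or depth of zero.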


  • Training data
    • Contains: Left Images, Right Images, Disparity Maps, Depth Maps
    • The training dataset contains 38 sequences and 174431 frames. Each sequence is compressed into an individual zip file. The attributes of each sequence are given in our supplementary materials.
  • Different weathers
    • Contains: Sunny, Cloudy, Foggy, Rainy
    • For specific requirements (image dehazing, image deraining, or image restoration), 2000 frames with different weathers (sunny, cloudy, foggy, and rainy) are selected from the sequences, with 500 frames per weather class. In addition to the half-resolution data, we also release full-resolution data for these frames. Unlike in the half-resolution disparity maps, the disparity value of each pixel in a full-resolution map is computed by converting the uint16 value to float and dividing it by 128.
  • Testing data
    • Contains: Left Images, Right Images, Disparity Maps, Depth Maps
    • The testing dataset contains 4 sequences and 7751 frames. In addition to the half-resolution data, full-resolution data is also provided for evaluation.
  • Calibration Parameters
    • Contains: Train, Test
    • The format of the calibration files is similar to KITTI. We provide the calibration parameters for both half-resolution and full-resolution images.
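Because the disparity divisor depends on the resolution (256 for half-resolution maps, 128 for full-resolution maps), it is easy to apply the wrong one. The sketch below shows one way to keep the two cases explicit. The names `DISPARITY_DIVISOR` and `disparity_from_uint16` are hypothetical and not part of any released toolkit, and only disparity is covered, since the text states the full-resolution divisor for disparity maps only.

```python
# Divisors for disparity maps as stated in the text; the dictionary keys are
# labels chosen for this example.
DISPARITY_DIVISOR = {"half": 256.0, "full": 128.0}


def disparity_from_uint16(value, resolution="half"):
    """Return the disparity in pixels for one stored uint16 value.

    Returns None for the value 0, which marks an invalid pixel.
    """
    if resolution not in DISPARITY_DIVISOR:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if not 0 <= value <= 65535:
        raise ValueError("value is outside the uint16 range")
    if value == 0:
        return None
    return value / DISPARITY_DIVISOR[resolution]


# The same stored value decodes differently at the two resolutions.
print(disparity_from_uint16(5120, "half"))  # 20.0
print(disparity_from_uint16(5120, "full"))  # 40.0
print(disparity_from_uint16(0, "full"))     # None (invalid pixel)
```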


Please use the following citation when referencing the dataset:

title={DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios},
author={Yang, Guorun and Song, Xiao and Huang, Chaoqin and Deng, Zhidong and Shi, Jianping and Zhou, Bolei},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},



Data Summary
Provided by
Guorun Yang
Guorun Yang is a Ph.D. Student of Tsinghua University