The dataset consists of two parts:
- a base data set, containing a total of 4000 pedestrian and 5000 non-pedestrian samples cut out from video images and scaled to a common size of 18x36 pixels.
- additional non-pedestrian images: a collection of 1200 video images NOT containing any pedestrians, intended for the extraction of additional negative training examples.
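Since the pedestrian-free images guarantee that any window cut from them is a valid negative example, extraction can be as simple as random cropping. The following is a minimal sketch of that idea; the function name and the use of NumPy are my own choices, not part of the dataset's tooling.

```python
import numpy as np

# Sample size used by the base data set (width x height in pixels).
PATCH_W, PATCH_H = 18, 36

def sample_negative_patches(image, n_patches, rng=None):
    """Cut random 18x36 windows out of a pedestrian-free image.

    `image` is a 2-D grayscale array. Because the source frame contains
    no pedestrians, every window is a valid negative training example.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    patches = []
    for _ in range(n_patches):
        x = int(rng.integers(0, w - PATCH_W + 1))
        y = int(rng.integers(0, h - PATCH_H + 1))
        patches.append(image[y:y + PATCH_H, x:x + PATCH_W])
    return patches
```

In practice one would typically resample crops at multiple scales and then rescale them to 18x36, so that negatives cover the same range of apparent sizes as the positives.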
Pedestrian images were obtained by manually labeling and extracting the rectangular positions of pedestrians in video images. Video images were recorded at various (day) times and locations with no particular constraints on pedestrian pose or clothing, except that pedestrians are standing in an upright position and are fully visible. As non-pedestrian images, we used patterns representative of typical preprocessing steps within a pedestrian classification application, extracted from video images known not to contain any pedestrians. We chose to use a shape-based pedestrian detector that matches a given set of pedestrian shape templates to distance-transformed edge images, with a comparatively relaxed matching threshold.
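The matching step described above is commonly known as chamfer matching: a template scores well where its edge points fall on low values of the edge image's distance transform. A minimal sketch, assuming SciPy for the distance transform; the function names and the threshold value are illustrative assumptions, not the detector's actual parameters.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(edge_image, template_points):
    """Mean distance-transform value under a shape template's edge points.

    edge_image: boolean array, True at edge pixels.
    template_points: (N, 2) array of (row, col) coordinates of a pedestrian
    shape template, already placed at a candidate image location.
    Lower scores mean a better shape match.
    """
    # Each pixel of the distance transform holds the distance to the
    # nearest edge pixel (edges are the zero/background set of ~edge_image).
    dist = distance_transform_edt(~edge_image)
    rows, cols = template_points[:, 0], template_points[:, 1]
    return dist[rows, cols].mean()

def matches(edge_image, template_points, threshold=2.0):
    # A relaxed (larger) threshold accepts more candidate regions, as the
    # text notes, so that few true pedestrians are filtered out early.
    return chamfer_score(edge_image, template_points) <= threshold
```

A region passing this test for any template in the set would then be forwarded to the classifier, which is why the threshold is kept relaxed at this stage.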