Overview
The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. In dealing with outdoor street level imagery, we note two characteristics. (1) Image text often comes from business signage and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses.
Data Collection
We used Amazon's Mechanical Turk to harvest and label the images from Google Street View. To build the data set, we created several Human Intelligence Tasks (HITs) to be completed on Mechanical Turk.
Workers were assigned a unique city and asked to acquire 20 images containing text from Google Street View. They were instructed to: (1) perform a Search Nearby:* on their city, (2) examine the businesses in the search results, and (3) look at the associated street view for images containing text from the business name. If words were found, they composed the scene to minimize skew, saved a screenshot, and recorded the business name and address.
Data Annotation
Workers were presented with an image and a list of candidate words to label with bounding boxes. This contrasts with the ICDAR Robust Reading data set in that we only label words associated with businesses. We used Alex Sorokin's Annotation Toolkit to support bounding box image annotation. For each image, we obtained a list of local business names by running a Search Nearby:* query in Google Maps at the image's address. We stored the top 20 business results for each image, typically resulting in 50 unique words. To summarize, the SVT data set consists of images collected from Google Street View, where each image is annotated with bounding boxes around words from businesses near where the image was taken.
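As an illustration of how a per-image word list could be derived from the stored business names, here is a minimal Python sketch. It is not the authors' code: the function name `build_lexicon`, the uppercase normalization, the tokenization rule, and the `min_len` filter are all assumptions made for this example.

```python
import re


def build_lexicon(business_names, max_businesses=20, min_len=2):
    """Collect unique words from the top business names for one image.

    Hypothetical helper: the tokenization (split on non-alphanumerics),
    uppercase normalization, and min_len filter are illustrative choices,
    not the procedure used to build SVT.
    """
    words = set()
    # Keep only the top results, mirroring the 20 businesses stored per image.
    for name in business_names[:max_businesses]:
        # Split on any run of characters that is not a letter or digit.
        for token in re.split(r"[^A-Za-z0-9]+", name):
            if len(token) >= min_len:
                words.add(token.upper())
    return sorted(words)


if __name__ == "__main__":
    # Made-up business names, for demonstration only.
    names = ["Joe's Pizza", "Pizza Hut", "Main Street Cafe"]
    print(build_lexicon(names))
```

A word spotting system can then restrict its search to the words returned for the image at hand, which is what makes the nearby-business list useful as a lexicon.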
Citation
@inproceedings{wang2011end,
title={End-to-end scene text recognition},
author={Wang, Kai and Babenko, Boris and Belongie, Serge},
booktitle={2011 International Conference on Computer Vision},
pages={1457--1464},
year={2011},
organization={IEEE}
}
@inproceedings{wang2010word,
title={Word spotting in the wild},
author={Wang, Kai and Belongie, Serge},
booktitle={European Conference on Computer Vision},
pages={591--604},
year={2010},
organization={Springer}
}