The Street View Text
2D Box
OCR/Text Detection
License: Unknown

Overview

The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this
data exhibits high variability and often has low resolution. Outdoor street-level imagery has
two notable characteristics: (1) image text often comes from business signage, and (2) business
names are easily available through geographic business searches. These factors make the SVT set
uniquely suited for word spotting in the wild: given a Street View image, the goal is to
identify words from nearby businesses.
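
As a rough illustration of this lexicon-driven setting (not the method used in the SVT papers),
the Python sketch below snaps a noisy recognized string to the closest entry in a small
per-image lexicon of nearby business words; the spot_word helper and its similarity measure are
illustrative placeholders.

from difflib import SequenceMatcher

def spot_word(recognized_text, lexicon):
    """Return the lexicon entry most similar to the raw recognized string."""
    recognized_text = recognized_text.upper()
    return max(lexicon, key=lambda w: SequenceMatcher(None, recognized_text, w.upper()).ratio())

# Hypothetical usage: a noisy OCR read is snapped to the per-image lexicon.
lexicon = ["STARBUCKS", "SUBWAY", "WALGREENS"]
print(spot_word("5TARBUCK5", lexicon))  # -> STARBUCKS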

Data Collection

We used Amazon's Mechanical Turk to harvest and label the images from Google Street View.
To build the data set, we created several Human Intelligence Tasks (HITs) to be completed on
Mechanical Turk.

Workers were each assigned a unique city and asked to acquire 20 images containing text from
Google Street View. They were instructed to: (1) perform a Search Nearby:* query on their city,
(2) examine the businesses in the search results, and (3) look at the associated Street View
imagery for images containing text from the business name. When words were found, workers
composed the scene to minimize skew, saved a screenshot, and recorded the business name and
address.

Data Annotation

Workers were presented with an image and a list of candidate words to label with bounding boxes.
This contrasts with the ICDAR Robust Reading data set in that only words associated with
businesses are labeled. We used Alex Sorokin's Annotation Toolkit to support bounding box image
annotation. For each image, we obtained a list of local business names using a Search Nearby:*
query in Google Maps at the image's address. We stored the top 20 business results for each
image, typically resulting in 50 unique words. To summarize, the SVT data set consists of images
collected from Google Street View, where each image is annotated with bounding boxes around
words from businesses near where the image was taken.
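
These annotations are typically distributed as XML files alongside the images. As a minimal
sketch (not part of this page), the Python snippet below assumes a layout with one image element
per picture, a comma-separated lex lexicon, and taggedRectangle word boxes; the tag and
attribute names are assumptions and may need adjusting to the actual files.

import xml.etree.ElementTree as ET

def load_annotations(xml_path):
    """Yield (image_name, lexicon, boxes) for each annotated image.

    Each box is a dict holding the labeled word and its x, y, width, height.
    Field names follow the assumed SVT-style XML layout described above.
    """
    root = ET.parse(xml_path).getroot()
    for image in root.iter("image"):
        name = image.findtext("imageName")
        lexicon = (image.findtext("lex") or "").split(",")
        boxes = []
        for rect in image.iter("taggedRectangle"):
            boxes.append({
                "word": rect.findtext("tag"),
                "x": int(float(rect.get("x"))),
                "y": int(float(rect.get("y"))),
                "width": int(float(rect.get("width"))),
                "height": int(float(rect.get("height"))),
            })
        yield name, lexicon, boxes

# Hypothetical usage, assuming a train.xml annotation file:
# for name, lexicon, boxes in load_annotations("train.xml"):
#     print(name, len(lexicon), "lexicon words,", len(boxes), "labeled boxes")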

Citation

@inproceedings{wang2011end,
  title={End-to-end scene text recognition},
  author={Wang, Kai and Babenko, Boris and Belongie, Serge},
  booktitle={2011 International Conference on Computer Vision},
  pages={1457--1464},
  year={2011},
  organization={IEEE}
}
@inproceedings{wang2010word,
  title={Word spotting in the wild},
  author={Wang, Kai and Belongie, Serge},
  booktitle={European Conference on Computer Vision},
  pages={591--604},
  year={2010},
  organization={Springer}
}

Data Summary

Type: Image
Amount: --
Size: 112.78 MB
Provided by: Department of Computer Science and Engineering, University of California, San Diego

The UCSD Department of Computer Science and Engineering (CSE) embodies the university's tradition of excellence as a world-class leader in computer science and engineering education and research.