ADE20K
2D Polygon
Common
License: Unknown

Overview

The ADE20K dataset covers a wide range of scenes and object categories with dense, detailed
annotations for scene parsing: scenes, objects, parts of objects, and in some cases even parts
of parts. A scene parsing benchmark with 150 object and stuff classes is built on ADE20K, and
several segmentation baseline models are evaluated on it. A network design called Cascade
Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a
cascade, improving over the baselines. The authors further show that the trained scene parsing
networks enable applications such as image content removal and scene synthesis.

Data Collection

Each folder contains images separated by scene category (the same scene categories as the Places
Database). For each image, the object and part segmentations are stored in separate PNG
files. All object and part instances are annotated separately.

For each image there are the following files:

*.jpg: RGB image.

*_seg.png: object segmentation mask. This image encodes the object class segmentation masks and
also separates each class into instances. The R and G channels encode the object class masks.
The B channel encodes the object instance masks. The function loadAde20K.m extracts both masks.

*_seg_parts_N.png: parts segmentation mask, where N is a number (1,2,3,...) indicating the level
in the part hierarchy. Parts are organized in a tree: objects are composed of parts, parts can
themselves be composed of parts, and so on. The level N indicates the depth in the part tree.
Level N=1 corresponds to parts of objects. All part segmentations use the same encoding as the
object segmentation masks: classes are coded in the R and G channels and instances in the B
channel. Use the function loadAde20K.m to extract the part segmentation masks and to separate
instances of the same class.

*_.txt: text file describing the content of each image (objects and parts). This information is
redundant with the other files, but it also contains information about object attributes. The
function loadAde20K.m also parses the content of this file. Each line in the text file contains:
column 1=instance number, column 2=part level (0 for objects), column 3=occluded (1 for true),
column 4=class name (parsed using WordNet), column 5=original raw name (might provide a more
detailed categorization), column 6=comma-separated attributes list.
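
As an illustration of the six-column layout described above, here is a minimal Python sketch that parses one line of the text file. The description does not state how columns are delimited, so the `#` separator, the optional double quotes around the attribute list, and the names `AnnotationRow` and `parse_annotation_line` are assumptions made for this example only. Check them against the actual files and against loadAde20K.m before relying on this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotationRow:
    """One line of the per-image text file (hypothetical container)."""
    instance: int          # column 1: instance number
    part_level: int        # column 2: 0 for objects, >0 for parts
    occluded: bool         # column 3: 1 means occluded
    class_name: str        # column 4: class name (parsed using WordNet)
    raw_name: str          # column 5: original raw name
    attributes: List[str]  # column 6: comma-separated attributes


def parse_annotation_line(line: str, sep: str = "#") -> AnnotationRow:
    """Split one line into its six columns.

    ASSUMPTION: columns are separated by `sep` and the attribute list may be
    wrapped in double quotes. Neither detail is given in the description.
    """
    fields = [field.strip() for field in line.split(sep)]
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    instance, level, occluded, class_name, raw_name, attrs = fields
    attrs = attrs.strip('"')
    return AnnotationRow(
        instance=int(instance),
        part_level=int(level),
        occluded=(occluded == "1"),
        class_name=class_name,
        raw_name=raw_name,
        attributes=[a.strip() for a in attrs.split(",") if a.strip()],
    )


# Made-up example line, not taken from the dataset.
row = parse_annotation_line('002 # 1 # 1 # door # double door # "open, wooden"')
print(row)
```

Filtering the parsed rows on `part_level == 0` would then keep only whole objects, mirroring the "0 for objects" convention in column 2.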

Data Annotation

The following example has two part levels. The first segmentation shows the object masks. The
second segmentation corresponds to object parts (body parts, mug parts, table parts, ...).
The third segmentation shows parts of the heads (eyes, mouth, nose, ...):

[Images: example image with its object, part, and head-part segmentations]
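
The channel encoding described under Data Collection (classes in R and G, instances in B) could be decoded roughly as follows. This is a sketch, not the reference loader: the formula `(R // 10) * 256 + G` is an assumption based on how the dataset's published loader is commonly described, and it should be verified against loadAde20K.m. The function names are hypothetical, and the input is a plain nested list of (R, G, B) tuples so that the example has no dependencies. With an array library the same arithmetic can be applied to whole channels at once.

```python
def decode_pixel_class(r: int, g: int) -> int:
    """Class index of one pixel from its R and G values.

    ASSUMPTION: class = (R // 10) * 256 + G. Verify against loadAde20K.m.
    """
    return (r // 10) * 256 + g


def decode_masks(rgb_rows):
    """Return (class_mask, instance_mask) for a mask image.

    rgb_rows is a list of rows, each row a list of (R, G, B) tuples.
    """
    class_mask = [
        [decode_pixel_class(r, g) for (r, g, _b) in row] for row in rgb_rows
    ]
    # Map each distinct B value to a consecutive instance id (0, 1, 2, ...),
    # assigned in increasing order of the B value.
    b_values = sorted({b for row in rgb_rows for (_r, _g, b) in row})
    b_to_id = {b: i for i, b in enumerate(b_values)}
    instance_mask = [[b_to_id[b] for (_r, _g, b) in row] for row in rgb_rows]
    return class_mask, instance_mask


# Tiny synthetic 2x2 "image" with made-up pixel values.
pixels = [
    [(0, 0, 0), (10, 5, 40)],
    [(10, 5, 80), (20, 0, 40)],
]
class_mask, instance_mask = decode_masks(pixels)
print(class_mask)
print(instance_mask)
```

Because the part masks (*_seg_parts_N.png) use the same encoding, the same function would apply to each part level.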

Matlab file: index_ade20k_2015.mat

filename: cell array of length N=22210 with the image file names.

folder: cell array of length N with the image folder names.

scene: cell array of length N providing the scene name (same
classes as the Places database) for each image.

objectnames: cell array of length C with the object class names.

wordnet_found: array of length C. It indicates whether the object name was found in WordNet.

wordnet_hypernym: cell array of length C. WordNet hypernyms for each object name.

wordnet_gloss: cell array of length C. WordNet definition.

objectcounts: array of length C with the number of instances for each object class.

objectPresence: array of size [C, N] with the object counts per image. objectPresence(c,i)=n if
image i contains n instances of object class c.

objectIsPart: array of size [C, N] counting how many times an object is a part in each image.
objectIsPart(c,i)=m if in image i object class c is a part of another object m times. For
objects, objectIsPart(c,i)=0, and for parts we will find: objectIsPart(c,i) ≈ objectPresence(c,i).

proportionClassIsPart: array of length C with the proportion of times that class c behaves as a
part. If proportionClassIsPart(c)=0, class c is a main object (e.g., car, chair, ...). See below
for a discussion on the utility of this variable.
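
To make the relationship between the last three fields concrete, here is a small sketch with made-up counts. It assumes that proportionClassIsPart(c) is the total of objectIsPart(c, :) divided by the total of objectPresence(c, :). That is one plausible reading of "proportion of times that class c behaves as a part", not a formula given in the description. The function name and the toy matrices are hypothetical.

```python
def proportion_class_is_part(object_presence, object_is_part):
    """Per-class share of instances that occur as a part of another object.

    Both arguments are C x N nested lists (classes x images).
    ASSUMPTION: proportion = sum_i objectIsPart(c, i) / sum_i objectPresence(c, i).
    """
    proportions = []
    for presence_row, part_row in zip(object_presence, object_is_part):
        total = sum(presence_row)
        # A class that never appears gets 0.0 instead of a division by zero.
        proportions.append(sum(part_row) / total if total else 0.0)
    return proportions


# Toy data: 3 classes, 3 images (made-up numbers).
object_presence = [
    [2, 1, 0],  # class 0: always a main object
    [4, 0, 4],  # class 1: always a part
    [1, 1, 2],  # class 2: sometimes a part
]
object_is_part = [
    [0, 0, 0],
    [4, 0, 4],
    [0, 1, 0],
]
print(proportion_class_is_part(object_presence, object_is_part))
```

Under this reading, a value of 0 marks a main object and a value near 1 marks a class that almost always appears as a part.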

Citation

@inproceedings{zhou2017scene,
  title={Scene parsing through ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela
and Torralba, Antonio},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={633--641},
  year={2017}
}

Data Summary

Type: Image
Amount: --
Size: 3.77GB
Provided by: MIT Computer Science & Artificial Intelligence Lab

Massachusetts Institute of Technology (MIT) is a private research university in Cambridge, Massachusetts. The institute is a land-grant, sea-grant, and space-grant university, with an urban campus that extends more than a mile (1.6 km) alongside the Charles River.