PANDA
2D Box | 2D Box Tracking
Person | Vehicle
License: Custom

Overview

PANDA is the first gigaPixel-level humAN-centric video dataset for large-scale, long-term,
and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and
cover real-world large-scale scenes with both a wide field of view (~1 km² area) and
high-resolution detail (~gigapixel-level per frame). The scenes may contain up to 4k head counts
with over 100× scale variation. PANDA provides enriched and hierarchical ground-truth annotations,
including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k
trajectories, 2.2k groups, and 2.9k interactions.

Data Annotation

File structure

For PANDA-Image, the training images and test images are stored in two separate compressed
packages. After decompression, the directory contains one folder per scene, named after the
scene; each folder holds the images belonging to that scene.

For PANDA-Video, each video sequence is stored in its own compressed package. The folder is
named after the scene and contains the frame images of the sequence.
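
For illustration, the decompressed layout looks roughly like the sketch below. The folder and
file names shown are hypothetical examples, not actual names from the release; only the
one-folder-per-scene structure comes from the description above.

PANDA_IMAGE/
        image_train/
                SCENE_01/            # one folder per scene, named after the scene
                        IMG_01.jpg
                        IMG_02.jpg
                        ...
                SCENE_02/
                        ...
        image_test/
                SCENE_03/
                        ...
PANDA_VIDEO/
        SCENE_01/                    # one package per video sequence
                frame_0001.jpg
                frame_0002.jpg
                ...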

Annotation Formats

PANDA-Image

The files human_bbox_train.json and vehicle_bbox_train.json contain the pedestrian and vehicle
annotations, respectively, for the training-set images. For the test set, human_bbox_test.json
and vehicle_bbox_test.json contain only the image_filepath, image id, and image size. Please
note that in results submitted for the test set, the image id must match the one in the
annotation file.

human_bbox_train.json

JSON{
        image_filepath : image_dict,
        ...
}
image_dict{
        "image id" : int,
        "image size" : image_size,
        "objects list" : [object_dict],
}
image_size{
        "height" : int,
        "width" : int,
}
If the object is a person:
object_dict{
        "category" : "person",
        "pose" : "standing" or "walking" or "sitting" or "riding" or "held" (a baby in the
arms) or "unsure",
        "riding type" : "bicycle rider" or "motorcycle rider" "tricycle
rider" or "null" (when "pose" is not "riding"),
        "age" : "adult" or "child" or "unsure",
        "rects" : rects,
}
rects{
        "head" : rect_dict,
        "visible body" : rect_dict,
        "full body" : rect_dict,
}
If the box is a crowd, a reflection, something person-like, etc., and needs to be ignored:
object_dict{
        "category" : "ignore" (someone who
is heavily occluded) or "fake person" or "crowd" (extremely dense crowd),
        "rect" : rect_dict,
}
rect_dict{
        "tl" : {
                "x" : float,
                "y" : float,
        },
        "br" : {
                "x" : float,
                "y" : float,
        }
}
  • image_filepath is the relative path of the image
  • "category" is
    the key that determines whether the target box is a pedestrian or a special area that needs
    to be ignored. A pedestrian can only be "person"
  • "riding type" is not "null" only if "category" is "riding"
  • "x" and "y" are floating point numbers
    between 0 and 1, representing the ratio of the coordinates to the width and height of the image,
    respectively
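
As a concrete illustration, the minimal Python sketch below loads human_bbox_train.json and
converts the normalized "full body" rectangles of person boxes into pixel coordinates. The
file path and variable names are assumptions for illustration; only the keys come from the
schema above.

import json

# Hypothetical path; adjust to wherever the annotation file was extracted.
with open("human_bbox_train.json") as f:
    annotations = json.load(f)

for image_filepath, image_dict in annotations.items():
    width = image_dict["image size"]["width"]
    height = image_dict["image size"]["height"]
    for obj in image_dict["objects list"]:
        if obj["category"] != "person":
            continue  # skip "ignore", "fake person", and "crowd" boxes
        rect = obj["rects"]["full body"]
        # "x"/"y" are ratios in [0, 1]; scale them back to pixels.
        x1 = rect["tl"]["x"] * width
        y1 = rect["tl"]["y"] * height
        x2 = rect["br"]["x"] * width
        y2 = rect["br"]["y"] * height
        print(image_filepath, obj["pose"], obj["age"], (x1, y1, x2, y2))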

vehicle_bbox_train.json

JSON{
        image_filepath : image_dict,
        ...
}
image_dict{
        "image id" : int,
        "image size" : image_size,
        "objects list" : [object_dict],
}
image_size{
        "height" : int,
        "width" : int,
}
object_dict{
        "category" : "small car" or "midsize car" or "large car" or "bicycle"
or "motorcycle" or "tricycle" or "electric car" or "baby carriage" or "vehicles" or "unsure",
        "rect" : rect_dict,
}
rect_dict{
        "tl" : {
                "x" : float,
                "y" : float,
        },
        "br" : {
                "x" : float,
                "y" : float,
        }
}
  • image_filepath is the relative path of the image
  • "vehicles"refers to a dense vehicle group and should be ignored
  • "small car", "midsize car" and "large car" belong to motor vehicles with four or more
    wheels and are distinguished by vehicle size. "electric car" refers to an electric sightseeing
    car or patrol car, etc.
  • "x" and "y" are floating point numbers between 0 and 1, representing
    the ratio of the coordinates to the width and height of the image, respectively
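
A minimal sketch of how these categories might be consumed, assuming the same extraction
layout as above (the path is hypothetical): it tallies vehicle boxes per category while
skipping the "vehicles" ignore regions.

import json
from collections import Counter

with open("vehicle_bbox_train.json") as f:  # hypothetical path
    annotations = json.load(f)

counts = Counter()
for image_dict in annotations.values():
    for obj in image_dict["objects list"]:
        if obj["category"] == "vehicles":
            continue  # dense vehicle groups are ignore regions
        counts[obj["category"]] += 1

print(counts.most_common())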

PANDA-Video

Each video sequence in PANDA-Video has two annotation files: tracks.json, which contains the
pedestrian trajectory annotations, and seqinfo.json, which contains the basic information of
the video sequence. The annotation files for each sequence are stored in a folder named after
the scene.

tracks.json
JSON[
        track_dict,
        ...
]
track_dict{
        "track id" : int,
        "frames" : [frame_dict],
}
frame_dict{
        "frame id" : int,
        "rect" : rect_dict,
        "face orientation" : "back" or "front" or "left" or "left back" or
"left front" or "right" or "right back" or "right front" or "unsure",
        "occlusion" : "normal" or "hide" or "serious hide" or "disappear",
}
rect_dict{
        "tl" : {
                "x" : float,
                "y" : float,
        },
        "br" : {
                "x" : float,
                "y" : float,
        }
}
  • Both "frame id" and "track id" count from 1
  • In "face orientation", "front" means facing the camera
  • In "occlusion",
    "normal" means the occlusion rate is less than 10%, "hide" means the occlusion rate is
    between 10% and 50%, "serious hide" means the occlusion rate is greater than 50%, and "disappear"
    means the object completely disappears
  • "x" and "y" are floating point numbers between
    0 and 1, representing the ratio of the coordinates to the width and height of the image, respectively
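
As a small worked example, the sketch below filters each trajectory down to its usable frames,
dropping "serious hide" and "disappear" entries (i.e., keeping frames that are at most 50%
occluded per the definitions above). The file path is a hypothetical location inside a scene's
annotation folder.

import json

with open("tracks.json") as f:  # hypothetical path inside a scene folder
    tracks = json.load(f)

for track in tracks:
    visible = [fr for fr in track["frames"]
               if fr["occlusion"] in ("normal", "hide")]  # <= 50% occluded
    print("track", track["track id"], ":",
          len(visible), "of", len(track["frames"]), "frames usable")
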
seqinfo.json
JSON{
        "name" : scene_name,
        "frameRate" : int,
        "seqLength" : int,
        "imWidth" : int,
        "imHeight" : int,
        "imExt" : file_extension,
        "imUrls" : [image_url]
}
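
Putting the two files together, the sketch below denormalizes the track rectangles with
imWidth/imHeight from seqinfo.json and prints one row per box in a MOT-style layout
(frame id, track id, x, y, w, h). The paths and the output convention are assumptions for
illustration, not part of the dataset specification.

import json

with open("seqinfo.json") as f:   # hypothetical paths inside a scene folder
    seqinfo = json.load(f)
with open("tracks.json") as f:
    tracks = json.load(f)

W, H = seqinfo["imWidth"], seqinfo["imHeight"]
for track in tracks:
    for fr in track["frames"]:
        tl, br = fr["rect"]["tl"], fr["rect"]["br"]
        x, y = tl["x"] * W, tl["y"] * H
        w, h = (br["x"] - tl["x"]) * W, (br["y"] - tl["y"]) * H
        # MOT-style row: frame id, track id, pixel box
        print(f'{fr["frame id"]},{track["track id"]},{x:.1f},{y:.1f},{w:.1f},{h:.1f}')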

Citation

Please use the following citation when referencing the dataset:

@inproceedings{wang2020panda,
  title={PANDA: A Gigapixel-level Human-centric Video Dataset},
  author={Wang, Xueyang and Zhang, Xiya and Zhu, Yinheng and Guo, Yuchen and Yuan, Xiaoyun
and Xiang, Liuyu and Wang, Zerun and Ding, Guiguang and Brady, David and Dai, Qionghai and
others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3268--3278},
  year={2020}
}

License

Custom

Data Summary

Type: Image
Amount: --
Size: 58.79GB

Provided by: Tsinghua University
Tsinghua University is a major research university in Beijing, and a member of the C9 League of Chinese universities. Since its establishment in 1911, it has graduated Chinese leaders in science, engineering, politics, business, academia, and culture.
