OTW
2D Box Classification
Action/Event Detection
License: CC BY 4.0

Overview

The Out the Window (OTW) dataset is a crowdsourced activity dataset containing 5,668 instances
of 17 activities from the NIST Activities in Extended Video (ActEV) challenge. The videos
are crowdsourced from workers on Amazon Mechanical Turk using a novel scenario-acting strategy,
which collects multiple instances of natural activities per scenario. Turkers are instructed
to lean their mobile device against an upper-story window overlooking an outdoor space, walk
outside to perform a scenario involving people, vehicles, and objects, and finally upload the
video for annotation. Performance evaluation for activity classification on VIRAT Ground
2.0 shows that the OTW dataset provides an 8.3% improvement in mean classification accuracy,
and a 12.5% improvement on the most challenging activities involving people with vehicles.

Data Annotation

Annotation files are in CSV format with the following schema:

[Video ID, Activity ID, Actor ID, Activity or Object Type, Frame Number, XMin, YMin, XMax,
YMax, Labeled]

  • Video ID: A globally unique ID assigned to each video within each dataset. Each
    Homes video is located at homes/video/VIDEO_ID.mp4; each Lots video is located at
    lots/video/VIDEO_ID.mp4.
  • Activity ID: A unique ID assigned to each activity within a specific dataset (lots or homes).
  • Actor ID: A unique ID assigned to each actor. If the actor is uncertain, this field is None.
  • Activity or Object Type: A label for the activity or object in the annotation.
  • Frame Number: The frame number of the annotation. Frame numbers correspond to the
    output of extract_frames.py.
  • XMin, YMin, XMax, YMax: The bounding box of the annotation, defined by the upper-left
    corner (XMin, YMin) and the bottom-right corner (XMax, YMax) in image coordinates,
    where X is the column index and Y is the row index.
  • Labeled: A boolean indicating whether a frame was human-labeled (True) or
    interpolated (False). We used a combination of tracking and linear interpolation to
    generate bounding boxes between the start and end frames annotated by a human; see
    the sketch after this list.
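
For illustration, here is a minimal sketch of the linear-interpolation half of that process
(the tracking component is not shown; the helper name and the rounding to integer pixels are
our assumptions). Boxes are (XMin, YMin, XMax, YMax) tuples:

# Hypothetical helper: linearly interpolate a bounding box between two
# human-labeled keyframes. The dataset pipeline also uses tracking.
def interpolate_box(box_a, frame_a, box_b, frame_b, frame):
    t = (frame - frame_a) / (frame_b - frame_a)  # fractional position between keyframes
    return tuple(round(a + t * (b - a)) for a, b in zip(box_a, box_b))

# Interpolate a box at frame 254 between keyframes at frames 252 and 255.
box = interpolate_box((82, 1165, 255, 1586), 252, (102, 1226, 255, 1449), 255, 254)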

Example annotations for a single activity from ./homes/annotations.csv:

00000,0,00038,dismounting bike,252,82,1165,255,1586,True
00000,0,00038,person,252,85,1165,211,1446,True
00000,0,00038,bicycle,253,103,1230,250,1458,False
00000,0,00038,dismounting bike,253,85,1165,250,1458,False
00000,0,00038,person,253,85,1165,211,1446,False
00000,0,00038,bicycle,254,103,1229,254,1455,False
00000,0,00038,dismounting bike,254,85,1165,254,1455,False
00000,0,00038,person,254,85,1165,211,1446,False
00000,0,00038,bicycle,255,102,1226,255,1449,False
00000,0,00038,dismounting bike,255,85,1165,255,1449,False
00000,0,00038,person,255,85,1165,211,1446,False
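
As a sketch, these rows can be parsed with Python's csv module. The field names below are
ours, following the schema above; only the path ./homes/annotations.csv comes from the example:

import csv

FIELDS = ["video_id", "activity_id", "actor_id", "type",
          "frame", "xmin", "ymin", "xmax", "ymax", "labeled"]

with open("./homes/annotations.csv") as f:
    for row in csv.reader(f):
        ann = dict(zip(FIELDS, row))
        box = (int(ann["xmin"]), int(ann["ymin"]), int(ann["xmax"]), int(ann["ymax"]))
        human_labeled = ann["labeled"] == "True"  # Labeled column holds "True"/"False"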

Frames are 0-indexed. Given an activity directory (homes or lots) and a frame number,
the Python snippet for a frame's image file path is:

import os

img_file_name = os.path.join(".", "homes", "%08d.jpg" % framenum)  # use "lots" for the Lots dataset

Instructions

OTW to DIVA annotation

d_otw_to_diva.json is a JSON dictionary that maps each OTW label string to its equivalent
DIVA label string. For example:

OTW : DIVA
"carrying (large)" : "transport_heavycarry"
"pushing cart" : "pull"

Frame extraction

extract_frames.py is a Python 3 script for extracting frames from either the homes or lots dataset.

Usage:

pip3 install imageio imageio-ffmpeg
python3 extract_frames.py homes 10 5
python3 extract_frames.py lots

This export will take a while and extracts frames to ./homes/frames (correspondingly for lots).
The first invocation above parallelizes the extraction over 10 workers (this count can be
increased or decreased) and uses PNG compression level 5 (1 = worst, 9 = best).
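
For reference, a minimal sketch of the core extraction loop. This is not the bundled
extract_frames.py (which adds worker parallelism and the compression option); the output
naming follows the frame-path snippet above, and the example video path is illustrative:

import os
import imageio

def extract_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    reader = imageio.get_reader(video_path)    # ffmpeg backend via imageio-ffmpeg
    for framenum, frame in enumerate(reader):  # frames are 0-indexed
        imageio.imwrite(os.path.join(out_dir, "%08d.jpg" % framenum), frame)

extract_frames(os.path.join("homes", "video", "00000.mp4"),
               os.path.join("homes", "frames"))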

Citation

Please use the following citation when referencing the dataset:

@article{Castan2019OutTW,
  title={Out the Window: A Crowd-Sourced Dataset for Activity Classification in Surveillance Video},
  author={Greg Casta{\~n}{\'o}n and Nathan Shnidman and T. Anderson and J. Byrne},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.10899}
}

License

CC BY 4.0

Data Summary

Type: Video
Amount: 898
Size: 44.81 GB

Provided by

SYSTEMS & TECHNOLOGY RESEARCH
STR specializes in advanced research and development for defense, intelligence, and homeland security applications, developing and delivering innovative sensors and information processing capabilities.