NEAR
Pose | AR/MR
License: Unknown

Overview

The existing datasets for evaluating Visual-Inertial Odometry (VIO) have boosted research on
autonomous agents, but they do not meet the needs of the rapidly growing research on Augmented
Reality (AR) and Mixed Reality (MR), since they are not collected in real AR scenes and do not
account for the factors affecting mobile devices. This paper presents the NEAR dataset, an
AR-oriented visual-inertial dataset collected with commodity handheld phones together with
ground truth. The dataset has a total of 113 sequences in 49 elaborately designed collection
cases in two typical indoor scenes, i.e. a living area and a table area. It also covers many
setting adjustments for comparison, including different texture levels, illuminations, motion
patterns and camera settings, as well as the difference between rolling and global shutters.

Data Annotation

To enable evaluation of VIO on the NEAR dataset, we also provide the calibration parameters
along with the dataset. Here, we briefly describe the calibration procedure for the intrinsics
and extrinsics.

Intrinsic Calibration

For the camera intrinsics, the calibration sequences are recorded with the phone camera viewing
a 12×8 chessboard with a grid size of 50mm×50mm. Afterward, a sufficient number of high-quality
images are chosen empirically and fed to the MATLAB calibration toolbox with the 4-parameter
radial-tangential distortion model. The intrinsics of all phone cameras and of the MYNT camera
are each calibrated once with their respective calibration sequences, since the autofocus mode
is turned off during the whole data collection except for the comparison cases. As for the IMU
intrinsics, we calibrate the noise, bias and random-walk variances of both the gyroscope and
the accelerometer of each phone via the Allan Variance Tool.
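As an illustration of the intrinsic step, the sketch below runs an equivalent chessboard calibration with OpenCV instead of the MATLAB toolbox, fixing k3 to obtain the 4-parameter radial-tangential model. The image paths and the inner-corner count are assumptions for illustration, not part of the released toolchain.

```python
# Sketch of camera intrinsic calibration with a 12x8 chessboard (50 mm grid),
# mirroring the MATLAB-toolbox step described above but using OpenCV instead.
# The image glob and inner-corner count are illustrative assumptions.
import glob
import cv2
import numpy as np

PATTERN = (11, 7)   # inner corners of a 12x8-square chessboard
SQUARE = 0.05       # 50 mm grid size, in metres

# 3D coordinates of the chessboard corners in the board frame (z = 0 plane)
obj = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib_seq/*.png"):   # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_pts.append(obj)
    img_pts.append(corners)
    size = gray.shape[::-1]

# 4-parameter radial-tangential model (k1, k2, p1, p2): keep k3 fixed at zero
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_pts, img_pts, size, None, None, flags=cv2.CALIB_FIX_K3)
print("reprojection RMS:", rms)
print("intrinsics K:\n", K)
print("distortion [k1 k2 p1 p2 k3]:", dist.ravel())
```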

Extrinsic Calibration

There are 3 coordinate frames that move during data collection but stay fixed relative to each
other: the phone camera frame C, the phone IMU frame B and the rig frame R. The 2 related fixed
frames are the ChArUco board frame W and the global frame of the motion capture system G. The
whole frame system is shown in figure 3. First, the IMU-camera extrinsic T_BC and the time
offset are calibrated with Kalibr. The camera-rig transformation is also needed, since the
motion capture system provides T_GR while VIO can only provide T_WC. Following the
transformation identity

$$ T_{GR}\, T_{RC} = T_{GW}\, T_{WC} $$

we design
a calibration sequence to obtain this relationship. First, a 6×8 ChArUco board (square length
48mm, marker length 36mm) is set up in the test field. Then we move the rig slowly with the
camera facing the board while recording images and the ground-truth poses T_GR,j simultaneously.
Afterward, the board corners in every camera image are detected and the camera poses T_WC,i are
computed with a PnP solver. Finally, we formulate the transformation estimation as the
least-squares problem

$$ \{T_{GW}^{*},\ T_{RC}^{*}\} = \arg\min_{T_{GW},\,T_{RC}} \sum_i \big\| T_{GR,\,j(i)}\, T_{RC} \;-\; T_{GW}\, T_{WC,\,i} \big\|^2 $$

where j(i) is the index of the ground-truth pose aligned to image i by timestamp. We solve this
optimization problem with an alternating iteration method. Each iteration has two steps. First,
we solve for the best T_GW with T_RC fixed to the value from the last iteration (or to the
identity SE(3) prior) using the Umeyama method [16]. Second, we fix the 'best' T_GW and solve
for T_RC with a similar approach to the first step. The iteration terminates when the error
converges.
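The sketch below illustrates one possible reading of this alternating scheme, not the released calibration code: poses are 4×4 SE(3) matrices, the Umeyama/Kabsch step aligns the camera positions expressed through the rig chain and through the board chain, and the second step re-estimates T_RC from per-sample closed-form estimates via a chordal mean (an illustrative choice for the "similar approach"). The input lists are assumed to be already time-aligned, i.e. j(i) has been resolved.

```python
import numpy as np

def kabsch(src, dst):
    """Rigid R, t minimising sum ||R @ src_i + t - dst_i||^2 (Umeyama with unit scale)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cd - R @ cs

def project_to_so3(M):
    """Closest rotation matrix to M in the Frobenius sense."""
    U, _, Vt = np.linalg.svd(M)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1
    return U @ Vt

def se3(R, t):
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def alternate_calibration(T_GR, T_WC, iters=100, tol=1e-10):
    """Alternately solve T_GW and T_RC from time-aligned lists of 4x4 pose pairs."""
    T_RC = np.eye(4)                                  # prior identity SE(3)
    prev = np.inf
    for _ in range(iters):
        # Step 1: fix T_RC, fit T_GW so that T_GW @ T_WC[i] ~ T_GR[i] @ T_RC,
        # by Umeyama/Kabsch alignment of the camera positions in both frames.
        dst = np.array([(g @ T_RC)[:3, 3] for g in T_GR])
        src = np.array([w[:3, 3] for w in T_WC])
        T_GW = se3(*kabsch(src, dst))

        # Step 2: fix T_GW, re-estimate T_RC from per-sample closed-form
        # estimates T_GR[i]^-1 @ T_GW @ T_WC[i] via a chordal mean.
        est = [np.linalg.inv(g) @ T_GW @ w for g, w in zip(T_GR, T_WC)]
        R_RC = project_to_so3(np.mean([e[:3, :3] for e in est], axis=0))
        t_RC = np.mean([e[:3, 3] for e in est], axis=0)
        T_RC = se3(R_RC, t_RC)

        # Convergence check on the mean position residual of the loop constraint.
        err = np.mean([np.linalg.norm((g @ T_RC - T_GW @ w)[:3, 3])
                       for g, w in zip(T_GR, T_WC)])
        if abs(prev - err) < tol:
            break
        prev = err
    return T_GW, T_RC
```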

Data Format

The average precision (mAP) and the F-score are used for evaluation. A detection is correct only
if the intersection over union (IoU) between the detected bounding box and any ground-truth box
of the same class is larger than 0.33. The F-score is calculated as F-score = 2PR/(P+R), where
P and R are the precision and recall rates. Note that the F-score is threshold-sensitive, which
means you can adjust your score threshold to obtain a better result. Although the F-score is not
as fair as the mAP criterion, it is more practical, since a threshold must always be chosen when
deploying a model and not all algorithms output a confidence score for each target. Thus, both
the F-score and mAP are considered in the benchmarks.
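For concreteness, the sketch below shows how such an IoU test with the 0.33 threshold and the F-score could be computed for one image. The box format and the greedy, confidence-ordered matching are illustrative assumptions, not the exact benchmark script.

```python
# Minimal sketch: IoU matching at a 0.33 threshold, then precision / recall / F-score.
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def f_score(detections, ground_truth, iou_thr=0.33):
    """detections: list of (box, cls, confidence); ground_truth: list of (box, cls)."""
    matched = [False] * len(ground_truth)
    tp = 0
    # Greedily match high-confidence detections to unmatched ground truth of the same class.
    for box, cls, _ in sorted(detections, key=lambda d: -d[2]):
        for k, (gt_box, gt_cls) in enumerate(ground_truth):
            if not matched[k] and cls == gt_cls and iou(box, gt_box) > iou_thr:
                matched[k] = True
                tp += 1
                break
    p = tp / max(len(detections), 1)        # precision
    r = tp / max(len(ground_truth), 1)      # recall
    return p, r, 2 * p * r / max(p + r, 1e-12)
```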

The evaluation scripts for mAP and F-score are borrowed from the ICDAR 2015 evaluation scripts
with small modifications (you may need to register an account first). We provide the modified
evaluation scripts and the ground-truth gt.zip file of the test set in the evaluation/ directory.
You can evaluate your own method by following these instructions (a minimal packaging sketch is
given after the list):

  • Run your algorithm and save the detection results for each image in a file named
    image_name.txt, where image_name is exactly the same as in gt.zip. Follow the format of
    evaluation/gt.zip, except that each defect output by your algorithm should be described as
    x1,y1,x2,y2,confidence,type, where (x1,y1) and (x2,y2) are the top-left and bottom-right
    corners of the defect's bounding box, confidence is a float expressing how confident you
    are in the detection, and type is a string that must be one of
    open,short,mousebite,spur,copper,pin-hole. Note that the fields are separated by commas
    with no spaces.
  • Zip your .txt files into res.zip (res.zip must not contain any sub-directories).
  • Run the evaluation script: python script.py -s=res.zip -g=gt.zip
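The following minimal sketch shows one way to write the per-image .txt files and package them into a flat res.zip. The detections dictionary and the image name are hypothetical; only the comma-separated line format and the no-sub-directory layout come from the instructions above.

```python
# Write per-image result files and zip them at the archive root for the evaluation script.
import zipfile

# hypothetical detections: image name -> list of (x1, y1, x2, y2, confidence, type)
results = {
    "example_image": [(10, 20, 55, 80, 0.93, "open"), (120, 40, 160, 75, 0.71, "spur")],
}

with zipfile.ZipFile("res.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for image_name, dets in results.items():
        lines = ["{},{},{},{},{},{}".format(*d) for d in dets]
        # one image_name.txt per image, placed at the root of res.zip (no sub-directories)
        zf.writestr(image_name + ".txt", "\n".join(lines) + "\n")
```

After building res.zip this way, the evaluation runs as shown above: python script.py -s=res.zip -g=gt.zip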
Data Summary

Type: IMU, Image
Amount: 553.7K
Size: 174.15GB
Provided by: Shanghai Beidou Research Institute