2D Keypoints
3D Model
Pose Estimation
License: Custom


The dataset contains images with ground-truth shape for bodies, hands, and faces together.

Data Collection

We begin with the SMPL+H dataset [52], obtaining one full-body RGB image per frame. We then align SMPL-X to the 4D scans following [68]. An expert annotator manually curated the dataset, selecting 100 frames that can be confidently considered pseudo ground-truth based on alignment quality and on interesting hand poses and facial expressions. The pseudo ground-truth meshes allow the use of a stricter vertex-to-vertex (v2v) error metric [48, 62], in contrast to the common paradigm of reporting 3D joint error, which does not capture surface errors or rotations along the bones.
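The difference between the two metrics can be illustrated with a minimal sketch in plain Python. It assumes the predicted and pseudo ground-truth meshes share the same topology, so that point i in one corresponds to point i in the other. The function name `mean_point_distance` and the toy coordinates are hypothetical, and any alignment step applied before evaluation is omitted. The toy case twists a surface vertex around a bone while leaving the joints in place, so the joint error is zero while the v2v error is not.

```python
import math

def mean_point_distance(pred_points, gt_points):
    """Mean Euclidean distance between corresponding 3D points.

    Applied to mesh vertices this gives a v2v error; applied to
    joint positions it gives a 3D joint error.
    """
    if len(pred_points) != len(gt_points):
        raise ValueError("point sets must have the same length")
    if not pred_points:
        raise ValueError("point sets must not be empty")
    total = 0.0
    for p, g in zip(pred_points, gt_points):
        total += math.dist(p, g)  # Euclidean distance (Python 3.8+)
    return total / len(pred_points)

# Hypothetical toy data: a single bone along the x-axis.
gt_joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# One surface vertex next to the bone; the prediction is twisted
# by 90 degrees around the bone axis.
gt_vertices = [(0.5, 0.1, 0.0)]
pred_vertices = [(0.5, 0.0, 0.1)]

joint_error = mean_point_distance(pred_joints, gt_joints)
v2v_error = mean_point_distance(pred_vertices, gt_vertices)
```

Here `joint_error` is zero because the joints coincide, while `v2v_error` is positive because the twist moves the surface vertex.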

[48]: Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, Oct. 2015.
[52]: MANO and SMPL+H website: dataset, models, and code.
[62]: Georgios Pavlakos, Luyang Zhu, Xiaowei Zhou, and Kostas Daniilidis. Learning to estimate 3D human pose and shape from a single color image. In CVPR, 2018.
[68]: Javier Romero, Dimitrios Tzionas, and Michael J. Black. Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics (TOG), 2017.


  title = {Expressive Body Capture: 3D Hands, Face, and Body from a Single Image},
  author = {Pavlakos, Georgios and Choutas, Vasileios and Ghorbani, Nima and Bolkart, Timo
and Osman, Ahmed A. A. and Tzionas, Dimitrios and Black, Michael J.},
  booktitle = {Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019}



Data Summary
Provided by
Perceiving Systems
We combine research on computer vision, computer graphics, and machine learning to teach computers to see and understand humans and their behavior.