Jargon Explained

  1. HMR: Human Mesh Recovery, i.e., reconstructing a 3D mesh of the human body
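  2. HMR 2.0 is the fully transformer-based version of HMR proposed in this paper. The network itself still reconstructs the mesh from a single image; the time-sequence (video) knowledge comes from the tracking system built on top of it, which links the per-frame reconstructions over time
  3. HMR 1.0, the original HMR, also reconstructs the mesh from only one picture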
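An HMR-style model regresses the parameters of a body model from one image rather than predicting vertices directly. A minimal sketch of that interface, assuming the standard SMPL parameterization (24 joint rotations in axis-angle form, 10 shape coefficients, a 3-value weak-perspective camera); `PersonMeshEstimate` and `recover_from_single_image` are hypothetical names, the stub runs no real network, and HMR 2.0's exact output format (e.g., its rotation representation) may differ.

```python
from dataclasses import dataclass
from typing import List

# Standard SMPL sizes (assumed here for illustration).
NUM_JOINTS = 24  # 23 body joints + 1 global (root) orientation
NUM_BETAS = 10   # shape coefficients


@dataclass
class PersonMeshEstimate:
    """Hypothetical container for one person's recovered body-model parameters."""
    pose: List[List[float]]  # NUM_JOINTS axis-angle rotations, 3 values each
    betas: List[float]       # NUM_BETAS shape coefficients
    camera: List[float]      # weak-perspective camera (s, tx, ty)

    def num_params(self) -> int:
        """Total number of regressed scalars."""
        return sum(len(r) for r in self.pose) + len(self.betas) + len(self.camera)


def recover_from_single_image(image) -> PersonMeshEstimate:
    """Placeholder for an HMR-style regressor: one image in, one set of
    SMPL parameters out. A real model would run a network on `image`;
    this stub ignores it and returns neutral parameters."""
    return PersonMeshEstimate(
        pose=[[0.0, 0.0, 0.0] for _ in range(NUM_JOINTS)],
        betas=[0.0] * NUM_BETAS,
        camera=[1.0, 0.0, 0.0],
    )
```

Calling `recover_from_single_image(None).num_params()` gives 85 (72 pose + 10 shape + 3 camera), the size of the parameter vector in the original HMR. For a video, the same call would be applied frame by frame, and a tracker then links the per-frame estimates.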

Background Knowledge Explained

  1. This paper builds on top of ViTPose, a simple baseline for human pose estimation using a plain Vision Transformer (ViT). One can think of it as a ViT-based PoseNet. Here is the link to that project:

https://github.com/ViTAE-Transformer/ViTPose

  2. This paper also builds on top of PHALP (Tracking People by Predicting 3D Appearance, Location and Pose, CVPR 2022):

https://openaccess.thecvf.com/content/CVPR2022/papers/Rajasegaran_Tracking_People_by_Predicting_3D_Appearance_Location_and_Pose_CVPR_2022_paper.pdf

and here is the project page:

http://people.eecs.berkeley.edu/~jathushan/PHALP/
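For the ViTPose item: a plain ViT first cuts the person crop into fixed-size, non-overlapping patches and treats each one as a token. A minimal stdlib sketch of that tokenization step, assuming a single-channel image stored as nested lists; the 256 x 192 input and 16-pixel patch are the commonly cited ViTPose settings and are treated here as assumptions. The linear patch embedding, position embeddings, transformer blocks, and heatmap decoder are omitted.

```python
from typing import List, Tuple


def patch_grid(height: int, width: int, patch: int) -> Tuple[int, int]:
    """Number of non-overlapping patches along each axis."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch, width // patch


def patchify(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split a single-channel H x W image into flattened patch tokens in
    row-major order (left to right, top to bottom)."""
    rows, cols = patch_grid(len(image), len(image[0]), patch)
    tokens = []
    for r in range(rows):
        for c in range(cols):
            # Flatten one patch x patch block into a single token vector.
            token = [
                image[r * patch + i][c * patch + j]
                for i in range(patch)
                for j in range(patch)
            ]
            tokens.append(token)
    return tokens
```

With these assumed settings, `patch_grid(256, 192, 16)` returns `(16, 12)`, so the transformer sees 192 tokens per crop. On a toy 4 x 4 image with values 0..15 in row-major order and `patch=2`, the first token is `[0, 1, 4, 5]`.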

Problem Definition

In this paper, the problem is recovering the 3D meshes of human bodies from single images and tracking them over time in video.
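The tracking half of the problem is an association step: each existing track must be matched to one of the people detected in the new frame. PHALP does this by predicting each track's appearance, location and pose and comparing those predictions with the new detections. The sketch below is a simplified stand-in, not PHALP's actual formulation: it assumes the predictions are already available as plain feature vectors, scores each track/detection pair with a hypothetical weighted sum of Euclidean distances, and finds the lowest-cost one-to-one assignment by brute force (the same objective the Hungarian algorithm solves, but only practical for a few people). The weights, the `max_cost` gate, and the feature layout are illustrative assumptions.

```python
from itertools import permutations
from math import dist
from typing import Dict, Sequence

Features = Dict[str, Sequence[float]]

# Hypothetical cue weights; not taken from PHALP.
W_APPEARANCE, W_LOCATION, W_POSE = 1.0, 1.0, 1.0


def cost(track: Features, det: Features) -> float:
    """Weighted sum of Euclidean distances between a track's predicted
    appearance/location/pose and a detection's observed features."""
    return (W_APPEARANCE * dist(track["appearance"], det["appearance"])
            + W_LOCATION * dist(track["location"], det["location"])
            + W_POSE * dist(track["pose"], det["pose"]))


def associate(tracks, detections, max_cost: float) -> Dict[int, int]:
    """Return {track_index: detection_index} with minimum total cost.
    Pairs whose individual cost exceeds max_cost are dropped
    (treated as a lost track or a new person)."""
    n_t, n_d = len(tracks), len(detections)
    if n_t <= n_d:
        # Give every track a distinct detection.
        candidates = (list(zip(range(n_t), p)) for p in permutations(range(n_d), n_t))
    else:
        # Give every detection a distinct track.
        candidates = (list(zip(p, range(n_d))) for p in permutations(range(n_t), n_d))
    best_pairs, best_total = [], float("inf")
    for pairs in candidates:
        total = sum(cost(tracks[t], detections[d]) for t, d in pairs)
        if total < best_total:
            best_pairs, best_total = pairs, total
    return {t: d for t, d in best_pairs if cost(tracks[t], detections[d]) <= max_cost}
```

For example, if two tracked people swap their order in the detector's output between frames, `associate` still pairs each track with the detection whose features are closest, rather than relying on list order.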