Tracking People by Predicting 3D Appearance, Location and Pose

Pipeline


In the authors' words, when a person is first observed in a 2D frame, they are lifted into a 3D space.
The so-called 3D space is really the 3D dynamic model. After lifting, the 3D dynamic model is built by connecting the single-frame results over time. Once we have this time sequence of results, we can make predictions from it.
The 3D dynamic model can infer a person’s location and appearance at the next time step, say t+1.
The inferred result is then used to do the association among multiple people.
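
To make the predict-then-associate loop concrete, here is a minimal sketch in Python. It is not the paper's method: it only tracks 3D location, assumes a constant-velocity prediction for t+1, and uses a simple greedy nearest-neighbour matching. The names `Track`, `predict_next`, `associate`, `step` and the `max_dist` threshold are all made up for illustration.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Track:
    """One tracked person: a time-ordered list of lifted 3D locations."""
    track_id: int
    locations: list = field(default_factory=list)  # [(x, y, z), ...], non-empty

    def predict_next(self):
        """Predict the location at t+1 (assumption: constant velocity)."""
        if len(self.locations) < 2:
            return self.locations[-1]
        prev, last = self.locations[-2], self.locations[-1]
        return tuple(l + (l - p) for p, l in zip(prev, last))


def associate(tracks, detections, max_dist=1.0):
    """Greedily match predicted track locations to new 3D detections.

    Returns {track_id: detection_index}; anything farther than max_dist
    from every candidate stays unmatched.
    """
    pairs = sorted(
        (dist(t.predict_next(), d), t.track_id, j)
        for t in tracks
        for j, d in enumerate(detections)
    )
    matches, used = {}, set()
    for cost, tid, j in pairs:
        if cost > max_dist:
            break  # pairs are sorted, so every remaining pair is too far
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches


def step(tracks, detections, max_dist=1.0):
    """Process one frame: associate, then extend the matched tracks."""
    matches = associate(tracks, detections, max_dist)
    by_id = {t.track_id: t for t in tracks}
    for tid, j in matches.items():
        by_id[tid].locations.append(detections[j])
    return matches
```

For example, two people walking toward each other keep their identities because each detection is matched against where the track is predicted to be at t+1, not where it was last seen:

```python
tracks = [Track(0, [(0, 0, 5), (1, 0, 5)]), Track(1, [(4, 0, 6), (3, 0, 6)])]
matches = step(tracks, [(2.1, 0, 6), (1.9, 0, 5)])
```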

Lifting Procedure

In this paper, the authors use the HMAR model. It is applied to every detected bounding box of the input video, so they must also use some object detection strategy, such as YOLO (I guess). HMAR gives the initial, single-frame observations of each person's 3D pose and location.
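
The "apply it to every detected bounding box" step can be sketched as a simple loop. This only shows the data flow, not the real model: `lift_fn` is a stand-in for HMAR, `dummy_lift` is a placeholder that returns fake values, and the `Observation` layout is my own assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Observation:
    """Single-frame output for one detected person (hypothetical layout)."""
    frame_index: int
    box: tuple                 # (x1, y1, x2, y2) in pixels
    pose: Sequence[float]      # 3D pose parameters
    location: Sequence[float]  # 3D location (x, y, z)


def lift_video(boxes_per_frame, lift_fn):
    """Apply lift_fn to every detected box of every frame."""
    observations = []
    for t, boxes in enumerate(boxes_per_frame):
        for box in boxes:
            pose, location = lift_fn(t, box)
            observations.append(Observation(t, box, pose, location))
    return observations


def dummy_lift(frame_index, box):
    """Placeholder for the lifting network: NOT a real model.

    Returns a zero pose and places the person at the box centre
    with unit depth, just so the loop above has something to call.
    """
    x1, y1, x2, y2 = box
    return [0.0] * 3, ((x1 + x2) / 2, (y1 + y2) / 2, 1.0)
```

Each frame contributes one `Observation` per box, so a video with one person in the first frame and two in the second yields three independent single-frame observations. Nothing links them over time yet.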

Pipeline

Lifting Procedure

Build Dynamic Model (Used for predicting future Steps)