Motion Reconstruction and Imitation from Monocular Videos

We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured “in-the-wild” video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize offline a dynamically feasible reference trajectory for the robot, including body and foot motion as well as a contact sequence, that closely tracks the key points; 3) track the reference trajectory online on robot hardware using a general-purpose model-predictive controller. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion-capture equipment, all of which limit scalability. SLoMo instead relies only on easy-to-obtain videos, readily available in online repositories such as YouTube, and converts them into motion primitives that real-world robots can execute reliably. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots, including a quadruped (on hardware) and a humanoid (in simulation).
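To make the split between stage 2 (offline reference optimization) and stage 3 (online tracking) concrete, here is a toy sketch under heavy simplifying assumptions. It is not the SLoMo implementation: a 1-D point mass with double-integrator dynamics stands in for the legged robot, noisy samples of a sine curve stand in for reconstructed video key points, and there are no contacts. All function names, weights, and constants (`fit_reference`, `lqr_gain`, `DT`, the cost weights, the disturbance) are hypothetical. Stage 2 is approximated by a regularized least-squares fit of accelerations whose rollout follows the key points, so the reference is dynamically feasible for the toy model by construction. Stage 3 is approximated by feedforward plus state feedback. Without constraints and with linear dynamics, a long-horizon model-predictive controller reduces to this fixed LQR gain, so the gain is computed once rather than re-solved every step.

```python
import numpy as np

# Toy analogue of SLoMo stages 2 and 3 (hypothetical, not the authors' code):
# a 1-D point mass stands in for the robot, noisy samples of a sine curve stand
# in for video key points, and all names and constants below are assumptions.
DT = 0.05                                # time step [s]
A = np.array([[1.0, DT], [0.0, 1.0]])    # state x = [position, velocity]
B = np.array([[0.5 * DT**2], [DT]])      # input u = acceleration


def simulate(x0, u, disturbance=0.0):
    """Roll the dynamics forward; `disturbance` is added to velocity each step."""
    xs = [np.asarray(x0, dtype=float)]
    for uk in u:
        x = A @ xs[-1] + B[:, 0] * uk
        x[1] += disturbance
        xs.append(x)
    return np.array(xs)                  # shape (len(u) + 1, 2)


def rollout_matrices(n):
    """Phi, Gamma such that positions p_1..p_n = Phi @ x0 + Gamma @ u."""
    Phi = np.zeros((n, 2))
    Gamma = np.zeros((n, n))
    for k in range(n):
        Phi[k] = np.linalg.matrix_power(A, k + 1)[0]
        for j in range(k + 1):
            Gamma[k, j] = (np.linalg.matrix_power(A, k - j) @ B)[0, 0]
    return Phi, Gamma


def fit_reference(keypoints, x0, lam=1e-4):
    """Stage 2 analogue: least-squares inputs whose rollout tracks key points.

    The reference is produced by rolling out the model, so it is dynamically
    feasible by construction (for this toy model only).
    """
    n = len(keypoints)
    Phi, Gamma = rollout_matrices(n)
    lhs = np.vstack([Gamma, np.sqrt(lam) * np.eye(n)])
    rhs = np.concatenate([keypoints - Phi @ x0, np.zeros(n)])
    return np.linalg.lstsq(lhs, rhs, rcond=None)[0]


def lqr_gain(Q, R, iters=500):
    """Converged Riccati recursion: the feedback a long-horizon unconstrained
    linear MPC converges to."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def track(x_ref, u_ref, K, disturbance):
    """Stage 3 analogue: feedforward u_ref plus feedback on the state error."""
    x = x_ref[0].copy()
    xs = [x]
    for k, uk in enumerate(u_ref):
        u = uk - (K @ (x - x_ref[k]))[0]
        x = A @ x + B[:, 0] * u
        x[1] += disturbance
        xs.append(x)
    return np.array(xs)


t = DT * np.arange(1, 61)
rng = np.random.default_rng(0)
keypoints = np.sin(np.pi * t) + rng.normal(0.0, 0.02, t.size)  # fake key points
x0 = np.array([0.0, np.pi])
u_ref = fit_reference(keypoints, x0)
x_ref = simulate(x0, u_ref)

K = lqr_gain(np.diag([100.0, 1.0]), np.array([[0.01]]))
open_loop = simulate(x0, u_ref, disturbance=0.01)           # replay, no feedback
closed_loop = track(x_ref, u_ref, K, disturbance=0.01)
open_err = np.max(np.abs(open_loop[:, 0] - x_ref[:, 0]))
closed_err = np.max(np.abs(closed_loop[:, 0] - x_ref[:, 0]))
print(f"open-loop max error {open_err:.3f}, closed-loop {closed_err:.4f}")
```

The comparison at the end illustrates why an online tracking stage is needed at all: replaying the offline inputs open-loop drifts under a small constant model error, while the feedback tracker stays close to the reference. On the real robot, the controller must also handle contact switching, nonlinear dynamics, and constraints, none of which this sketch models.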

Related Papers

SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos
John Zhang, Shuo Yang, Gengshan Yang, Arun Bishop, Swaminathan Gurumurthy, Deva Ramanan, and Zac Manchester
Robotics and Automation Letters (RA-L) & International Conference on Robotics and Automation (ICRA)
PPR: Physically Plausible Reconstruction from Monocular Videos
Gengshan Yang, Shuo Yang, John Zhang, Zac Manchester, and Deva Ramanan
IEEE International Conference on Computer Vision (ICCV). Paris, France.


John Zhang
Contact-rich Simulation and Control
Shuo Yang
Legged Robots State Estimation, Mapping and Control
Arun Bishop
Contact-rich Optimization and Control
Swaminathan Gurumurthy
Deep Equilibrium Models
Zac Manchester
Assistant Professor
Last updated: 2023-08-26