12 June 2025 (Nashville Time, CDT)

Time Event
08:50 – 09:00

Opening Remarks

Workshop Organizers
09:00 – 09:30
Federica Bogo
Meta Reality Labs

Understanding Human Motion in the Wild

Machine perception of human motion is crucial in many areas, from robotics to mixed reality. However, developing perception algorithms that work in the wild is challenging due to the complexity of human behavior and the difficulty of acquiring 3D data at scale. In this talk, we will focus on how to tackle these challenges by looking at recent advances in computer vision and machine learning. First, we will look at robust motion reconstruction algorithms that, taking just monocular RGB(-D) videos as input, can accurately capture human motion even in the presence of noise and occlusions. Second, we will discuss the importance of capturing diverse human motion data at scale, introducing the Nymeria dataset. Nymeria is currently the largest multi-modal dataset for in-the-wild human motion, providing both ground-truth 3D pose annotations and motion-language descriptions. Finally, we will show how we can leverage this data to effectively reason about human motion, connecting it to natural language. We will present EgoLM, a recent multi-modal framework that leverages Large Language Models to simultaneously track and understand human motion in egocentric scenarios.

09:30 – 10:00
Li Yi
Tsinghua University

Learning Versatile Humanoid-Scene Interaction Skills from Human Motion

Equipping humanoid robots with interactive capabilities across a wide range of scenarios is a central objective in embodied artificial intelligence. However, the process of skill acquisition in humanoid robots is challenging due to their complex dynamics, high-dimensional perception and control demands, and underactuated nature. Fortunately, the morphological similarity between humanoid robots and humans offers a unique advantage: the vast repository of human interaction motion data serves as a valuable source of prior knowledge. This talk focuses on how to efficiently utilize this data to develop diverse interaction skills in humanoid robots. I will present three ways to leverage human motion data: learning through trial and error with human motion priors, learning by tracking human interactions, and learning by interacting with digital humans. These methods highlight the transformative potential of human motion data in advancing humanoid skills.

10:00 – 10:30

Coffee Break & Poster Session

10:30 – 11:00

Presentations of the RHOBIN Challenge Results by the Winning Teams

11:00 – 11:30
Angela Yao
National University of Singapore

From Hands to Feet: Contact-Driven Modelling of Human-Object-World Interactions

Understanding how humans interact with objects and the physical world is fundamental to modeling everyday behavior in 3D. This talk focuses on contacts, between hands and objects and between the feet and the world, as driving cues for 3D reconstruction and generation. First, we will look at natural language-based generation of hand-object contacts to improve hand-object reconstruction and generation; language provides a natural way of specifying such contacts. I will then introduce our state-of-the-art approach to world-coordinate human mesh recovery based on inferred contacts, such as those of the feet.

11:30 – 12:00
Jiajun Wu
Stanford University

Generative Reconstruction of Human-Object Interactions

Limited data are available for models to learn to reconstruct human-object interactions (HOI). To unleash the power of data-driven models, it is natural to connect HOI with modalities where data are abundant, such as images, videos, and language; pre-trained visual generative models can then serve as a rich prior for reconstruction. In this talk, I will present several of our recent works that follow this paradigm of generative reconstruction, recovering 4D HOI from language descriptions, third-person videos, and even a single static frame.

12:00 – 12:30

Panel Discussion

Moderator: Xianghui Xie
Participants: Federica Bogo, Jiajun Wu, Angela Yao

Contact Info

E-mail: rhobinchallenge@gmail.com