The First Workshop on
Reconstruction of Human-Object Interactions (RHOBIN)




June 19 @ CVPR 2023, Vancouver, Canada
East 18 @ Vancouver Convention Center



Schedule (all times Vancouver, PST)

08:20 – 08:30

Welcome and Introductions

Workshop Organizers
08:30 – 09:00
Josef Sivic
(CIIRC, CTU)

Learning manipulation skills from instructional videos

People easily learn how to change a car's flat tire or perform resuscitation by observing other people do the same task, for example in an instructional video. This involves advanced visual intelligence abilities such as interpreting sequences of human actions that manipulate objects to achieve a specific task. Currently, however, no artificial system has a similar level of cognitive visual competence. In this talk, I will describe our recent progress on learning from instructional videos. In the first part, I will focus on learning joint video and language representations from input videos and their transcribed narrations. In the second part, I will focus on learning how people manipulate objects from instructional videos, and demonstrate transferring the learned skills to a robotic manipulator.
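
As a concrete illustration of the first part, a common way to align video clips with their transcribed narrations is contrastive learning over paired embeddings. The sketch below is a generic symmetric InfoNCE loss, not the speaker's exact method; the encoders that produce `video_emb` and `text_emb` are assumed to exist upstream.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb, text_emb, temperature=0.07):
    """video_emb, text_emb: (batch, dim) embeddings of paired
    video clips and their transcribed narrations."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                 # pairwise similarities
    targets = torch.arange(v.size(0), device=v.device)
    # Matched clip/narration pairs lie on the diagonal; all other
    # pairs in the batch act as negatives, in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```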

09:00 – 09:30
Torsten Sattler
(CIIRC, CTU)

Recent advances in visual localization

In order to take scene context into account when modelling human-object interactions, it is important to know where in the scene the human is. One way of obtaining this information is via visual localization: the problem of estimating the position and orientation of a camera with respect to some (3D) representation of the scene. Localization plays an important role in many applications of computer vision, including augmented/mixed/virtual reality (and thus the metaverse) and autonomous robots such as self-driving cars and drones. After motivating the use of visual localization for human-object interactions, we will discuss recent advances in the field. The talk will focus on two main questions: Assuming that (part of) the visual localization process happens via a service in the cloud (e.g., using services provided by companies such as Microsoft, Google, or Niantic), what are the implications for privacy of using such a service? And how well do current state-of-the-art algorithms solve the visual localization problem, in terms of localization accuracy, memory consumption, and run-time?
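
For readers unfamiliar with the problem setup, the classical structure-based localization pipeline estimates the 6-DoF camera pose from 2D–3D correspondences between a query image and the scene model. The sketch below uses OpenCV's PnP + RANSAC solver; the feature-matching step that produces the correspondences is assumed to happen upstream, and the parameter values are illustrative.

```python
import cv2
import numpy as np

def localize(points_3d, points_2d, K):
    """points_3d: (N, 3) scene points; points_2d: (N, 2) their detections
    in the query image; K: 3x3 camera intrinsic matrix."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float32),
        points_2d.astype(np.float32),
        K, None,                          # no lens distortion assumed
        reprojectionError=8.0,            # inlier threshold in pixels
        iterationsCount=1000)
    if not ok:
        return None                       # localization failed
    R, _ = cv2.Rodrigues(rvec)            # rotation: world -> camera
    camera_center = -R.T @ tvec           # camera position in world frame
    return R, tvec, camera_center, inliers
```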

09:30 – 10:00

Presentations of the RHOBIN Challenge results by the winners

10:00 – 10:30

Coffee Break & Posters

10:30 – 11:00
Kristen Grauman
(University of Texas at Austin and Meta AI Research)

Human-object interaction in first-person video

11:00 – 11:30
Siyu Tang
(ETH Zurich)

Synthesizing human motions in 3D scenes

Simulating human behavior and interactions within various environments is crucial for numerous applications, including generating training data for machine learning algorithms, creating autonomous agents for interactive applications such as augmented and virtual reality (AR/VR) or computer games, guiding architectural design decisions, and more. In this talk, I will discuss our previous and ongoing research on modeling and synthesizing digital humans, with the ultimate goal of enabling them to exhibit spontaneous behavior and move autonomously in a digital environment. A key aspect of our work, which I will highlight during the talk, is the Guided Motion Diffusion model. This approach generates high-quality and diverse human motion from textual prompts and spatial constraints, such as motion trajectories and obstacles. Through a detailed exploration of our research, I will illustrate how these techniques apply to various scenarios, enhancing the realism of digital human behaviors across multiple domains.
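
To make the idea of constraint guidance concrete, the schematic below shows one generic way a diffusion sampler can be steered by a spatial cost (e.g., deviation from a target trajectory or obstacle penetration). It is a sketch under assumed interfaces (`model`, `model.reverse_step`, and `cost_fn` are hypothetical), not the actual Guided Motion Diffusion implementation.

```python
import torch

def guided_sample(model, cost_fn, timesteps, shape, guidance_scale=1.0):
    """Hypothetical interfaces: model(x, t) predicts the clean motion x0;
    model.reverse_step(x, x0, t) performs one standard denoising update."""
    x = torch.randn(shape)                        # start from Gaussian noise
    for t in reversed(timesteps):
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            x0_pred = model(x_in, t)              # predicted clean motion
            cost = cost_fn(x0_pred)               # spatial constraint cost
            grad, = torch.autograd.grad(cost, x_in)
        x = model.reverse_step(x, x0_pred.detach(), t)  # denoising update
        x = x - guidance_scale * grad             # steer toward the constraint
    return x
```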

11:30 – 12:00
Edward H. Adelson
(MIT)

The rich world of camera-based tactile sensing

There is growing interest in camera-based tactile sensors such as the GelSight sensors we are developing in my laboratory. A small internal camera looks through a clear gel at an opaque skin. Computer vision is used to measure the shape of the skin as it deforms due to contact with objects in the world. It is possible to capture the full x, y, z deformation of every point in the contact patch, which gives a complete picture of the mechanical interaction between the sensor and the world. From this one can infer many things: object shape, object pose, force, torque, slip, and material properties such as hardness or roughness. The tactile information can also be used in a feedback loop for reactive control of robotic tasks.
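
As a rough sketch of the reconstruction step described above: once the observed shading has been converted to surface gradients (via a calibration lookup, assumed given here), the height map of the deformed skin can be recovered by Poisson integration. The FFT-based solver below is one standard way to do this; it is illustrative, not the lab's exact pipeline.

```python
import numpy as np

def integrate_gradients(gx, gy):
    """Recover a height map z from gradient fields gx = dz/dx, gy = dz/dy
    via an FFT-based Poisson solver (assumes periodic boundaries)."""
    h, w = gx.shape
    fx = np.fft.fftfreq(w).reshape(1, w)   # spatial frequencies along x
    fy = np.fft.fftfreq(h).reshape(h, 1)   # spatial frequencies along y
    Gx, Gy = np.fft.fft2(gx), np.fft.fft2(gy)
    denom = (2j * np.pi * fx) ** 2 + (2j * np.pi * fy) ** 2
    denom[0, 0] = 1.0                      # avoid division by zero at DC
    Z = (2j * np.pi * fx * Gx + 2j * np.pi * fy * Gy) / denom
    Z[0, 0] = 0.0                          # height is defined up to a constant
    return np.real(np.fft.ifft2(Z))
```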

Accepted Paper Talks

Contact Info

E-mail: rhobinchallenge@gmail.com

Acknowledgements

Website template borrowed from: https://futurecv.github.io/ (Thanks to Deepak Pathak)