Social and Predictive Navigation
Anticipating human motion is a key skill for intelligent systems that share a space or interact with humans. Accurate long-term predictions of human movement trajectories, body poses, actions or activities may significantly improve the ability of robots to plan ahead, anticipate the effects of their actions or to foresee hazardous situations. The topic has received increasing attention in recent years across several scientific communities with a growing spectrum of applications in service robots, self-driving cars, collaborative manipulators or tracking and surveillance.
This workshop is the sixth in a series of events held at ICRA 2019-2024. It aims to bring together researchers and practitioners from different communities to discuss recent developments in this field: promising approaches, their limitations, benchmarking techniques, and open challenges.
The workshop takes place on Monday, May 13, 2024, in Room 313-314. The Zoom link will be available via the InfoVaya portal managed by ICRA 2024.
Accurate human motion prediction matters across application domains:

- Service robots must predict human motion for safe and efficient operation.
- Working and co-manipulating in close proximity to humans requires precise full-body motion, task, and activity anticipation.
- Urban and highway navigation is impossible without fast inference on the dynamic environment.
This workshop will feature talks by several high-profile invited speakers from diverse academic and industrial backgrounds, as well as a poster session.
The program of the workshop is available below.
We encourage researchers to submit novel material as short papers (up to 4 pages), to be presented as posters.
Participants can also test the generalization capabilities of their motion prediction models in diverse indoor environments.
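Benchmarks of this kind are commonly scored with Average and Final Displacement Error (ADE/FDE). As a point of reference, here is a minimal Python sketch of how these metrics are computed; the function name and the random stand-in data are illustrative only and not part of any official evaluation kit.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one predicted trajectory.

    pred, gt: arrays of shape (T, 2) holding predicted and ground-truth
    (x, y) positions over T future time steps.
    """
    errors = np.linalg.norm(pred - gt, axis=-1)  # per-step Euclidean error
    return errors.mean(), errors[-1]

# Illustrative usage with random stand-in trajectories; a real benchmark would
# load held-out data from each environment to probe generalization.
rng = np.random.default_rng(0)
pred = np.cumsum(rng.normal(scale=0.1, size=(12, 2)), axis=0)
gt = np.cumsum(rng.normal(scale=0.1, size=(12, 2)), axis=0)
ade, fde = ade_fde(pred, gt)
print(f"ADE: {ade:.3f} m, FDE: {fde:.3f} m")
```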
Recordings of the past LHMP events are available on our YouTube channel.
The workshop is planned for Monday, May 13, 2024, with the following program:
Time | Speaker | Title | Abstract |
---|---|---|---|
9:00-9:15 (JST) | Organizers | Intro | |
9:15-9:45 (JST) | Sanjiban Choudhury, Cornell University | Isn't Motion Prediction just Model-based RL? | We present a clarifying perspective on motion prediction as an attempt to learn a model of human motion useful for a downstream policy. Drawing from new (and some old) principles of Model-based RL (MBRL), we discuss how to train such models — which loss functions to use, and which data distribution to train on. We present first results on Collaborative Manipulation, where we train transformer models to predict human motion around robots and close the loop to plan with these predictions. |
9:45-10:15 (JST) | Yuxiao Chen, NVIDIA | How to better predict for planning: categorical behavior prediction for AV planning | In a typical autonomous vehicle (AV) stack, motion predictions are consumed by the planning module to generate safe and efficient motion plans for the AV. With the advent of LLMs and foundation models, categorical/tokenized data plays an increasingly important role. This talk focuses on how we can embrace the change and generate motion predictions that are not only accurate but also friendly to downstream planning, potentially with LLMs/FMs involved. We argue that interpretable behavior-level multimodality is essential to understanding human behavior in traffic, and it is possible to draw a direct connection between trajectory predictions and natural language descriptions. On the other hand, having multiple modes to cover trajectory-level variance not only increases computation cost, but may not be necessary for planning. On this subject, our recent work proposes an easy workaround that enables integration of a gradient-based planner and prediction models. |
10:15-10:30 (JST) | Coffee break | | |
10:30-11:00 (JST) | Tao Chen, Fudan University | The Next Motion Generation: An Observation and Discussion on Motion Generation | This presentation seeks to explore the frontier of human motion generation, offering a comprehensive overview and critical analysis of the latest advancements and methodologies within this field. A significant portion of our discussion is dedicated to "Motion Latent Diffusion," as introduced at CVPR 2023, which illustrates a substantial advancement in efficient motion generation using diffusion models. Additionally, "MotionGPT," presented at NeurIPS 2023, ushers in a revolutionary approach by equating motion generation processes with linguistic analysis, thereby opening new avenues for exploration. Our latest project, "MotionChain," integrates Vision-Language Models with motion generation and aims to establish a holistic framework for autonomous agents. Through this dialogue, we aspire to forge stronger collaborations and innovate beyond the current frontiers of motion generation technologies. |
11:00-11:30 (JST) | Angelo Cangelosi, The University of Manchester | Trust and Theory of Mind in Human Robot interaction | There is growing psychology and social robotics literature showing that theory of mind (ToM) capabilities affect trust in human-robot interaction (HRI). This concerns both users’ ToM and trust of robots, and robots’ artificial ToM and trust of people. We present developmental robotics models and HRI experiments exploring different aspects of these two dimensions of trust, to contribute towards trustworthy and transparent social robots. |
11:30-12:00 (JST) | Arash Ajoudani, Italian Institute of Technology | Predictive and Perspective Control of Human-Robot Interaction through Kino-dynamic State Fusion | |
12:00-13:30 (JST) | Lunch break | | |
13:30-14:30 (JST) | Poster Session | | |
14:30-15:00 (JST) | Mo Chen, Simon Fraser University | Long-Term Human Motion Prediction Through Hierarchy, Learning, and Control | One of the keys to long-term human motion prediction is hierarchy: just a few high-level actions can encompass a long duration, while the details of human motion at shorter time scales can be inherently encoded in the high-level actions themselves. In this talk, we will discuss two long-term human trajectory prediction frameworks that take advantage of hierarchy. At the high level, we will look at both action spaces that are hand-designed and those that are learned from data. At the low level, we will examine how details of human motion at shorter time scales can be reconstructed through a combination of control- and learning-based methods. *(An illustrative code sketch of this hierarchical idea follows the program table.)* |
15:00-15:30 (JST) | Alina Roitberg, University of Stuttgart | Towards resource-efficient and uncertainty-aware driver behaviour understanding and maneuver prediction | This talk will explore recent advances in video-based driver observation techniques aimed at creating adaptable, resource- and data-efficient, as well as uncertainty-aware models for in-vehicle monitoring and maneuver prediction. Topics covered will include: (1) an overview of state-of-the-art methods and public datasets for driver activity analysis, (2) the importance of adaptability in driver observation systems to cater to new situations (environments, vehicle types, driver behaviours), as well as strategies for addressing such open-world tasks, and (3) uncertainty-aware approaches, vital for robust and safe decision-making. The talk will conclude with a discussion of future research directions and potential applications of this technology, such as improving driver safety and the overall driving experience. |
15:30-15:45 (JST) | Coffee break | | |
15:45-16:15 (JST) | Shuhan Tan, The University of Texas at Austin | Leveraging Natural Language for Traffic Simulation in Autonomous Vehicle Development | Simulation forms the backbone of modern self-driving development. Simulators help develop, test, and improve driving systems without putting humans, vehicles, or their environment at risk. However, simulators face a major challenge: they rely on realistic, scalable, yet interesting content. While recent advances in rendering and scene reconstruction make great strides in creating static scene assets, modeling their layout, dynamics, and behaviors remains challenging. Natural language allows practitioners to easily articulate interesting and complex traffic scenarios through high-level descriptions. Instead of meticulously crafting the details of each individual scenario, language allows for a seamless conversion of semantic ideas into simulation scenarios at scale. In this talk, I will first introduce our work LCTGen, which takes as input a natural language description of a traffic scenario and outputs traffic actors’ initial states and motions on a compatible map. I will then introduce our more recent work on building closed-loop simulation scenario environments with natural language and traffic models. |
16:15-16:45 (JST) | Tim Schreiter, TUM | THÖR-MAGNI Dataset and Benchmark update | This presentation introduces the THÖR-MAGNI dataset, designed to facilitate research in motion prediction and human-robot interaction, and gives an update on the associated benchmark. |
16:45-17:00 (JST) | Organizers | Discussion and conclusions | |
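As referenced in Mo Chen's abstract above, the hierarchical factorization (high-level actions encode long horizons, low-level motion is reconstructed beneath them) can be sketched in a few lines of Python. The sketch below is purely illustrative and makes no claims about the speaker's actual models; all names, goals, and the stub logic are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical two-level predictor: a high-level model commits to a coarse
# action over a long horizon; a low-level model fills in fine-grained motion.
GOALS = {"desk": np.array([4.0, 1.0]), "door": np.array([0.0, 5.0])}

def predict_high_level(history):
    """Choose a coarse action from observed motion (a learned classifier in
    practice; a heuristic stub here)."""
    heading_up = history[-1, 1] - history[0, 1] > 0
    return "door" if heading_up else "desk"

def predict_low_level(start, goal, steps=20):
    """Reconstruct short-time-scale motion toward the chosen goal (a
    controller or learned decoder in practice; a straight-line rollout here)."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return start + alphas * (goal - start)

history = np.array([[0.0, 0.0], [0.1, 0.4], [0.2, 0.9]])    # observed (x, y)
action = predict_high_level(history)                        # long-horizon intent
trajectory = predict_low_level(history[-1], GOALS[action])  # fine motion
print(f"predicted action: {action}, endpoint: {trajectory[-1]}")
```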
The following papers have been selected for presentation at the workshop. Click on a title to view the PDF.
If you would like more information, feel free to reach out via e-mail!