Long-term Human Motion Prediction Workshop - ICRA 2024

Anticipating human motion is a key skill for intelligent systems that share a space or interact with humans. Accurate long-term predictions of human movement trajectories, body poses, actions or activities may significantly improve the ability of robots to plan ahead, anticipate the effects of their actions and foresee hazardous situations. The topic has received increasing attention in recent years across several scientific communities, with a growing spectrum of applications in service robots, self-driving cars, collaborative manipulators, and tracking and surveillance.

This workshop is the sixth in a series of events held at ICRA 2019-2024. Its aim is to bring together researchers and practitioners from different communities and to discuss recent developments in this field, promising approaches, their limitations, benchmarking techniques and open challenges.

The workshop takes place on Monday, May 13, 2024, in Room 313-314. The Zoom link will be available via the InfoVaya portal managed by ICRA 2024.

Social and Predictive Navigation

Service robots must predict human motion to operate safely and efficiently.

Collaborative and production robots

Working and co-manipulating in close proximity to humans requires precise anticipation of full-body motion, tasks and activities.

Automated driving

Urban and highway navigation is impossible without fast inference about the dynamic environment.


This workshop will feature talks by several high-profile invited speakers from diverse academic and industrial backgrounds, as well as a poster session.


The program of the workshop is available.

Call for Papers

We encourage researchers to submit their novel material as short (up to 4 pages) papers to be presented as posters.


Test the generalization capabilities of motion prediction models in diverse indoor environments.

YouTube channel

Recordings of past LHMP events are available on our YouTube channel.


The workshop takes place on Monday, May 13, 2024.

The program follows.

Time Speaker Title Abstract
9:00-9:15 (JST) Organizers Intro
9:15-9:45 (JST) Sanjiban Choudhury, Cornell University Isn't Motion Prediction just Model-based RL? We present a clarifying perspective on motion prediction as an attempt to learn a model of human motion useful for a downstream policy. Drawing from new (and some old) principles of Model-based RL (MBRL), we discuss how to train such models — which loss functions to use, and which data distribution to train on. We present first results on Collaborative Manipulation, where we train transformer models to predict human motion around robots and close the loop to plan with these predictions.
9:45-10:15 (JST) Yuxiao Chen, NVIDIA How to better predict for planning: categorical behavior prediction for AV planning In a typical autonomous vehicle (AV) stack, motion predictions are consumed by the planning module to generate safe and efficient motion plans for the AV. With the advent of LLMs and foundational models, categorical/tokenized data plays an increasingly important role. This talk focuses on how we can embrace the change and generate motion predictions that are not only accurate but also friendly to downstream planning, potentially with LLMs/FMs involved. We argue that interpretable behavior-level multimodality is essential to understanding human behavior in traffic, and it is possible to draw a direct connection between trajectory predictions and natural language descriptions. On the other hand, having multiple modes to cover trajectory-level variance not only increases computation cost, but may not be necessary for planning. On this subject, our recent work proposes an easy workaround that enables integration of a gradient-based planner and prediction models.
10:15-10:30 (JST) Coffee break
10:30-11:00 (JST) Tao Chen, Fudan University The Next Motion Generation: An Observation and Discussion on Motion Generation This presentation seeks to explore the frontier of human motion generation, offering a comprehensive overview and critical analysis of the latest advancements and methodologies within this field. A significant portion of our discussion is dedicated to "Motion Latent Diffusion," as introduced at CVPR 2023, which illustrates a substantial advancement in efficient motion generation using diffusion models. Additionally, "MotionGPT," presented at NeurIPS 2023, ushers in a revolutionary approach by equating motion generation processes with linguistic analysis, thereby opening new avenues for exploration. This progress has led to "MotionChain," our latest project, which integrates Vision-Language Models with motion generation and aims to establish a holistic framework for autonomous agents. Through this dialogue, we aspire to forge stronger collaborations and innovate beyond the current frontiers of motion generation technologies.
11:00-11:30 (JST) Angelo Cangelosi, The University of Manchester Trust and Theory of Mind in Human Robot interaction There is growing psychology and social robotics literature showing that theory of mind (ToM) capabilities affect trust in human-robot interaction (HRI). This concerns both users’ ToM and trust of robots, and robot’s artificial ToM and trust of people. We present developmental robotics models and HRI experiments exploring different aspects of these two dimensions of trust, to contribute towards trustworthy and transparent social robots.
11:30-12:00 (JST) Arash Ajoudani, Italian Institute of Technology Predictive and Perspective Control of Human-Robot Interaction through Kino-dynamic State Fusion
12:00-13:30 (JST) Lunch break
13:30-14:30 (JST) Poster Session
14:30-15:00 (JST) Mo Chen, Simon Fraser University Long-Term Human Motion Prediction Through Hierarchy, Learning, and Control One of the keys to long-term human motion prediction is hierarchy: Just a few high-level actions can encompass a long duration, while the details of human motion at shorter time scales can be inherently encoded in the high-level actions themselves. In this talk, we will discuss two long-term human trajectory prediction frameworks that take advantage of hierarchy. At the high level, we will look at both action spaces that are hand-designed and those that are learned from data. At the low-level, we will examine how details of human motion at shorter time scales can be reconstructed through a combination of control- and learning-based methods.
15:00-15:30 (JST) Alina Roitberg, University of Stuttgart Towards resource-efficient and uncertainty-aware driver behaviour understanding and maneuver prediction This talk will explore recent advances in video-based driver observation techniques aimed at creating adaptable, resource- and data-efficient, as well as uncertainty-aware models for in-vehicle monitoring and maneuver prediction. Topics covered will include: (1) an overview of state-of-the-art methods and public datasets for driver activity analysis, (2) the importance of adaptability in driver observation systems to cater to new situations (environments, vehicle types, driver behaviours) as well as strategies for addressing such open-world tasks, and (3) incorporating uncertainty-aware approaches, vital for robust and safe decision-making. The talk will conclude with a discussion of future research directions and the potential applications of this technology, such as improving driver safety and the overall driving experience.
15:30-15:45 (JST) Coffee break
15:45-16:15 (JST) Shuhan Tan, The University of Texas at Austin Leveraging Natural Language for Traffic Simulation in Autonomous Vehicle Development Simulation forms the backbone of modern self-driving development. Simulators help develop, test, and improve driving systems without putting humans, vehicles, or their environment at risk. However, simulators face a major challenge: they rely on realistic, scalable, yet interesting content. While recent advances in rendering and scene reconstruction make great strides in creating static scene assets, modeling their layout, dynamics, and behaviors remains challenging. Natural language allows practitioners to easily articulate interesting and complex traffic scenarios through high-level descriptions. Instead of meticulously crafting the details of each individual scenario, language allows for a seamless conversion of semantic ideas into simulation scenarios at scale. In this talk, I will first introduce our work LCTGen, which takes as input a natural language description of a traffic scenario, and outputs traffic actors' initial states and motions on a compatible map. I will then introduce our more recent work on building closed-loop simulation scenario environments with natural language and traffic models.
16:15-16:45 (JST) Tim Schreiter, TUM THÖR-MAGNI Dataset and Benchmark update This presentation aims to introduce the THÖR-MAGNI dataset, designed to facilitate research in motion prediction and human-robot interaction. The presentation comprises three parts:
  1. A Detailed Introduction to THÖR-MAGNI. This part provides an overview of the dataset. With its comprehensive collection of contextual cues and multifaceted data, THÖR-MAGNI is an excellent resource for predicting human motion in shared environments.
  2. Tools for Algorithm Development. We will showcase a suite of tools and a GitHub repository designed to ease model training on THÖR-MAGNI and enhance accessibility to data processing.
  3. Benchmarking Challenge Demonstration. This part will demonstrate how participants can submit their models and engage with the benchmarking challenge.
16:45-17:00 (JST) Organizers Discussion and conclusions


The following papers were selected for presentation at the workshop. Click on a title to see the PDF.

Paper ID Authors Title
1 Chenhao Li FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning
2 Francesco Verdoja, Tomasz Piotr Kucner, Ville Kyrki Using occupancy priors to generalize people flow predictions
3 Justin Lidard, Hang Pham, Ariel Bachman, Bryan Boateng, Anirudha Majumdar Risk-Calibrated Human-Robot Interaction via Set-Valued Intent Prediction
4 Kushal Kedia, Atiksh Bhardwaj, Prithwish Dan, Sanjiban Choudhury InteRACT: Transformer Models for Human Intent Prediction Conditioned on Robot Actions
5 Till Hielscher, Lukas Heuer, Frederik Wulle, Luigi Palmieri Towards Using Fast Embedded Model Predictive Control for Human-Aware Predictive Robot Navigation
6 Andrei Ivanovic, Masha Itkina, Rowan McAllister, Igor Gilitschenski, Florian Shkurti On the Importance of Uncertainty Calibration in Perception-Based Motion Planning
7 Kazuki Mizuta, Karen Leung CoBL-Diffusion: Diffusion-Based Conditional Robot Planning in Dynamic Environments Using Control Barrier and Lyapunov Functions
8 Claire Liang, Valerie Chen Persistent Homology for Capturing Social Structure and Cohesion of F-Formation Groups
9 Ronny Hug, Stefan Becker, Wolfgang Hübner, Michael Arens Generating Synthetic Ground Truth Distributions for Multi-step Trajectory Prediction using Probabilistic Composite Bézier Curves
10 Ali Imran, Giovanni Beltrame, David St-Onge Decentralized Multi-Robot Shared Perception for Worker Action Inference in Industrial Facilities
11 Yuchen Liu, Luigi Palmieri, Sebastian Koch, Ilche Georgievski, Marco Aiello Towards Human Awareness in Robot Task Planning with Large Language Models

In collaboration with

Supported by

Get in touch

If you would like more information, feel free to reach out via e-mail!