Long-term Human Motion Prediction Workshop - ICRA 2024

Anticipating human motion is a key skill for intelligent systems that share a space or interact with humans. Accurate long-term predictions of human movement trajectories, body poses, actions or activities may significantly improve the ability of robots to plan ahead, anticipate the effects of their actions, or foresee hazardous situations. The topic has received increasing attention in recent years across several scientific communities, with a growing spectrum of applications in service robots, self-driving cars, collaborative manipulators, and tracking and surveillance.

This workshop is the sixth in a series of ICRA events held from 2019 to 2024. It aims to bring together researchers and practitioners from different communities to discuss recent developments in this field, promising approaches, their limitations, benchmarking techniques and open challenges.

The workshop is planned for Monday, May 13, 2024.

Social and Predictive Navigation

Service robots should predict human motion for safe and efficient operation.

Collaborative and production robots

Working and co-manipulating in close proximity to humans requires precise full-body motion, task and activity anticipation.

Automated driving

Urban and highway navigation is impossible without fast inference about the dynamic environment.


This workshop will feature talks by several high-profile invited speakers from diverse academic and industrial backgrounds, as well as a poster session.


A preliminary program of the full-day workshop is available.

Call for Papers

We encourage researchers to submit their novel work as short papers (up to 4 pages), to be presented as posters.


Test the generalization capabilities of motion prediction models in diverse indoor environments.

YouTube channel

Recordings of the past LHMP events are available at our YouTube channel.

Program (tentative)

The workshop is planned for Monday, May 13, 2024.

The tentative program is as follows.

Time Speaker Title Abstract
9:00-9:15 (JST) Organizers Intro
9:15-9:45 (JST) Sanjiban Choudhury, Cornell University (title and abstract TBA)
9:45-10:15 (JST) Marco Pavone, Stanford University, NVIDIA Revolutionizing AV Development With Foundation Models Foundation models, trained on vast and diverse data encompassing the human experience, are at the heart of the ongoing AI revolution, influencing the way we create, solve problems, and work. These models, and the lessons learned from their construction, can also be applied to the way we develop a similarly transformative technology: autonomous vehicles. In this talk I’ll highlight recent research efforts toward rethinking elements of an AV program, both in the vehicle and in the data center, with an emphasis on (1) leveraging diverse data sources for long-tail safety evaluation, (2) composing ingredients for universal and controllable end-to-end simulation, and (3) building the self-accelerating data flywheels that will enable scaling AV learning to new frontiers of autonomous reasoning and generalization.
10:15-10:45 (JST) Tao Chen, Fudan University The Next Motion Generation: An Observation and Discussion on Motion Generation This presentation explores the frontier of human motion generation, offering a comprehensive overview and critical analysis of the latest advancements and methodologies in this field. A significant portion of our discussion is dedicated to "Motion Latent Diffusion," introduced at CVPR 2023, which marks a substantial advance in efficient motion generation using diffusion models. Additionally, "MotionGPT," presented at NeurIPS 2023, introduces a new approach that treats motion generation as a language modeling problem, opening new avenues for exploration. Our latest project, "MotionChain," integrates Vision-Language Models with motion generation and aims to establish a holistic framework for autonomous agents. Through this dialogue, we aspire to forge stronger collaborations and innovate beyond the current frontiers of motion generation technologies.
10:45-11:15 (JST) Coffee break
11:15-11:45 (JST) Angelo Cangelosi, The University of Manchester (title and abstract TBA)
11:45-12:15 (JST) Arash Ajoudani, Italian Institute of Technology Predictive and Perspective Control of Human-Robot Interaction through Kino-dynamic State Fusion (abstract TBA)
12:15-13:30 (JST) Lunch break
13:30-14:30 (JST) Poster Session
14:30-15:00 (JST) Mo Chen, Simon Fraser University Long-Term Human Motion Prediction Through Hierarchy, Learning, and Control One of the keys to long-term human motion prediction is hierarchy: just a few high-level actions can encompass a long duration, while the details of human motion at shorter time scales can be inherently encoded in the high-level actions themselves. In this talk, we will discuss two long-term human trajectory prediction frameworks that take advantage of hierarchy. At the high level, we will look at both hand-designed action spaces and action spaces learned from data. At the low level, we will examine how details of human motion at shorter time scales can be reconstructed through a combination of control- and learning-based methods.
15:00-15:30 (JST) Alina Roitberg, University of Stuttgart Towards resource-efficient and uncertainty-aware driver behaviour understanding and maneuver prediction This talk will explore recent advances in video-based driver observation techniques aimed at creating adaptable, resource- and data-efficient, and uncertainty-aware models for in-vehicle monitoring and maneuver prediction. Topics covered will include: (1) an overview of state-of-the-art methods and public datasets for driver activity analysis, (2) the importance of adaptability in driver observation systems to cater to new situations (environments, vehicle types, driver behaviours), as well as strategies for addressing such open-world tasks, and (3) incorporating uncertainty-aware approaches, vital for robust and safe decision-making. The talk will conclude with a discussion of future research directions and potential applications of this technology, such as improving driver safety and the overall driving experience.
15:30-15:45 (JST) Coffee break
15:45-16:15 (JST) Shuhan Tan, The University of Texas at Austin Leveraging Natural Language for Traffic Simulation in Autonomous Vehicle Development Simulation forms the backbone of modern self-driving development. Simulators help develop, test, and improve driving systems without putting humans, vehicles, or their environment at risk. However, simulators face a major challenge: they rely on realistic, scalable, yet interesting content. While recent advances in rendering and scene reconstruction have made great strides in creating static scene assets, modeling their layout, dynamics, and behaviors remains challenging. Natural language allows practitioners to easily articulate interesting and complex traffic scenarios through high-level descriptions. Instead of meticulously crafting the details of each individual scenario, practitioners can use language to convert semantic ideas into simulation scenarios seamlessly and at scale. In this talk, I will first introduce our work LCTGen, which takes as input a natural language description of a traffic scenario and outputs traffic actors’ initial states and motions on a compatible map. I will then introduce our more recent work on building closed-loop simulation environments with natural language and traffic models.
16:15-16:45 (JST) Sunil K. Agrawal, Columbia University (title and abstract TBA)
16:45-17:00 (JST) Organizers Discussion and conclusions

Get in touch

If you would like more information, feel free to reach us via e-mail!