Multi-modal anticipation of stochastic trajectories in a dynamic environment with Conditional Variational Autoencoders

5 Mar 2021 · Albert Dulian, John C. Murray

Forecasting the short-term motion of nearby vehicles is inherently challenging, as the space of their possible future movements is not strictly limited to a single trajectory. Recently proposed techniques that demonstrate plausible results concentrate primarily on forecasting a fixed number of deterministic predictions, or on classifying over a wide variety of trajectories that were previously generated using, e.g., a dynamic model. This paper addresses the uncertainty associated with this task by utilising the stochastic nature of generative models to produce a diverse set of plausible paths for tracked vehicles. More specifically, we propose to account for the multi-modality of the problem with a Conditional Variational Autoencoder (C-VAE) conditioned on an agent's past motion as well as a rasterised scene context encoded with a Capsule Network (CapsNet). In addition, we demonstrate the advantages of employing the Minimum over N (MoN) cost function, which measures the distance between the ground truth and N generated samples and minimises the loss with respect to the closest sample, effectively leading to more diverse predictions. We evaluate our network on a publicly available dataset against recent state-of-the-art methods and show that our approach outperforms these techniques in numerous scenarios whilst significantly reducing the number of trainable parameters and allowing an arbitrary number of diverse trajectories to be sampled.
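As a rough illustration of the Minimum over N idea described in the abstract, the sketch below scores N sampled trajectories against the ground truth and keeps only the loss of the closest one. The distance used here (mean Euclidean displacement over time steps), the array shapes, and the name `mon_loss` are assumptions made for illustration; the abstract does not specify the exact metric or implementation.

```python
import numpy as np


def mon_loss(ground_truth, samples):
    """Minimum over N: loss of the sample closest to the ground truth.

    ground_truth: array of shape (T, 2), future (x, y) positions.
    samples:      array of shape (N, T, 2), N generated trajectories.
    Returns (loss, index of the closest sample).
    """
    # Per-time-step Euclidean distance for every sample: shape (N, T).
    dists = np.linalg.norm(samples - ground_truth[None, :, :], axis=-1)
    # Assumed distance: mean displacement over the horizon, shape (N,).
    per_sample = dists.mean(axis=1)
    # Only the closest sample contributes to the loss.
    best = int(np.argmin(per_sample))
    return float(per_sample[best]), best


# Toy data: three candidate trajectories around a straight-line ground truth.
gt = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
samples = np.stack([
    gt + np.array([0.0, 1.0]),  # shifted sideways by 1.0
    gt + np.array([0.0, 0.1]),  # shifted sideways by 0.1 (closest)
    gt + np.array([3.0, 4.0]),  # shifted by a distance of 5.0
])
loss, best = mon_loss(gt, samples)
print(best, round(loss, 3))
```

Because only the closest sample is penalised, the remaining samples are free to cover other plausible modes, which is the intuition behind the diversity claim above.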
