Convolutional Sequence Generation for Skeleton-Based Action Synthesis

In this work, we aim to generate long actions represented as sequences of skeletons. The generated sequences must demonstrate continuous, meaningful human actions while maintaining coherence among body parts. Instead of generating skeletons sequentially with an autoregressive model, we propose a framework that generates the entire sequence at once by transforming a sequence of latent vectors sampled from a Gaussian process (GP). This framework, named Convolutional Sequence Generation Network (CSGN), jointly models structures in the temporal and spatial dimensions. It captures the temporal structure at multiple scales through the GP prior and the temporal convolutions, and establishes the spatial connection between the latent vectors and the skeleton graphs via a novel graph refining scheme. Notably, CSGN allows bidirectional transforms between the latent and the observed spaces, thus enabling semantic manipulation of the action sequences in various forms. We conducted empirical studies on multiple datasets, including a set of high-quality dancing sequences collected by us. The results show that our framework can produce long action sequences that are coherent across time steps and among body parts.

PDF Abstract ICCV 2019
No code implementations yet.

Datasets

NTU RGB+D

Results from the Paper


Task                    | Dataset   | Model | Metric   | Value | Global Rank
Human action generation | NTU RGB+D | CSGN  | FID (CS) | 6.030 | #2
Human action generation | NTU RGB+D | CSGN  | FID (CV) | 7.114 | #2
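The FID scores above compare the distribution of generated sequences to that of real ones under Gaussian statistics of extracted features. A minimal numpy sketch of the Fréchet distance itself is below; this is the standard formula, not the paper's evaluation code, and the feature-extraction step (a pretrained action-recognition network producing the CS/CV features) is omitted.

```python
import numpy as np

def sqrtm_psd(A):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)
    return (V * np.sqrt(w)) @ V.T

def fid(mu1, S1, mu2, S2):
    """Frechet distance between N(mu1, S1) and N(mu2, S2):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 sqrtm(S1 S2))."""
    s1h = sqrtm_psd(S1)
    # sqrtm(S1 S2) is similar to sqrtm(s1h S2 s1h), so their traces agree,
    # and the latter is symmetric PSD, which sqrtm_psd handles.
    covmean = sqrtm_psd(s1h @ S2 @ s1h)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(S1 + S2 - 2.0 * covmean))

# Identical distributions give distance ~0; shifting the mean by d
# with equal covariances gives ||d||^2.
d0 = fid(np.zeros(3), np.eye(3), np.zeros(3), np.eye(3))
d1 = fid(np.zeros(3), np.eye(3), np.array([1.0, 2.0, 2.0]), np.eye(3))
```

Lower is better, so the #2 rank reflects CSGN's FID being among the smallest reported on NTU RGB+D for this task.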

Methods