Spatiotemporal Characterization of Gait from Monocular Videos with Transformers

29 Sep 2021 · R. James Cotton, Emoonah McClerklin, Anthony Cimorelli, Ankit Patel

Human pose estimation from monocular video is a rapidly advancing field that offers great promise to human movement science and rehabilitation. This potential is tempered by the comparatively small body of work ensuring that the outputs are clinically meaningful and properly calibrated. Gait analysis, typically performed in a dedicated lab, produces precise measurements including kinematics and step timing. Using more than 9000 monocular videos from an instrumented gait analysis lab, we evaluated the performance of existing algorithms for measuring kinematics. While they produced plausible results that resemble walking, the joint angles and step length were noisy and poorly calibrated. We trained a transformer to map 3D joint location sequences and the height of individuals onto interpretable biomechanical outputs, including joint kinematics and phase within the gait cycle. This task-specific layer greatly reduced errors in the kinematics of the hip, knee and foot, and accurately detected the timing of foot down and up events. We show, for the first time, that accurate spatiotemporal gait parameters including walking speed, step length, cadence, double support time, and single support time can be computed on a cycle-by-cycle basis from these interpretable outputs. Our results indicate lifted 3D joint locations contain enough information for gait analysis, but their representation is not biomechanically accurate enough to use directly, suggesting room for improvement in existing algorithms.
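The paper does not release code, so the snippet below is only a minimal sketch of the kind of task-specific layer the abstract describes: a transformer that consumes a sequence of lifted 3D joint locations plus subject height and emits per-frame joint kinematics and a gait-cycle phase. The module name, joint count, hidden sizes, and the (sin, cos) phase encoding are all assumptions for illustration, not the authors' architecture.

```python
# Illustrative sketch only; hyperparameters and output heads are assumed,
# not taken from the paper.
import torch
import torch.nn as nn


class GaitTransformer(nn.Module):
    """Map a sequence of 3D joint locations (plus subject height) to
    per-frame joint kinematics and a phase within the gait cycle."""

    def __init__(self, n_joints=17, n_angles=6, d_model=128, n_layers=4, n_heads=8):
        super().__init__()
        # Project flattened 3D joints (+1 scalar for height) into the model dimension.
        self.input_proj = nn.Linear(n_joints * 3 + 1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Per-frame heads: joint angles (e.g., hip/knee/ankle flexion) and
        # gait-cycle phase encoded as (sin, cos) per foot so it stays continuous.
        self.kinematics_head = nn.Linear(d_model, n_angles)
        self.phase_head = nn.Linear(d_model, 4)

    def forward(self, joints3d, height):
        # joints3d: (batch, time, n_joints, 3); height: (batch,) in meters
        b, t, j, _ = joints3d.shape
        x = joints3d.reshape(b, t, j * 3)
        h = height.view(b, 1, 1).expand(b, t, 1)
        z = self.encoder(self.input_proj(torch.cat([x, h], dim=-1)))
        return self.kinematics_head(z), self.phase_head(z)


# Usage sketch: foot down/up events could be read off zero-crossings of the
# predicted phase, from which step length, cadence, and support times follow.
model = GaitTransformer()
joints = torch.randn(2, 120, 17, 3)   # two clips, 120 frames each
height = torch.tensor([1.75, 1.62])
angles, phase = model(joints, height)
```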
