[Re] On end-to-end 6DoF object pose estimation and robustness to object scale

RC 2020 · Georgios Nikolaos Albanis, Nikolaos Zioulis, Anargyros Chatzitofis, Anastasios Dimou, Dimitrios Zarpalas, Petros Daras

Reproduction study for End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization

Scope of Reproducibility
This report describes a set of experiments that seek to reproduce the claims of two recent works on keypoint estimation: one specific to 6DoF object pose estimation, and another that presents a generic architectural improvement for keypoint estimation, demonstrated on human pose estimation. More specifically, the authors of the backpropagatable PnP (BPnP) [1] claim that incorporating geometric optimization into a deep-learning pipeline and predicting an object's pose in an end-to-end manner yields improved performance. HigherHRNet [2], in turn, introduces a novel heatmap aggregation method that enables scale-aware pose estimation, offering higher keypoint localization accuracy for small-scale objects.
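As a rough illustration of the multi-resolution heatmap aggregation idea in [2], the sketch below bilinearly upsamples heatmap stacks predicted at different resolutions to a common size, averages them, and reads off one keypoint per heatmap with an argmax. It is a NumPy toy under our own assumptions, not the official HigherHRNet code: the function names are hypothetical, the common size is simply the largest heatmap resolution, and align-corners sampling is used for readability.

```python
from typing import Sequence

import numpy as np


def upsample_bilinear(hm: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize a (K, h, w) stack of heatmaps to (K, out_h, out_w).

    Align-corners sampling (corner pixels map onto corner pixels) is an
    assumption made for clarity.
    """
    _, h, w = hm.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical interpolation weights
    wx = (xs - x0)[None, None, :]  # horizontal interpolation weights
    rows0, rows1 = hm[:, y0], hm[:, y1]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def aggregate_heatmaps(heatmaps: Sequence[np.ndarray]) -> np.ndarray:
    """Upsample every heatmap stack to the largest resolution and average them."""
    out_h = max(hm.shape[1] for hm in heatmaps)
    out_w = max(hm.shape[2] for hm in heatmaps)
    return np.mean([upsample_bilinear(hm, out_h, out_w) for hm in heatmaps], axis=0)


def argmax_keypoints(hm: np.ndarray) -> np.ndarray:
    """Return the (x, y) location of the maximum of each of the K heatmaps."""
    k, _, w = hm.shape
    flat = hm.reshape(k, -1).argmax(axis=1)
    return np.stack([flat % w, flat // w], axis=1)


# Toy example: one keypoint predicted at a coarse and at a fine resolution.
low = np.zeros((1, 8, 8))
low[0, 2, 3] = 1.0
high = np.zeros((1, 15, 15))
high[0, 4, 6] = 1.0
print(argmax_keypoints(aggregate_heatmaps([low, high])))  # → [[6 4]]
```

Averaging after upsampling lets the coarse map contribute context while the fine map sharpens the peak, which is the intuition behind the small-object gains claimed in [2].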

Methodology
We used the publicly provided code where available, adapting it to fit into a model development kit to facilitate our experiments. We used a dataset suitable for validating both claims simultaneously and designed a set of experiments based on the published methodologies, but we also went beyond them to validate the higher-level concepts. Our experiments were conducted on an NVIDIA 2080 12 GB GPU, with an average training time of 14 hours.

Results
We reproduce the claims of both papers through several experiments on the UAVA dataset [3]. Integrating a differentiable geometric module into a keypoint-based object pose estimation model improved its performance on the evaluated metrics. We additionally verify that this also holds for another differentiable PnP implementation, namely EPnP. Further, our results indicate that HigherHRNet indeed improves keypoint localisation performance on small-scale objects.
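To make concrete what a PnP module computes from predicted keypoints, the sketch below recovers a pose (R, t) from 2D-3D correspondences. It deliberately swaps in the simpler Direct Linear Transform (DLT): it is neither EPnP nor the BPnP solver of [1], and it is not differentiated here. We assume a pinhole camera with known intrinsics K, no lens distortion, and at least six non-coplanar model points; all function names are hypothetical.

```python
import numpy as np


def rodrigues(rvec: np.ndarray) -> np.ndarray:
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * kx + (1 - np.cos(theta)) * kx @ kx


def project(pts3d: np.ndarray, R: np.ndarray, t: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of (N, 3) model points with pose (R, t) and intrinsics K."""
    cam = pts3d @ R.T + t  # model frame -> camera frame
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


def pnp_dlt(pts3d: np.ndarray, pts2d: np.ndarray, K: np.ndarray):
    """Estimate (R, t) from >= 6 non-coplanar 2D-3D correspondences via DLT."""
    n = len(pts3d)
    ones = np.ones((n, 1))
    xn = np.hstack([pts2d, ones]) @ np.linalg.inv(K).T  # normalised image coords
    X = np.hstack([pts3d, ones])  # homogeneous model points
    # Each correspondence gives two linear equations in the 12 entries of [R | t].
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X
    A[0::2, 8:12] = -xn[:, 0:1] * X
    A[1::2, 4:8] = X
    A[1::2, 8:12] = -xn[:, 1:2] * X
    # The right singular vector of the smallest singular value is lambda * [R | t].
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    M = P[:, :3]
    lam = np.sign(np.linalg.det(M)) * np.linalg.svd(M, compute_uv=False).mean()
    P = P / lam
    # Project the 3x3 block onto the nearest rotation matrix.
    U, _, Vt = np.linalg.svd(P[:, :3])
    return U @ Vt, P[:, 3]


# Synthetic, noise-free example.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts3d = rng.uniform(-0.5, 0.5, size=(8, 3))
R_true = rodrigues(np.array([0.2, -0.4, 0.3]))
t_true = np.array([0.1, -0.2, 4.0])
R_est, t_est = pnp_dlt(pts3d, project(pts3d, R_true, t_true, K), K)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

With noise-free correspondences the recovered pose matches the ground truth up to numerical precision. With noisy keypoints, plain DLT is typically less accurate than EPnP or iterative refinement, which is one reason practical pipelines rely on dedicated solvers.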

What was easy
Both papers provided publicly available implementations, and many different variations can also be found online. Finally, the papers themselves are very clearly written, offering insights into various important details.

What was difficult
The main issue that required more effort was identifying appropriate weights for BPnP [1] to balance the different optimization objectives. As expected, these weights vary with the context in which the method is applied (task, dataset), and the values presented in the paper did not work in our case. A sub-optimal selection of weights leads to convergence issues.
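To illustrate why the weights matter, the sketch below assumes an objective of the general form ℓ = ℓ_kp + β·ℓ_pose, where ℓ_pose is the mean squared pixel reprojection error of the model points under the pose returned by the PnP layer. This is our simplified reading, and the exact terms in [1] may differ. Because ℓ_pose is measured in squared pixels while a heatmap loss is often far smaller, β has to absorb a large scale gap. The `balance_beta` rule is a hypothetical heuristic (match the two terms at initialisation), not the procedure used in [1] or in our experiments.

```python
import numpy as np


def project(pts3d: np.ndarray, R: np.ndarray, t: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of (N, 3) model points with pose (R, t) and intrinsics K."""
    cam = pts3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


def reprojection_loss(pts3d, R_hat, t_hat, K, kp_gt) -> float:
    """Mean squared pixel distance between the model points projected with the
    estimated pose and the ground-truth 2D keypoints."""
    err = project(pts3d, R_hat, t_hat, K) - kp_gt
    return float(np.mean(np.sum(err ** 2, axis=1)))


def total_loss(kp_loss: float, pose_loss: float, beta: float) -> float:
    """Weighted objective: keypoint term plus beta times the pose-derived term."""
    return kp_loss + beta * pose_loss


def balance_beta(kp_loss0: float, pose_loss0: float, ratio: float = 1.0) -> float:
    """Hypothetical heuristic: choose beta so the weighted pose term is `ratio`
    times the keypoint term at the start of training."""
    return ratio * kp_loss0 / max(pose_loss0, 1e-12)


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts3d = np.array([[-0.1, -0.1, 0.0], [0.1, -0.1, 0.0], [0.1, 0.1, 0.0], [-0.1, 0.1, 0.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
kp_gt = project(pts3d, R, t, K)
# A lateral translation error of 0.01 units at depth 5 shifts every keypoint
# by 500 * 0.01 / 5 = 1 px, so the reprojection loss is 1.0.
pose_loss0 = reprojection_loss(pts3d, R, t + np.array([0.01, 0.0, 0.0]), K, kp_gt)
kp_loss0 = 2e-4  # hypothetical initial heatmap MSE
beta = balance_beta(kp_loss0, pose_loss0)
print(pose_loss0, beta, total_loss(kp_loss0, pose_loss0, beta))
```

Even this toy case shows how a one-pixel error already outweighs a typical heatmap loss by orders of magnitude, so a β tuned for one task or dataset can easily destabilise training on another.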

Communication with original authors
We communicated with the authors of [1] through GitHub, and we would like to thank them for their fast and detailed response. Furthermore, their responsiveness to past issues had already built a useful knowledge base for reproduction.