Reason induced visual attention for explainable autonomous driving

11 Oct 2021  ·  Sikai Chen, Jiqian Dong, Runjia Du, Yujie Li, Samuel Labi ·

Deep learning (DL) based computer vision (CV) models are generally considered as black boxes due to poor interpretability. This limitation impedes efficient diagnoses or predictions of system failure, thereby precluding the widespread deployment of DLCV models in safety-critical tasks such as autonomous driving. This study is motivated by the need to enhance the interpretability of DL model in autonomous driving and therefore proposes an explainable DL-based framework that generates textual descriptions of the driving environment and makes appropriate decisions based on the generated descriptions. The proposed framework imitates the learning process of human drivers by jointly modeling the visual input (images) and natural language, while using the language to induce the visual attention in the image. The results indicate strong explainability of autonomous driving decisions obtained by focusing on relevant features from visual inputs. Furthermore, the output attention maps enhance the interpretability of the model not only by providing meaningful explanation to the model behavior but also by identifying the weakness of and potential improvement directions for the model.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods