15 Jun 2021 • Boitumelo Ruf, Jonas Mohrs, Martin Weinmann, Stefan Hinz, Jürgen Beyerer
In this work, we propose an optimization of the algorithm for embedded CUDA GPUs using massively parallel computing, and we use NEON intrinsics to optimize the algorithm for vectorized SIMD processing on embedded ARM CPUs.
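
To give a rough idea of what vectorized SIMD processing with NEON intrinsics can look like, the following is a minimal sketch in C. It is not the authors' implementation, and the excerpt above does not specify which parts of the algorithm are vectorized. The function `sad_u8` and the choice of a sum of absolute differences over two byte rows are assumptions made purely for illustration. The NEON path processes 16 bytes per iteration and is only compiled when the target supports NEON; otherwise the scalar loop handles all elements.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (not from the paper): sum of absolute differences
 * between two rows of n unsigned bytes. */
static uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

#if HAVE_NEON
    /* Four 32-bit partial sums, one per lane. */
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);   /* load 16 bytes from each row */
        uint8x16_t vb = vld1q_u8(b + i);
        uint8x16_t d = vabdq_u8(va, vb);   /* per-lane |a - b| */
        uint16x8_t d16 = vpaddlq_u8(d);    /* add adjacent pairs, widen to 16 bit */
        acc = vpadalq_u16(acc, d16);       /* add adjacent pairs into 32-bit lanes */
    }
    /* Horizontal reduction of the four partial sums. */
    uint64x2_t s64 = vpaddlq_u32(acc);
    sum = (uint32_t)(vgetq_lane_u64(s64, 0) + vgetq_lane_u64(s64, 1));
#endif

    /* Scalar tail; on targets without NEON this handles every element. */
    for (; i < n; ++i) {
        sum += (uint32_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
    return sum;
}
```

The widening pairwise adds keep the per-lane accumulators from overflowing for typical row lengths, which is one common reason to accumulate in 32-bit lanes rather than in the original 8-bit type.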