In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters.
The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame.
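As a rough illustration of flow-based warping, the sketch below backward-warps a single-channel frame with a dense flow field using plain bilinear sampling. It is a simplification under stated assumptions: the function name `warp_bilinear` and the list-of-lists layout are hypothetical, the fixed 2x2 bilinear weights stand in for the learned local interpolation kernels, and depth maps and contextual features are not modeled.

```python
import math


def warp_bilinear(frame, flow):
    """Backward-warp a single-channel frame with a dense flow field (hypothetical helper).

    frame: H rows, each a list of W floats.
    flow:  H rows, each a list of W (dx, dy) tuples; output pixel (y, x)
           samples the input at (x + dx, y + dy).
    Out-of-range samples are clamped to the nearest border pixel.
    """
    h, w = len(frame), len(frame[0])

    def pixel(yy, xx):
        # Clamp coordinates to the valid range (border padding).
        yy = min(max(yy, 0), h - 1)
        xx = min(max(xx, 0), w - 1)
        return frame[yy][xx]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0  # fractional offsets in [0, 1)
            # Fixed 2x2 bilinear kernel; a learned per-pixel kernel would replace these weights.
            out[y][x] = (
                (1 - ax) * (1 - ay) * pixel(y0, x0)
                + ax * (1 - ay) * pixel(y0, x0 + 1)
                + (1 - ax) * ay * pixel(y0 + 1, x0)
                + ax * ay * pixel(y0 + 1, x0 + 1)
            )
    return out
```

Under a linear-motion assumption, scaling the flow by a time step t in (0, 1) before warping is one simple way to approximate an intermediate frame; the adaptive kernels described above would instead supply learned, per-pixel sampling weights.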
Second, frame-based models perform quite well on action recognition. Is pre-training for good image features sufficient, or is pre-training for spatiotemporal features valuable for optimal transfer learning?
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
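One form commonly considered in this setting is the (2+1)D factorization, which splits a t x d x d 3D convolution into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The sketch below only counts weights (biases ignored); the function names are hypothetical, and the intermediate-width formula is one choice that keeps the factorized parameter budget close to that of the full 3D convolution.

```python
def conv3d_params(t, d, n_in, n_out):
    # Full 3D kernel: t x d x d over n_in channels for each of n_out filters (no bias).
    return t * d * d * n_in * n_out


def conv2plus1d_params(t, d, n_in, n_out):
    # Intermediate width m chosen so the factorized block approximately matches conv3d_params.
    m = (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)
    spatial = d * d * n_in * m   # 1 x d x d conv: n_in -> m channels
    temporal = t * m * n_out     # t x 1 x 1 conv: m -> n_out channels
    return m, spatial + temporal


print(conv3d_params(3, 3, 64, 64))       # 110592
print(conv2plus1d_params(3, 3, 64, 64))  # (144, 110592)
```

Because of the floor, the factorized count is never larger than the 3D count; for t = d = 3 with 64 input and output channels the two coincide exactly, while the factorized block adds an extra nonlinearity between the spatial and temporal steps if one is inserted there.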
Plugged into the FCOS object detector, the SAG-Mask branch predicts a segmentation mask for each box, using a spatial attention map that helps focus on informative pixels and suppress noise.
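A spatial attention map of this kind can be sketched as follows, assuming the common recipe of channel-wise average and max pooling, a small convolution over the two pooled maps, and a sigmoid gate that rescales every channel at each pixel. The helper name `spatial_attention`, the 3x3 kernel size, and the zero padding are illustrative assumptions; the kernel weights are passed in rather than learned, so this is not the branch's trained behavior.

```python
import math


def spatial_attention(features, kernel, bias=0.0):
    """Gate C x H x W features with a spatial attention map (hypothetical helper).

    kernel: 2 x 3 x 3 weights applied to the [average, max] pooled maps.
    Returns (attended features, H x W attention map).
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])
    # Channel-wise average and max pooling give two H x W descriptor maps.
    avg = [[sum(features[k][y][x] for k in range(c)) / c for x in range(w)] for y in range(h)]
    mx = [[max(features[k][y][x] for k in range(c)) for x in range(w)] for y in range(h)]
    pooled = [avg, mx]

    attn = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = bias
            # 3x3 convolution over both pooled maps with zero padding.
            for p in range(2):
                for i in range(3):
                    for j in range(3):
                        yy, xx = y + i - 1, x + j - 1
                        if 0 <= yy < h and 0 <= xx < w:
                            s += kernel[p][i][j] * pooled[p][yy][xx]
            attn[y][x] = 1.0 / (1.0 + math.exp(-s))  # sigmoid gate in (0, 1)

    # Element-wise rescaling: every channel is multiplied by the same spatial map.
    out = [[[features[k][y][x] * attn[y][x] for x in range(w)] for y in range(h)] for k in range(c)]
    return out, attn
```

Pixels with strong pooled responses receive gates near 1 and weak ones near 0, which is one way a map can emphasize informative pixels while damping noise before mask prediction.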
Moving forward, we will work on unlocking stage-2 optimizations, with up to 8x memory savings per device, and ultimately stage-3 optimizations, which reduce per-device memory linearly with the number of devices and could potentially scale to models of arbitrary size.
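The "up to 8x" figure can be reproduced with simple bookkeeping, assuming mixed-precision training with Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for optimizer state (fp32 master weights, momentum, and variance). Stage 1 partitions the optimizer state across devices, stage 2 also partitions gradients, and stage 3 also partitions the weights. The helper below is hypothetical and ignores activations, temporary buffers, and fragmentation.

```python
def zero_bytes_per_param(stage, n_devices, k=12):
    """Approximate per-device model-state bytes per parameter (hypothetical helper).

    Assumes fp16 weights (2 B), fp16 gradients (2 B), and k bytes of optimizer state.
    """
    params, grads, opt = 2.0, 2.0, float(k)
    if stage >= 1:
        opt /= n_devices      # optimizer state partitioned
    if stage >= 2:
        grads /= n_devices    # gradients partitioned
    if stage >= 3:
        params /= n_devices   # weights partitioned
    return params + grads + opt


baseline = zero_bytes_per_param(0, 64)  # 16.0 bytes per parameter
for stage in (1, 2, 3):
    per_dev = zero_bytes_per_param(stage, 64)
    print(stage, per_dev, baseline / per_dev)
```

As the device count grows, stage 2 approaches 2 bytes per parameter, i.e. 16 / 2 = 8x less than the 16-byte baseline, while stage 3 keeps shrinking as 16 / N, which is the linear scaling mentioned above.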