no code implementations • 21 Jun 2020 • Shi-Yang Yan, Yang Hua, Neil M. Robertson
We tackle this problem by proposing an off-policy reinforcement learning algorithm in which a behaviour policy, represented by GRUs, performs the sampling.
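The off-policy idea can be sketched in miniature: samples come from a fixed behaviour policy while the target policy is updated with importance-weighted returns. This is a toy illustration only; the GRU sampler, the categorical distributions, the rewards, and the update rule below are all stand-in assumptions, not the paper's method.

```python
import math
import random

random.seed(0)

ACTIONS = [0, 1, 2]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

behaviour_probs = [0.5, 0.3, 0.2]   # stand-in for the GRU behaviour policy
target_logits = [0.0, 0.0, 0.0]     # target policy being learned
reward = {0: 0.0, 1: 1.0, 2: 0.2}   # hypothetical per-action rewards
lr = 0.1

for _ in range(2000):
    # the behaviour policy, not the target policy, performs the sampling
    a = random.choices(ACTIONS, weights=behaviour_probs)[0]
    pi = softmax(target_logits)
    rho = pi[a] / behaviour_probs[a]  # importance weight corrects the mismatch
    # importance-weighted REINFORCE-style update on the target logits
    for i in ACTIONS:
        grad = (1.0 if i == a else 0.0) - pi[i]
        target_logits[i] += lr * rho * reward[a] * grad

print(softmax(target_logits))  # probability mass shifts toward action 1
```

The importance weight `rho` is what makes the update off-policy: it reweights each sample so that gradients estimated under the behaviour distribution remain unbiased for the target policy.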
no code implementations • 21 Apr 2020 • Shi-Yang Yan, Yang Hua, Neil Robertson
Furthermore, to enable the ParaCNN to model paragraphs comprehensively, we also propose an adversarial twin net training scheme.
no code implementations • 14 Aug 2019 • Shi-Yang Yan, Jun Xu, Yuai Liu, Lin Xu
The proposed HorNet can then learn visual and language representations jointly from both the images and the captions, thereby enhancing person re-ID performance.
no code implementations • 13 Nov 2018 • Shi-Yang Yan, Yuan Xie, Fang-Yu Wu, Jeremy S. Smith, Wenjin Lu, Bai-Ling Zhang
Automatically generating descriptions of an image, i.e., image captioning, is a fundamental topic in artificial intelligence that bridges the gap between computer vision and natural language processing.
no code implementations • 25 Aug 2017 • Shi-Yang Yan, Jeremy S. Smith, Wenjin Lu, Bai-Ling Zhang
Visualization of what the networks have learnt shows that HM-AN captures both the attention regions of the images and the hierarchical temporal structure.
no code implementations • 24 Jul 2017 • Fang-Yu Wu, Shi-Yang Yan, Jeremy S. Smith, Bai-Ling Zhang
In this paper, we attempt to solve the traffic scene recognition problem by combining the feature representation capabilities of CNNs with the VLAD encoding scheme.
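The VLAD step itself is standard and can be sketched as follows: each local descriptor is assigned to its nearest visual word, and the residuals to each centre are accumulated, flattened, and normalized. The CNN feature extraction and learned codebook from the paper are replaced here by random arrays, purely as placeholders.

```python
import numpy as np

def vlad_encode(descriptors, centres):
    """descriptors: (N, D) local features; centres: (K, D) visual words."""
    k = centres.shape[0]
    # hard-assign each descriptor to its nearest centre
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    vlad = np.zeros_like(centres)
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            vlad[i] = (members - centres[i]).sum(axis=0)  # residual sum per word
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))  # power normalization
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad      # L2 normalization

rng = np.random.default_rng(0)
descs = rng.standard_normal((100, 8))   # stand-in for CNN local features
centres = rng.standard_normal((4, 8))   # stand-in for a learned codebook
code = vlad_encode(descs, centres)
print(code.shape)  # (32,): the K * D encoding fed to the classifier
```

The resulting fixed-length K·D vector is what makes VLAD convenient here: variable numbers of local CNN descriptors per image collapse into one vector a standard classifier can consume.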
no code implementations • 9 May 2017 • Shi-Yang Yan, Jeremy S. Smith, Wenjin Lu, Bai-Ling Zhang
This paper presents improvements to the soft attention model by combining a convolutional LSTM with a hierarchical system architecture to recognize action categories in videos.
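The soft attention mechanism being improved can be sketched in its basic form: each frame's feature vector receives a score, the scores are turned into weights by a softmax, and the context vector is the weighted sum of frame features. The convolutional LSTM and hierarchical architecture from the paper are omitted; the scoring vector below is a stand-in for a learned parameter.

```python
import numpy as np

def soft_attention(frame_feats, query):
    """frame_feats: (T, D) per-frame features; query: (D,) scoring vector."""
    scores = frame_feats @ query            # one scalar score per frame
    scores -= scores.max()                  # numerical stability for exp
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    context = weights @ frame_feats         # convex combination of frames
    return context, weights

rng = np.random.default_rng(1)
feats = rng.standard_normal((16, 32))  # 16 frames, 32-d features each
query = rng.standard_normal(32)
context, weights = soft_attention(feats, query)
print(weights.sum())  # weights sum to ~1: a distribution over frames
```

Because the weights are differentiable in the scores, the whole attention step can be trained end-to-end, which is what lets the model learn which frames matter for each action category.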