no code implementations • 1 Mar 2024 • Jiandong Jin, Bowen Tang, Mingxuan Ma, Xiao Liu, Yunfei Wang, Qingnan Lai, Jia Yang, Changling Zhou
We introduce Crimson, a system that enhances the strategic reasoning capabilities of Large Language Models (LLMs) within the realm of cybersecurity.
2 code implementations • 17 Dec 2023 • Xiao Wang, Jiandong Jin, Chenglong Li, Jin Tang, Cheng Zhang, Wei Wang
In this paper, we formulate PAR as a vision-language fusion problem and fully exploit the relations between pedestrian images and attribute labels.
2 code implementations • 4 Dec 2023 • Jiandong Jin, Xiao Wang, Chenglong Li, Lili Huang, Jin Tang
Then, a Transformer decoder is proposed to generate the human attributes by incorporating the visual features and attribute query tokens.
1 code implementation • 30 Nov 2023 • Dong Li, Jiandong Jin, Yuhao Zhang, Yanlin Zhong, Yaoyang Wu, Lan Chen, Xiao Wang, Bin Luo
Current methods typically employ backbone networks to individually extract the features of RGB frames and event streams, and subsequently fuse these features for pattern recognition.
1 code implementation • 20 Apr 2023 • Jun Zhu, Jiandong Jin, Zihan Yang, Xiaohao Wu, Xiao Wang
The averaged visual tokens and text tokens are concatenated and fed into a fusion Transformer for multi-modal interactive learning.