no code implementations • 27 Nov 2023 • Yifei Chen, Dapeng Chen, Ruijin Liu, Sai Zhou, Wenyuan Xue, Wei Peng
With the aligned entities, we feed their text embeddings as queries to a transformer-based video adapter, which extracts the semantics of the most important entities in a video into a single vector.
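The mechanism described above is essentially cross-attention with entity text embeddings as queries over per-frame video features. A minimal numpy sketch, with hypothetical shapes and a simple mean-pooling step that the paper's actual adapter may not use:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entity_cross_attention(entity_queries, frame_features):
    """Single-head cross-attention: entity text embeddings (queries)
    attend over per-frame video features (keys/values).
    Shapes are illustrative, not the paper's exact adapter."""
    # entity_queries: (num_entities, d), frame_features: (num_frames, d)
    d = entity_queries.shape[-1]
    scores = entity_queries @ frame_features.T / np.sqrt(d)   # (E, T)
    attn = softmax(scores, axis=-1)                           # rows sum to 1
    attended = attn @ frame_features                          # (E, d)
    # Pool the entity-conditioned features into one video vector.
    return attended.mean(axis=0)                              # (d,)

rng = np.random.default_rng(0)
video_vec = entity_cross_attention(rng.normal(size=(4, 16)),
                                   rng.normal(size=(8, 16)))
```

The output is a single d-dimensional vector summarizing the video through the lens of the aligned entities.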
no code implementations • 29 Aug 2023 • Ruijin Liu, Ning Lu, Dapeng Chen, Cheng Li, Zejian Yuan, Wei Peng
We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation, Polynomial Band (PB).
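To make the representation concrete, here is a hypothetical sketch of a polynomial band: a curved text region bounded by offsetting a center polynomial curve by a half-width. The exact parameterization in PBFormer may differ; this only illustrates the general idea of polynomial boundary curves.

```python
import numpy as np

def polynomial_band(coeffs, half_width, u):
    """Sketch of a polynomial band for curved text: evaluate a center
    polynomial v = polyval(coeffs, u) along the text direction u, then
    offset it to get top and bottom boundaries.
    `coeffs` and `half_width` are illustrative parameters."""
    v_center = np.polyval(coeffs, u)
    return v_center - half_width, v_center + half_width  # (top, bottom)

u = np.linspace(0.0, 1.0, 5)                 # samples along the text line
top, bottom = polynomial_band([0.5, -0.2, 0.1], 0.05, u)
```

A compact parametric band like this can describe highly curved text with far fewer numbers than a dense polygon.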
no code implementations • ICCV 2023 • Yifei Chen, Dapeng Chen, Ruijin Liu, Hao Li, Wei Peng
Supervised by the semantics of action labels, recent works adapt the visual branch of VLMs to learn video representations.
1 code implementation • 31 Dec 2021 • Ruijin Liu, Dapeng Chen, Tie Liu, Zhiliang Xiong, Zejian Yuan
In this task, the correct camera pose, which transforms an image from the perspective view to the top view, is key to generating accurate lanes.
Ranked #6 on 3D Lane Detection on the Apollo Synthetic 3D Lane benchmark.
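The perspective-to-top-view step above is typically a homography warp derived from the camera pose (pitch, height, intrinsics). A minimal sketch, using an identity placeholder in place of a real pose-derived homography:

```python
import numpy as np

def to_top_view(points_uv, H):
    """Map perspective-view pixel coordinates to top-view (bird's-eye)
    coordinates with a 3x3 homography H. In a real pipeline H would be
    computed from the estimated camera pose; here it is a placeholder."""
    uv1 = np.hstack([points_uv, np.ones((len(points_uv), 1))])  # homogeneous
    xyw = uv1 @ H.T
    return xyw[:, :2] / xyw[:, 2:3]                             # dehomogenize

H = np.eye(3)  # identity placeholder for a pose-derived homography
pts = np.array([[100.0, 200.0], [150.0, 220.0]])
bev = to_top_view(pts, H)
```

With an accurate pose-derived H, lane points sampled in the image land at metrically consistent positions in the top view, which is why pose errors translate directly into lane errors.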
2 code implementations • 9 Nov 2020 • Ruijin Liu, Zejian Yuan, Tie Liu, Zhiliang Xiong
To tackle these issues, we propose an end-to-end method that directly outputs the parameters of a lane shape model, using a transformer-based network to learn richer structures and context.
Ranked #20 on Lane Detection on the TuSimple benchmark.
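Predicting lane-shape parameters instead of per-pixel masks means the network head emits a few curve coefficients per lane, which are then decoded into points. A toy decoder, using a plain quadratic as a simplified stand-in for the paper's camera-geometry-aware lane shape model:

```python
import numpy as np

def decode_lane(params, ys):
    """Decode predicted curve parameters into lane points by evaluating
    x(y) at sampled image rows. The quadratic x = a*y^2 + b*y + c is a
    simplified stand-in; the actual model ties the curve to camera
    geometry."""
    a, b, c = params
    xs = a * ys**2 + b * ys + c
    return np.stack([xs, ys], axis=1)  # (N, 2) lane points

ys = np.linspace(0.4, 1.0, 6)          # normalized image rows
lane = decode_lane((0.3, -0.1, 0.5), ys)
```

Because the output is a handful of parameters per lane, the method needs no post-processing such as clustering or curve fitting on a segmentation map.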