Search Results for author: Xuyang Shen

Found 13 papers, 9 papers with code

HGRN2: Gated Linear RNNs with State Expansion

3 code implementations • 11 Apr 2024 • Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference.

Image Classification, Language Modelling

388


Linear Attention Sequence Parallelism

1 code implementation • 3 Apr 2024 • Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

In this paper, we introduce Linear Attention Sequence Parallel (LASP), an efficient SP method tailored to linear attention-based language models.


CO2: Efficient Distributed Training with Full Communication-Computation Overlap

1 code implementation • 29 Jan 2024 • Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

CO2 is able to attain a high scalability even on extensive multi-node clusters constrained by very limited communication bandwidth.


Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

2 code implementations • 9 Jan 2024 • Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

With its ability to process tokens in linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with a fixed memory consumption.

191


TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

2 code implementations • 27 Jul 2023 • Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.

Language Modelling, Large Language Model

209


Fine-grained Audible Video Description

1 code implementation • CVPR 2023 • Xuyang Shen, Dong Li, Jinxing Zhou, Zhen Qin, Bowen He, Xiaodong Han, Aixuan Li, Yuchao Dai, Lingpeng Kong, Meng Wang, Yu Qiao, Yiran Zhong

We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD).

Language Modelling, Masked Language Modeling, +5


Audio-Visual Segmentation with Semantics

1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation, Semantic Segmentation, +1

430


Linear Video Transformer with Feature Fixation

no code implementations • 15 Oct 2022 • Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.

Feature Importance, Video Classification


AMF: Adaptable Weighting Fusion with Multiple Fine-tuning for Image Classification

no code implementations • 26 Jul 2022 • Xuyang Shen, Jo Plested, Sabrina Caldwell, Yiran Zhong, Tom Gedeon

Fine-tuning is widely applied in image classification tasks as a transfer learning approach.

Image Classification, Transfer Learning


Feature Selection on Thermal-stress Dataset

no code implementations • 8 Sep 2021 • Xuyang Shen, Jo Plested, Tom Gedeon

These findings are likely to improve the accuracy of current stress recognition systems.

Feature Selection


Exploring Biases and Prejudice of Facial Synthesis via Semantic Latent Space

no code implementations • 23 Aug 2021 • Xuyang Shen, Jo Plested, Sabrina Caldwell, Tom Gedeon

Varying the proportions of male and female faces in the training data can have a substantial effect on behavior on the test data: we found that the seemingly obvious choice of 50:50 proportions was not the best for this dataset to reduce biased behavior on female faces, which was 71% unbiased as compared to our top unbiased rate of 84%.