1 code implementation • 29 Mar 2024 • Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang
In an effort to reduce the computational load of Transformers, research on linear attention has gained significant momentum.
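For context, the core idea behind linear attention is to replace the softmax kernel with a feature map so that attention can be computed associatively in linear time. Below is a minimal sketch of that generic formulation (using the elu+1 feature map of Katharopoulos et al., 2020), not necessarily the exact method proposed in this paper:

```python
# Generic linear attention via a kernel feature map (illustrative sketch).
import torch

def elu_feature_map(x):
    # phi(x) = elu(x) + 1 keeps features strictly positive.
    return torch.nn.functional.elu(x) + 1

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (batch, seq, dim); v: (batch, seq, dim_v)
    q, k = elu_feature_map(q), elu_feature_map(k)
    # Associativity: phi(Q) (phi(K)^T V) costs O(n*d^2) instead of O(n^2*d).
    kv = torch.einsum("bnd,bne->bde", k, v)                         # (b, d, dim_v)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)   # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)
```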
1 code implementation • 5 Feb 2024 • Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang
Several design formulas are empirically shown to be especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance, and multiple-round training.
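To make one of these formulas concrete, here is a hedged sketch of parameter inheritance, i.e. warm-starting a tiny model from a larger pretrained checkpoint; the name mapping and leading-slice selection below are illustrative assumptions, not the paper's exact recipe:

```python
# Hypothetical parameter-inheritance helper (illustrative only).
import torch

def inherit_parameters(large_state, small_state, layer_map):
    """Copy (and slice) weights from a large checkpoint into a small one.

    layer_map maps small-model parameter names to large-model names,
    e.g. {"layers.0.w": "layers.0.w", "layers.1.w": "layers.4.w"}.
    """
    for small_name, large_name in layer_map.items():
        src, dst = large_state[large_name], small_state[small_name]
        # Keep the leading slice of each dimension so the shapes match;
        # a real method might instead select neurons by importance scores.
        slices = tuple(slice(0, s) for s in dst.shape)
        small_state[small_name] = src[slices].clone()
    return small_state
```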
1 code implementation • NeurIPS 2023 • Yuchuan Tian, Hanting Chen, Tianyu Guo, Chao Xu, Yunhe Wang
To this end, we propose a Rank-based PruninG (RPG) method to maintain the ranks of sparse weights in an adversarial manner.
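One way to read "adversarial rank maintenance" is as a regularizer that pushes the sparse weight away from its own best low-rank approximation. The sketch below illustrates that idea under stated assumptions; it is not the authors' exact RPG objective:

```python
# Hedged sketch of an adversarial rank-maintaining regularizer.
import torch

def rank_regularizer(weight, mask, r=8):
    w = weight * mask  # current sparse (masked) weight
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    # Best rank-r approximation (Eckart-Young) acts as the "adversary";
    # detach so the weight update treats it as fixed.
    w_lowrank = ((u[:, :r] * s[:r]) @ vh[:r, :]).detach()
    # Maximizing the distance to the adversary keeps the sparse weight
    # high-rank, so return the negation to be *added* to the task loss.
    return -torch.linalg.norm(w - w_lowrank)
```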
3 code implementations • 29 May 2023 • Yuchuan Tian, Hanting Chen, Xutao Wang, Zheyuan Bai, Qinghua Zhang, Ruifeng Li, Chao Xu, Yunhe Wang
Recent releases of Large Language Models (LLMs), e.g., ChatGPT, are astonishingly good at generating human-like text, but they may undermine the authenticity of texts.
no code implementations • 29 Sep 2021 • Xiaochen Zhou, Yuchuan Tian, Xudong Wang
Moreover, to prevent the compact model from forgetting the knowledge of the source data during knowledge distillation, a collaborative knowledge distillation (Co-KD) method is developed that unifies the source data on the server with the target data on the edge device to train the compact model.
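A minimal sketch of this collaborative-distillation idea follows: train the compact student on a batch that mixes server-side source data with edge-side target data, distilling soft labels from the teacher on both. The function name, temperature, and mixing scheme are assumptions for illustration, not the paper's exact procedure:

```python
# Hypothetical Co-KD-style training step (illustrative sketch).
import torch
import torch.nn.functional as F

def co_kd_step(student, teacher, src_x, tgt_x, optimizer, T=2.0):
    x = torch.cat([src_x, tgt_x], dim=0)  # unify source + target batches
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    # Soft-label distillation loss (Hinton et al.), scaled by T^2 as usual.
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Distilling on the unified batch is what lets the student keep fitting the source distribution while adapting to the target data, which is the stated goal of Co-KD.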