no code implementations • 20 Apr 2024 • Yuang Liu, Zhiheng Qiu, Xiaokai Qin
ViT divides an image into several local patches, known as "visual sentences".
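A minimal sketch of that patch-splitting step (the 16-pixel patch size and tensor shapes below are illustrative assumptions, not this paper's configuration):

```python
import torch

def patchify(images: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """Split a batch of images (B, C, H, W) into flattened patches (B, N, C*P*P)."""
    B, C, H, W = images.shape
    assert H % patch_size == 0 and W % patch_size == 0
    # (B, C, H/P, P, W/P, P) -> (B, H/P, W/P, C, P, P) -> (B, N, C*P*P)
    x = images.reshape(B, C, H // patch_size, patch_size, W // patch_size, patch_size)
    x = x.permute(0, 2, 4, 1, 3, 5).reshape(B, -1, C * patch_size * patch_size)
    return x

# Example: a 224x224 RGB image becomes 196 "visual sentences" of length 768.
patches = patchify(torch.randn(1, 3, 224, 224))
print(patches.shape)  # torch.Size([1, 196, 768])
```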
no code implementations • 19 Dec 2023 • Yuang Liu, Jing Wang, Qiang Zhou, Fan Wang, Jun Wang, Wei Zhang
Numerous self-supervised learning paradigms, such as contrastive learning and masked image modeling, have been proposed to acquire powerful and general representations from unlabeled data.
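As an illustration of one of these paradigms, a masked-image-modeling pipeline typically hides a random subset of patch tokens and asks the model to reconstruct them; the sketch below shows only the masking step, with a MAE-style 75% mask ratio as an assumed default rather than this paper's setting:

```python
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens; the rest are hidden from the encoder."""
    B, N, D = tokens.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                        # one random score per token
    keep_idx = noise.argsort(dim=1)[:, :num_keep]   # lowest-score tokens are kept
    kept = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)                         # 1 = masked, 0 = visible
    mask.scatter_(1, keep_idx, 0)
    return kept, mask
```

A contrastive branch would instead embed two augmented views of the same image and pull them together (e.g., with an InfoNCE loss), so the two paradigms supervise the encoder in quite different ways.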
no code implementations • 23 Nov 2023 • Jing Wang, Yuang Liu, Qiang Zhou, Fan Wang
Few-shot learning is a promising way to reduce labeling cost when adapting to new categories, guided by a small, well-labeled support set.
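Purely for illustration, one common way to use such a support set is a prototypical-network-style classifier, where each class is summarized by the mean of its support embeddings (a generic instance of few-shot classification, not necessarily the method proposed here):

```python
import torch

def prototype_classify(support_feats, support_labels, query_feats, num_classes):
    """Nearest-prototype few-shot classification over embedded features."""
    # One prototype per class: the mean of that class's support embeddings.
    protos = torch.stack([support_feats[support_labels == c].mean(0)
                          for c in range(num_classes)])
    # Negative squared Euclidean distance serves as the class logit for each query.
    logits = -torch.cdist(query_feats, protos) ** 2
    return logits.argmax(dim=1)
```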
no code implementations • 4 Aug 2023 • Qiang Zhou, Chaohui Yu, Jingliang Li, Yuang Liu, Jing Wang, Zhibin Wang
to provide additional consistency constraints, which increases GPU memory consumption and complicates the model's structure and training pipeline.
no code implementations • 3 Aug 2023 • Yuang Liu, Qiang Zhou, Jing Wang, Fan Wang, Jun Wang, Wei Zhang
Vision transformers (ViT) usually extract features by forwarding all the tokens through the self-attention layers from top to bottom.
no code implementations • 27 Feb 2023 • Qiang Zhou, Yuang Liu, Chaohui Yu, Jingliang Li, Zhibin Wang, Fan Wang
Instead of relabeling each dataset with the unified taxonomy, a category-guided decoding module is designed to dynamically guide predictions toward each dataset's taxonomy.
1 code implementation • 19 May 2022 • Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, Yuang Liu, Shuohuan Wang, Peng Liu, Yongshuai Hou, Long Li, Bin Wang, Shaohuai Shi, Yaqian Han, Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu
We took natural language processing (NLP) as an example to show how Nebula-I works in different training phases, including: a) pre-training a multilingual language model using two remote clusters; and b) fine-tuning a machine translation model using knowledge distilled from the pre-trained models. Together, these phases run through the most popular paradigm of recent deep learning.
Cross-Lingual Natural Language Inference • Distributed Computing • +2
no code implementations • 31 Dec 2021 • Yuang Liu, Wei Zhang, Jun Wang, Jianyong Wang
In this paper, we provide a comprehensive survey on data-free knowledge transfer from the perspectives of knowledge distillation and unsupervised domain adaptation, to help readers better understand the current research status and ideas.
no code implementations • 21 Dec 2021 • Jun Chen, Yuang Liu, Xiangrui Zhao, Mengmeng Wang, Yong Liu
As a result, we prove that, if initial metrics have an $L^2$-norm perturbation which deviates from the hyperbolic metric on the Poincaré ball, the scaled Ricci-DeTurck flow of such metrics smoothly and exponentially converges to the hyperbolic metric.
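Schematically, and with the norms and constants below chosen only to convey the shape of the statement rather than its precise hypotheses, the result says that

$$\|g_0 - g_{\mathrm{hyp}}\|_{L^2} \le \varepsilon \;\Longrightarrow\; \|g(t) - g_{\mathrm{hyp}}\| \le C\, e^{-\delta t} \quad \text{for all } t \ge 0,$$

where $g(t)$ is the scaled Ricci-DeTurck flow started at $g_0$ and $g_{\mathrm{hyp}}$ is the hyperbolic metric on the Poincaré ball.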
no code implementations • 29 Sep 2021 • Jun Chen, Hanwen Chen, Jiangning Zhang, Yuang Liu, Tianxin Huang, Yong Liu
Quantized Neural Networks (QNNs) aim at replacing full-precision weights $\boldsymbol{W}$ with quantized weights $\boldsymbol{\hat{W}}$, which makes it possible to easily deploy large models on mobile and miniaturized devices.
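A minimal sketch of what replacing $\boldsymbol{W}$ with $\boldsymbol{\hat{W}}$ can look like in practice, using symmetric uniform quantization (the bit-width and per-tensor scaling rule here are illustrative assumptions, not the scheme studied in this paper):

```python
import torch

def quantize_weights(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric uniform quantization: map float weights onto a small integer grid."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for signed 8-bit
    scale = w.abs().max() / qmax              # one scale per tensor
    w_int = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return w_int * scale                      # de-quantized weights used at inference

w = torch.randn(256, 256)
w_hat = quantize_weights(w, num_bits=4)
print((w - w_hat).abs().max())                # quantization error grows as bits shrink
```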
no code implementations • CVPR 2021 • Yuang Liu, Wei Zhang, Jun Wang
To cope with this issue, we propose a source-free domain adaptation framework for semantic segmentation, namely SFDA, in which only a well-trained source model and an unlabeled target domain dataset are available for adaptation.
no code implementations • CVPR 2021 • Yuang Liu, Wei Zhang, Jun Wang
To address the above issues, we propose a zero-shot adversarial quantization (ZAQ) framework, facilitating effective discrepancy estimation and knowledge transfer from a full-precision model to its quantized model.
Ranked #2 on Data Free Quantization on CIFAR-100 (CIFAR-100 W5A5 Top-1 Accuracy metric)
1 code implementation • 6 Mar 2021 • Yuang Liu, Wei Zhang, Jun Wang
Knowledge distillation (KD) is an effective learning paradigm for improving the performance of lightweight student networks by utilizing additional supervision knowledge distilled from teacher networks.
no code implementations • 19 May 2020 • Yuang Liu, Wei Zhang, Jun Wang
Knowledge Distillation (KD) is an effective framework for compressing deep learning models, realized by a student-teacher paradigm requiring small student networks to mimic the soft targets generated by well-trained teachers.
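A minimal sketch of that student-teacher objective (the temperature and loss weighting below are common defaults used for illustration, not this paper's configuration):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the teacher's soft targets."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    distill = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard
```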