1 code implementation • 26 Mar 2024 • Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
To address this deficiency, we investigate the layerwise properties of LoRA on fine-tuning tasks and observe an unusual skewness of weight norms across different layers.
1 code implementation • 28 Feb 2024 • Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang
Additionally, DPA models user preferences as directions (i.e., unit vectors) in the reward space to achieve user-dependent preference control.
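The direction-as-preference idea can be illustrated with a minimal sketch. All names here are hypothetical, and a two-dimensional reward space is assumed for illustration; this is not the paper's implementation:

```python
import numpy as np

def preference_direction(weights):
    """Normalize raw user weights into a unit vector in reward space."""
    v = np.asarray(weights, dtype=float)
    return v / np.linalg.norm(v)

def directional_reward(reward_vector, direction):
    """Scalarize a multi-objective reward by projecting it onto the user's direction."""
    return float(np.dot(reward_vector, direction))

# A user who weights the first objective twice as heavily:
d = preference_direction([2.0, 1.0])
r = directional_reward(np.array([0.8, 0.3]), d)
```

Because preferences live on the unit sphere, two users with proportional weights induce the same ranking over responses; only the direction matters.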
1 code implementation • 16 Feb 2024 • Xin Xu, Shizhe Diao, Can Yang, Yang Wang
Chain-of-Thought (CoT) prompting has marked a significant advancement in enhancing the reasoning capabilities of large language models (LLMs).
1 code implementation • 6 Feb 2024 • Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang
In this paper, we identify a class of inputs that baffles MLLMs: images that are highly relevant to, but inconsistent with, the expected answers, which causes MLLMs to suffer from hallucination.
1 code implementation • 25 Jan 2024 • Quyet V. Do, Tianqing Fang, Shizhe Diao, Zhaowei Wang, Yangqiu Song
When considering a new knowledge instance, ConstraintChecker employs a rule-based module to produce a list of constraints, and then uses a zero-shot learning module to check whether the instance satisfies all of them.
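The two-module pipeline can be sketched as follows. The rules and checks below are hypothetical stand-ins (the real checking module is a zero-shot LLM), intended only to show the accept-only-if-all-constraints-pass structure:

```python
def generate_constraints(triple):
    """Rule-based module: derive simple constraints from a (head, relation, tail) triple."""
    head, relation, tail = triple
    constraints = []
    if relation == "CapableOf":
        constraints.append(("head_is_agent", head))
    if relation == "Causes":
        constraints.append(("tail_is_event", tail))
    # Every instance must at least be non-degenerate.
    constraints.append(("non_trivial", (head, tail)))
    return constraints

def check_constraint(name, arg):
    """Stand-in for the zero-shot checking module (here: trivial heuristics)."""
    if name == "non_trivial":
        head, tail = arg
        return head != tail
    return True  # assume the remaining checks pass in this sketch

def accept(triple):
    """Accept the knowledge instance only if every constraint is satisfied."""
    return all(check_constraint(n, a) for n, a in generate_constraints(triple))
```

For example, `accept(("dog", "CapableOf", "bark"))` passes, while a degenerate triple such as `("rain", "Causes", "rain")` is rejected by the non-triviality constraint.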
1 code implementation • 16 Nov 2023 • Hanning Zhang, Shizhe Diao, Yong Lin, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang
This approach is formalized by first identifying the knowledge gap between parametric knowledge and the instruction tuning data.
1 code implementation • 14 Nov 2023 • Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Renjie Pi, Jipeng Zhang, Tong Zhang
Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models.
1 code implementation • 20 Oct 2023 • Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung
Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be powerful AI assistants for improving the user experience.
1 code implementation • 15 Oct 2023 • Xu Liu, Junfeng Hu, Yuan Li, Shizhe Diao, Yuxuan Liang, Bryan Hooi, Roger Zimmermann
To address these issues, we propose UniTime for effective cross-domain time series learning.
no code implementations • 12 Sep 2023 • Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan YAO, Tong Zhang
Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different reward-tax trade-offs, we propose Adaptive Model Averaging (AMA) to adaptively find various combination ratios of model layers.
1 code implementation • 21 Jun 2023 • Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang
As the number of available models and specialized tasks keeps growing, general fine-tuning becomes a highly nontrivial job.
1 code implementation • 8 Jun 2023 • Shizhe Diao, Tianyang Xu, Ruijia Xu, Jiawei Wang, Tong Zhang
Pre-trained language models (PLMs) demonstrate excellent abilities to understand texts in generic domains while struggling in specific domains.
1 code implementation • 6 Jun 2023 • Zhihong Chen, Guiming Hardy Chen, Shizhe Diao, Xiang Wan, Benyou Wang
Masked language modeling (MLM) has been one of the most popular pretraining recipes in natural language processing, with BERT as one of its representative models.
1 code implementation • 23 May 2023 • Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang
Overall, our proposed paradigm and DetGPT demonstrate the potential for more sophisticated and intuitive interactions between humans and machines.
1 code implementation • 13 Apr 2023 • Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
Utilizing a reward model and a sufficient number of samples, our approach selects high-quality samples, discards those that exhibit undesired behavior, and subsequently enhances the model by fine-tuning on the filtered samples.
2 code implementations • 24 Feb 2023 • Kashun Shum, Shizhe Diao, Tong Zhang
However, most CoT studies rely on carefully designed, human-annotated rationale chains to prompt LLMs, posing challenges for real-world applications where labeled data is available without rationale chains.
2 code implementations • 23 Feb 2023 • Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang
For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries.
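One common way to rank questions by annotation value is to sample several answers per question and measure disagreement; the sketch below uses answer entropy as the uncertainty signal (an illustrative metric, not necessarily the paper's exact one):

```python
from collections import Counter
import math

def disagreement_entropy(answers):
    """Entropy over sampled answers: higher means the model is less certain."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_for_annotation(question_to_answers, budget=2):
    """Pick the `budget` questions whose sampled answers disagree the most."""
    ranked = sorted(question_to_answers,
                    key=lambda q: disagreement_entropy(question_to_answers[q]),
                    reverse=True)
    return ranked[:budget]

pool = {
    "q1": ["7", "7", "7", "7"],     # consistent answers, low uncertainty
    "q2": ["12", "15", "12", "9"],  # strong disagreement, high uncertainty
    "q3": ["3", "3", "4", "3"],     # mild disagreement
}
chosen = select_for_annotation(pool, budget=2)
```

Under this metric the annotation budget is spent on `q2` and `q3`, where human rationales would correct the most model confusion, while the already-consistent `q1` is skipped.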
1 code implementation • 20 Feb 2023 • Shizhe Diao, Sedrick Scott Keh, Liangming Pan, Zhiliang Tian, Yan Song, Tong Zhang
Social media classification tasks (e.g., tweet sentiment analysis, tweet stance detection) are challenging because social media posts are typically short, informal, and ambiguous.
1 code implementation • ICCV 2023 • Zhihong Chen, Shizhe Diao, Benyou Wang, Guanbin Li, Xiang Wan
Medical vision-and-language pre-training (Med-VLP) has shown promising improvements on many downstream medical tasks owing to its applicability to extracting generic representations from medical images and texts.
1 code implementation • 30 Nov 2022 • Rui Pan, Shizhe Diao, Jianlin Chen, Tong Zhang
In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining.
1 code implementation • 21 Nov 2022 • Hanze Dong, Shizhe Diao, Weizhong Zhang, Tong Zhang
The resulting method is significantly more powerful than the standard normalizing flow approach for generating data distributions with multiple modes.
1 code implementation • 15 Jun 2022 • Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang
In this work, we disclose the potential of symmetric generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs.
1 code implementation • 30 May 2022 • Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang
We release the VLUE benchmark to promote research on building vision-language models that generalize well to more diverse images and concepts unseen during pre-training, and are practical in terms of efficiency-performance trade-off.
1 code implementation • 21 Jan 2022 • Shizhe Diao, Zhichao Huang, Ruijia Xu, Xuechun Li, Yong Lin, Xiao Zhou, Tong Zhang
Particularly, instead of fine-tuning the model in the cloud, we adapt PLMs by prompt learning, which efficiently optimizes only a few parameters of the discrete prompts.
1 code implementation • NeurIPS 2021 • Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang
For the latter step, instead of using chain-rule-based gradient estimators as in existing methods, we propose a variance-reduced policy gradient estimator that requires only two forward passes and no backward propagation, thus achieving completely sparse training.
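The two-forward-pass idea can be illustrated with a leave-one-out REINFORCE sketch, assuming independent Bernoulli mask variables; this is an illustrative estimator of that family, not necessarily the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_prob(mask, p):
    """Score function ∇_p log P(mask) for independent Bernoulli(p) entries."""
    return (mask - p) / (p * (1.0 - p))

def vr_policy_gradient(loss_fn, p):
    """Two forward passes, no backprop: each sample's loss is baselined by
    the other sample's loss, which reduces variance without adding bias."""
    m1 = (rng.random(p.shape) < p).astype(float)
    m2 = (rng.random(p.shape) < p).astype(float)
    l1, l2 = loss_fn(m1), loss_fn(m2)  # the only network evaluations needed
    return 0.5 * ((l1 - l2) * grad_log_prob(m1, p)
                  + (l2 - l1) * grad_log_prob(m2, p))
```

Since the estimator touches the network only through `loss_fn`, both evaluations can run with fully sparse (masked) weights, which is what makes completely sparse training possible.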
1 code implementation • ACL 2021 • Shizhe Diao, Ruijia Xu, Hongjin Su, Yilei Jiang, Yan Song, Tong Zhang
In this paper, we aim to adapt a generic pretrained model with a relatively small amount of domain-specific data.
1 code implementation • 21 Apr 2020 • Shizhe Diao, Yan Song, Tong Zhang
Keyphrase generation aims to produce a set of phrases summarizing the essentials of a given document.
7 code implementations • Findings of the Association for Computational Linguistics 2020 • Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang
Moreover, ZEN achieves reasonable performance even when trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data.
Ranked #1 on Chinese Part-of-Speech Tagging on CTB5 Dev
Tasks: Chinese Named Entity Recognition, Chinese Word Segmentation, +5