Search Results for author: Shicheng Tan

Found 4 papers, 4 papers with code

Why are hyperbolic neural networks effective? A study on hierarchical representation capability

1 code implementation · 4 Feb 2024 · Shicheng Tan, Huanjing Zhao, Shu Zhao, Yanping Zhang

Inspired by the analysis results, we propose several pre-training strategies to enhance HRC (hierarchical representation capability) and improve the performance of downstream tasks, further validating the reliability of the analysis.
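The entry above studies hierarchical representation capability in hyperbolic space. As a generic illustration only, not this paper's method, the sketch below computes geodesic distance in the Poincaré ball, the quantity hyperbolic models typically use to encode tree-like hierarchies; the function name and toy points are made up for the example.

```python
import torch

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance between points u, v inside the unit Poincare ball.

    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    Points pushed toward the boundary can encode deeper levels of a hierarchy.
    """
    sq_u = torch.clamp(torch.sum(u * u, dim=-1), 0, 1 - eps)
    sq_v = torch.clamp(torch.sum(v * v, dim=-1), 0, 1 - eps)
    sq_diff = torch.sum((u - v) ** 2, dim=-1)
    x = 1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v))
    return torch.acosh(x)

# Toy check: a point near the origin vs. one near the boundary.
root = torch.tensor([0.01, 0.0])
leaf = torch.tensor([0.95, 0.0])
print(poincare_distance(root, leaf))
```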

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

1 code implementation · 11 Jun 2023 · Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao, Peng Zhang, Jie Tang

Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices.

General Knowledge · Knowledge Distillation · +1
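The GKD entry above concerns shrinking pre-trained language models through knowledge distillation. The sketch below shows only the standard soft-label distillation loss (temperature-scaled KL to the teacher plus cross-entropy on hard labels), not GKD's specific framework; all names and values are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard soft-label distillation: KL to the teacher's softened
    distribution plus cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits for a 4-class task.
s = torch.randn(8, 4)
t = torch.randn(8, 4)
y = torch.randint(0, 4, (8,))
print(distillation_loss(s, t, y))
```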

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

1 code implementation · 11 Jun 2023 · Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shu Zhao, Peng Zhang, Jie Tang

To address these problems, we propose a general language model distillation (GLMD) method that performs two-stage word prediction distillation and vocabulary compression, which is simple and shows surprisingly strong performance.

Knowledge Distillation · Language Modelling
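GLMD is described above as combining word prediction distillation with vocabulary compression. The sketch below is one hedged reading of that idea, not the paper's exact two-stage procedure: it restricts both models' next-word distributions to a pruned vocabulary and matches them with a KL term; `kept_ids` and the random logits are placeholders for real model outputs.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: a batch of token positions, a full vocabulary of size V,
# and a pruned vocabulary given by `kept_ids` (e.g. the most frequent tokens).
V, V_small = 32000, 8000
kept_ids = torch.arange(V_small)          # placeholder for the kept-token ids

student_logits = torch.randn(16, V)       # stand-ins for real model outputs
teacher_logits = torch.randn(16, V)

# Word-prediction distillation restricted to the compressed vocabulary:
# KL between the student's and teacher's next-word distributions.
s = F.log_softmax(student_logits[:, kept_ids], dim=-1)
t = F.softmax(teacher_logits[:, kept_ids], dim=-1)
loss = F.kl_div(s, t, reduction="batchmean")
print(loss)
```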

Coherence-Based Distributed Document Representation Learning for Scientific Documents

1 code implementation · 8 Jan 2022 · Shicheng Tan, Shu Zhao, Yanping Zhang

In this paper, we propose a coupled text pair embedding (CTPE) model to learn the representation of scientific documents, which maintains the coherence of the document with coupled text pairs formed by segmenting the document.

Information Retrieval · Representation Learning · +1
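CTPE is described above as segmenting a document into coupled text pairs and preserving their coherence. The sketch below only illustrates that framing under stated assumptions: a placeholder bag-of-words encoder and a cosine coherence score stand in for the paper's actual architecture and training objective.

```python
import torch
import torch.nn.functional as F

def split_into_pair(tokens):
    """Segment a document into a coupled text pair (here: first/second half)."""
    mid = len(tokens) // 2
    return tokens[:mid], tokens[mid:]

class BagOfWordsEncoder(torch.nn.Module):
    """Placeholder encoder: mean of token embeddings."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):
        return self.emb(token_ids).mean(dim=0)

encoder = BagOfWordsEncoder()
doc = torch.randint(0, 1000, (40,))          # toy document as token ids
left, right = split_into_pair(doc)

# Coherence score of the coupled pair: high for segments of the same document;
# a contrastive loss would push down scores for cross-document pairs.
score = F.cosine_similarity(encoder(left), encoder(right), dim=0)
print(score)
```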
