no code implementations • 15 Dec 2023 • Shengyao Zhang, Mi Zhang, Xudong Pan, Min Yang
To reduce the computational cost and energy consumption of large language models (LLMs), skimming-based acceleration progressively drops unimportant tokens from the input sequence across the layers of the LLM while preserving tokens of semantic importance.
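The skimming mechanism described above can be illustrated with a minimal sketch. Here token importance is scored by the hidden-state L2 norm as a stand-in for the learned skimming predictor an actual model would use; the function, names, and keep ratio are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def skim_tokens(hidden, keep_ratio=0.7, n_layers=4):
    """Illustrative layer-wise token skimming.

    At each layer, score every token (here: L2 norm of its hidden
    state, a proxy for a learned importance predictor), keep only
    the top fraction, and pass the survivors to the next layer.
    Returns the original-sequence indices of tokens that survive.
    """
    kept = np.arange(hidden.shape[0])          # original positions
    for _ in range(n_layers):
        scores = np.linalg.norm(hidden, axis=1)
        k = max(1, int(len(kept) * keep_ratio))  # tokens to retain
        top = np.sort(np.argsort(scores)[::-1][:k])  # keep order
        hidden = hidden[top]
        kept = kept[top]
    return kept

# a toy "sequence" of 16 tokens with 8-dim hidden states
seq = np.random.default_rng(1).normal(size=(16, 8))
survivors = skim_tokens(seq)
```

With a 0.7 keep ratio over 4 layers, 16 tokens shrink to 11, 7, 4, and finally 2, so later layers attend over far fewer tokens than the full sequence.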
no code implementations • 29 Jun 2022 • Xudong Pan, Yifan Yan, Shengyao Zhang, Mi Zhang, Min Yang
In this paper, we present Matryoshka, a novel insider attack that uses an irrelevant, scheduled-to-publish DNN model as a carrier for the covert transmission of multiple secret models, which memorize the functionality of private ML data stored in local data centers.
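To make the idea of hiding secret payloads inside a carrier model's parameters concrete, here is a generic parameter-steganography sketch that stores one secret bit in the least-significant mantissa bit of each float32 carrier weight. This is only an assumed illustration of covert capacity in model weights; the Matryoshka attack itself uses a different, more sophisticated encoding.

```python
import numpy as np

def embed_bits(carrier, bits):
    """Hide one bit per weight in the LSB of each float32 mantissa.

    Flipping the lowest mantissa bit perturbs a weight by roughly a
    relative 2**-23, so the carrier model's behavior is essentially
    unchanged while the payload rides along in the published weights.
    """
    raw = carrier.astype(np.float32).copy().view(np.uint32)
    raw = (raw & ~np.uint32(1)) | bits.astype(np.uint32)
    return raw.view(np.float32)

def extract_bits(stego, n):
    """Recover the first n hidden bits from the published weights."""
    return (stego.view(np.uint32) & np.uint32(1))[:n].astype(np.uint8)

rng = np.random.default_rng(0)
carrier = rng.normal(size=8).astype(np.float32)   # toy carrier weights
secret = rng.integers(0, 2, size=8)               # toy secret payload
stego = embed_bits(carrier, secret)
```

A serialized secret model's bytes could be streamed through such a channel bit by bit; the point of the sketch is only that published weights leave room for covert payloads an insider can later extract.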
1 code implementation • 19 May 2020 • Qi Jia, Mengxue Zhang, Shengyao Zhang, Kenny Q. Zhu
Matching question-answer relations between two turns in conversations is not only the first step in analyzing dialogue structures, but also valuable for training dialogue systems.