Search Results for author: Hongkang Li

Found 8 papers, 0 papers with code

Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis

no code implementations • 23 Feb 2024 • Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Despite this empirical success, the mechanics of how to train a Transformer to achieve ICL, and the resulting ICL capacity, remain largely elusive due to the technical challenge of analyzing the nonconvex training problems that arise from the nonlinear self-attention and nonlinear activation in Transformers.

Binary Classification · In-Context Learning
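As background, the binary-classification ICL setting studied in this line of theory is usually formalized as a prompt of labeled demonstrations followed by an unlabeled query, which the trained Transformer must label without any weight update. Below is a minimal sketch of that prompt construction; the (x, y)-concatenation token encoding is a common theoretical convention assumed here, not necessarily the paper's exact formulation.

```python
import numpy as np

def build_icl_prompt(demo_xs, demo_ys, query_x):
    """Stack demonstration (x, y) pairs plus the query into one prompt
    matrix; the query's label slot is zeroed and must be predicted."""
    tokens = [np.concatenate([x, [y]]) for x, y in zip(demo_xs, demo_ys)]
    tokens.append(np.concatenate([query_x, [0.0]]))  # unknown query label
    return np.stack(tokens)  # shape: (num_demos + 1, d + 1)

# Example: four labeled demonstrations in R^3 and one query.
rng = np.random.default_rng(0)
xs, ys = rng.normal(size=(4, 3)), np.array([1.0, -1.0, 1.0, -1.0])
prompt = build_icl_prompt(xs, ys, rng.normal(size=3))
print(prompt.shape)  # (5, 4)
```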

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration

no code implementations • 24 Oct 2023 • Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury

This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with an $\epsilon$-greedy policy.

Q-Learning
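For context, the $\epsilon$-greedy exploration analyzed here takes a uniformly random action with probability $\epsilon$ and the greedy (highest-Q) action otherwise. A minimal sketch of the action-selection rule (function name hypothetical):

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """With probability epsilon, explore with a uniformly random
    action; otherwise exploit the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```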

How Can Context Help? Exploring Joint Retrieval of Passage and Personalized Context

no code implementations • 26 Aug 2023 • Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky

The integration of external personalized context information into document-grounded conversational systems has significant potential business value but has not been well studied.

Passage Retrieval · Retrieval

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity

no code implementations • 12 Feb 2023 • Hongkang Li, Meng Wang, Sijia Liu, Pin-Yu Chen

Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a shallow ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task.
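Since the abstract pins down the architecture, a sketch may help: one self-attention layer followed by a two-layer perceptron. The PyTorch version below is an illustrative assumption; head count, pooling, and dimensions are placeholders, not the paper's parameterization.

```python
import torch
import torch.nn as nn

class ShallowViT(nn.Module):
    """One self-attention layer followed by a two-layer perceptron,
    as described in the abstract; pooling and sizes are assumptions."""
    def __init__(self, dim=64, hidden=128, num_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, tokens):                 # tokens: (batch, seq, dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.mlp(attended.mean(dim=1))  # pool tokens, then classify

logits = ShallowViT()(torch.randn(8, 16, 64))  # 8 samples, 16 tokens each
```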

Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

no code implementations • 7 Jul 2022 • Hongkang Li, Shuai Zhang, Meng Wang

In addition, this paper is the first to characterize the impact of the input distribution on the sample complexity and the learning rate.
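For reference, the one-hidden-layer model in this line of work is typically parameterized as $f(\mathbf{x}) = \frac{1}{K}\sum_{j=1}^{K} a_j\,\phi(\mathbf{w}_j^\top \mathbf{x})$, where $K$ is the number of hidden neurons and $\phi$ is the activation; this exact form (e.g., whether the output weights $a_j$ are trained) is an assumption based on the standard setup rather than something stated in the snippet.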

Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling

no code implementations • 7 Jul 2022 • Hongkang Li, Meng Wang, Sijia Liu, Pin-Yu Chen, JinJun Xiong

Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data.

Node Classification
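As background for this setting, one GCN layer propagates node features through a normalized adjacency matrix, and graph topology sampling trains on a sparsified subgraph at each step. A minimal numpy sketch follows; the edge-dropping sampler shown is an illustrative stand-in, not the paper's algorithm.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """Standard GCN propagation: H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

def sample_topology(adj, keep_prob, rng):
    """Illustrative topology sampling: keep each undirected edge
    independently with probability keep_prob."""
    upper = np.triu(adj, 1) * (rng.random(adj.shape) < keep_prob)
    return upper + upper.T
```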

Learning One-hidden-layer Neural Networks on Gaussian Mixture Models with Guaranteed Generalizability

no code implementations • 1 Jan 2021 • Hongkang Li, Shuai Zhang, Meng Wang

Instead of following the conventional and restrictive assumption in the literature that the input features follow the standard Gaussian distribution, this paper is the first to analyze the more general and practical scenario in which the input features follow a Gaussian mixture model with a finite number of Gaussian components of varying means and variances.

Binary Classification
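The Gaussian-mixture data model in the snippet is easy to make concrete: draw a mixture component according to its weight, then sample from that component's Gaussian. The component count and parameters below are arbitrary placeholders for illustration.

```python
import numpy as np

def sample_gmm(n, means, covs, weights, rng):
    """Draw n feature vectors from a finite Gaussian mixture, matching
    the data model in the abstract (component parameters are inputs)."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in comps])

rng = np.random.default_rng(0)
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
features = sample_gmm(100, means, covs, weights=[0.6, 0.4], rng=rng)
```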
