Search Results for author: Tongzheng Ren

Found 33 papers, 9 papers with code

Accountable Off-Policy Evaluation via a Kernelized Bellman Statistics

no code implementations ICML 2020 Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu

In this work, we investigate the statistical properties of the kernel loss, which allows us to find a feasible set that contains the true value function with high probability.

Off-policy evaluation

DeepSeek-VL: Towards Real-World Vision-Language Understanding

2 code implementations 8 Mar 2024 Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan

The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks.

Chatbot Language Modelling +3

Efficient Reinforcement Learning from Partial Observability

no code implementations 20 Nov 2023 Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state.

Partially Observable Reinforcement Learning reinforcement-learning

Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

no code implementations 8 Apr 2023 Tongzheng Ren, Zhaolin Ren, Haitong Ma, Na Li, Bo Dai

This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems.

Markovian Sliced Wasserstein Distances: Beyond Independent Projections

1 code implementation NeurIPS 2023 Khai Nguyen, Tongzheng Ren, Nhat Ho

Sliced Wasserstein (SW) distance suffers from redundant projections due to independent uniform random projecting directions.

Hierarchical Sliced Wasserstein Distance

1 code implementation 27 Sep 2022 Khai Nguyen, Tongzheng Ren, Huy Nguyen, Litu Rout, Tan Nguyen, Nhat Ho

We explain the usage of these projections by introducing Hierarchical Radon Transform (HRT) which is constructed by applying Radon Transform variants recursively.

Making Linear MDPs Practical via Contrastive Representation Learning

no code implementations 14 Jul 2022 Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.

Representation Learning

Efficient Forecasting of Large Scale Hierarchical Time Series via Multilevel Clustering

no code implementations 27 May 2022 Xing Han, Tongzheng Ren, Jing Hu, Joydeep Ghosh, Nhat Ho

To attain this goal, each time series is first assigned the forecast for its cluster representative, which can be considered as a "shrinkage prior" for the set of time series it represents.

Clustering Time Series +1

Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures

no code implementations 23 May 2022 Tongzheng Ren, Fuheng Cui, Sujay Sanghavi, Nhat Ho

However, when the models are over-specified, namely, the chosen number of components to fit the data is larger than the unknown true number of components, EM needs a polynomial number of iterations in terms of the sample size to reach the final statistical radius; this is computationally expensive in practice.

Open-Ended Question Answering

An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models

no code implementations 16 May 2022 Nhat Ho, Tongzheng Ren, Sujay Sanghavi, Purnamrita Sarkar, Rachel Ward

Therefore, the total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of the GD for solving parameter estimation in non-regular statistical models while being comparable to that of the GD in regular statistical settings.

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

no code implementations 13 Mar 2022 Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu

Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties as we need to simultaneously estimate the training environment uncertainty from samples and find the worst-case perturbation for testing.

Improving Computational Complexity in Statistical Models with Second-Order Information

no code implementations 9 Feb 2022 Tongzheng Ren, Jiacheng Zhuo, Sujay Sanghavi, Nhat Ho

This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius.

A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning

no code implementations 22 Nov 2021 Tongzheng Ren, Tianjun Zhang, Csaba Szepesvári, Bo Dai

Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality.

Reinforcement Learning (RL) Representation Learning

Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent

no code implementations 15 Oct 2021 Tongzheng Ren, Fuheng Cui, Alexia Atsidakou, Sujay Sanghavi, Nhat Ho

We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss function when the sample size goes to infinity, and the stability between the gradients of the empirical and population loss functions, namely, the polynomial growth on the concentration bound between the gradients of sample and population loss functions.

MaxUp: Lightweight Adversarial Training With Data Augmentation Improves Neural Network Training

no code implementations CVPR 2021 Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu

The idea is to generate a set of augmented data with some random perturbations or transforms, and minimize the maximum, or worst case loss over the augmented data.

Data Augmentation Image Classification +1

Quasi-Bayesian Dual Instrumental Variable Regression

1 code implementation NeurIPS 2021 Ziyu Wang, Yuhao Zhou, Tongzheng Ren, Jun Zhu

Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking.

Bayesian Inference regression +1

Unsupervised Out-of-Domain Detection via Pre-trained Transformers

1 code implementation ACL 2021 Keyang Xu, Tongzheng Ren, Shikun Zhang, Yihao Feng, Caiming Xiong

Deployed real-world machine learning applications are often subject to uncontrolled and even potentially malicious inputs.

Scalable Quasi-Bayesian Inference for Instrumental Variable Regression

no code implementations NeurIPS 2021 Ziyu Wang, Yuhao Zhou, Tongzheng Ren, Jun Zhu

Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking.

Bayesian Inference regression +1

Nearly Horizon-Free Offline Reinforcement Learning

no code implementations NeurIPS 2021 Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi

To the best of our knowledge, these are the first set of nearly horizon-free bounds for episodic time-homogeneous offline tabular MDP and linear MDP with anchor points.

reinforcement-learning Reinforcement Learning (RL)

Linear Bandit Algorithms with Sublinear Time Complexity

no code implementations 3 Mar 2021 Shuo Yang, Tongzheng Ren, Sanjay Shakkottai, Eric Price, Inderjit S. Dhillon, Sujay Sanghavi

For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\tilde O(\sqrt{T})$ regret.

Movie Recommendation

Combinatorial Bandits without Total Order for Arms

no code implementations 3 Mar 2021 Shuo Yang, Tongzheng Ren, Inderjit S. Dhillon, Sujay Sanghavi

Specifically, we focus on a challenging setting where 1) the reward distribution of an arm depends on the set $s$ it is part of, and crucially 2) there is no total order for the arms in $\mathcal{A}$.

Accountable Off-Policy Evaluation With Kernel Bellman Statistics

no code implementations 15 Aug 2020 Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu

We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments, without requiring the execution of the new policy.

Medical Diagnosis Off-policy evaluation +1

Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information

no code implementations ICLR 2020 Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu

In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round.

counterfactual

Stein Self-Repulsive Dynamics: Benefits From Past Samples

1 code implementation NeurIPS 2020 Mao Ye, Tongzheng Ren, Qiang Liu

Our idea is to introduce Stein variational gradient as a repulsive force to push the samples of Langevin dynamics away from the past trajectories.

MaxUp: A Simple Way to Improve Generalization of Neural Network Training

1 code implementation 20 Feb 2020 Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu

The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst case loss over the augmented data.

Few-Shot Image Classification General Classification +1

Function Space Particle Optimization for Bayesian Neural Networks

1 code implementation ICLR 2019 Ziyu Wang, Tongzheng Ren, Jun Zhu, Bo Zhang

While Bayesian neural networks (BNNs) have drawn increasing attention, their posterior inference remains challenging, due to the high-dimensional and over-parameterized nature.

Variational Inference

Reward Shaping via Meta-Learning

no code implementations 27 Jan 2019 Haosheng Zou, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu

Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment in Reinforcement Learning (RL).

Meta-Learning Reinforcement Learning (RL)

Lazy-CFR: fast and near optimal regret minimization for extensive games with imperfect information

no code implementations 10 Oct 2018 Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu

In this paper, we present a novel technique, lazy update, which can avoid traversing the whole game tree in CFR, as well as a novel analysis on the regret of CFR with lazy update.

counterfactual

Learning to Write Stylized Chinese Characters by Reading a Handful of Examples

no code implementations 6 Dec 2017 Danyang Sun, Tongzheng Ren, Chongxun Li, Hang Su, Jun Zhu

Automatically writing stylized Chinese characters is an attractive yet challenging task due to its wide applicability.
