Search Results for author: Xiaoteng Ma

Found 20 papers, 12 papers with code

SEABO: A Simple Search-Based Method for Offline Imitation Learning

1 code implementation • 6 Feb 2024 • Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu

Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets, eliminating the need to interact with the environment.

D4RL • Imitation Learning +2

What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?

1 code implementation • 30 May 2023 • Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang

In this paper, we study out-of-distribution (OOD) generalization of offline GCRL both theoretically and empirically to identify which factors are important.

Imitation Learning • Offline RL

Learning Diverse Risk Preferences in Population-based Self-play

1 code implementation • 19 May 2023 • Yuhua Jiang, Qihan Liu, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, Qianchuan Zhao

In this paper, we aim to introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.

reinforcement-learning • Reinforcement Learning (RL)

Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning

1 code implementation • 10 Apr 2023 • Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li

To empirically show the advantages of TATU, we first combine it with two classical model-based offline RL algorithms, MOPO and COMBO.

D4RL • Data Augmentation +3

Single-Trajectory Distributionally Robust Reinforcement Learning

no code implementations • 27 Jan 2023 • Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou

As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI).

Decision Making • Q-Learning +2

Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping

1 code implementation • 15 Sep 2022 • Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou

We validate our insight on a range of RL tasks and show its improvement over baselines: (1) In offline RL, the conservative exploitation leads to improved performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) In discrete control tasks, a negative reward shifting yields an improvement over the curiosity-based exploration method.

Continuous Control • Offline RL
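The shifting-constant idea rests on a simple fact: adding a constant c to every reward moves all Q-values by c / (1 - γ) without changing the greedy policy, so a negative shift acts as conservative initialization and a positive one as optimistic. A minimal tabular sketch of that fact (illustrative only, not the paper's code; the random MDP and all names here are mine):

```python
import numpy as np

gamma = 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(4, 2))   # P[s, a, s']: random 4-state, 2-action MDP
R = rng.uniform(0, 1, size=(4, 2))           # R[s, a]: per-step rewards

def value_iteration(R, tol=1e-10):
    """Standard value iteration; returns the optimal Q-table."""
    Q = np.zeros((4, 2))
    while True:
        Q_new = R + gamma * P @ Q.max(axis=1)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

c = -2.0                                     # negative shift = conservative
Q = value_iteration(R)
Q_shifted = value_iteration(R + c)

# Every Q-value moves by exactly c / (1 - gamma) ...
print(np.allclose(Q_shifted - Q, c / (1 - gamma), atol=1e-6))
# ... while the greedy policy is unchanged.
print((Q.argmax(axis=1) == Q_shifted.argmax(axis=1)).all())
```

Both prints are True: the shift changes the scale of the values, and hence how they interact with function approximation and exploration bonuses, but not which action looks best.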

Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

no code implementations • 14 Sep 2022 • Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (the real environment in which the policy is deployed) and the training environment (e.g., a simulator).

Offline RL • reinforcement-learning +1

Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

no code implementations • 15 Jun 2022 • Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao

Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making situations, such as finance, robotics, autonomous driving, etc.

Autonomous Driving • Continuous Control +3

Mildly Conservative Q-Learning for Offline Reinforcement Learning

3 code implementations • 9 Jun 2022 • Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu

The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated.

D4RL • Q-Learning +2
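The overestimation the abstract refers to can be seen in isolation: maximizing over noisy value estimates is biased upward (E[max Q̂] ≥ max E[Q̂]), and for out-of-distribution actions the estimates are mostly noise. A toy illustration of that bias (mine, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(10)                                     # ten actions, all truly worth 0
noisy_q = true_q + rng.normal(0, 1, size=(100_000, 10))   # unbiased but noisy estimates

# Acting greedily on the noisy estimates systematically overvalues the state:
greedy_estimate = noisy_q.max(axis=1).mean()
print(greedy_estimate > 1.0)   # True, although the true max is 0
```

Each individual estimate is unbiased, yet the max over them is not; keeping the value function conservative on OOD actions is one way to counteract exactly this effect.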

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

1 code implementation • 6 Jun 2022 • Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks.

Decision Making • Offline RL +2

A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

no code implementations • 15 Jan 2022 • Shuai Ma, Xiaoteng Ma, Li Xia

To deal with this unorthodox problem, we introduce a pseudo mean to transform the intractable MDP into a standard one with a redefined reward function, and derive a discounted mean-variance performance difference formula.

Bilevel Optimization • Management
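The pseudo-mean trick rests on a standard identity; a sketch in my own notation, not necessarily the paper's exact formulation:

```latex
% For any fixed "pseudo mean" y, the variance of the return R decomposes as
\operatorname{Var}(R) \;=\; \mathbb{E}\!\left[(R-y)^2\right] \;-\; \left(\mathbb{E}[R]-y\right)^2,
% and the surrogate E[(R-y)^2] is minimized, with the bound tight, at y = E[R].
```

Holding y fixed thus reduces the mean-variance objective to a standard discounted MDP with a quadratically penalized reward, while y itself is updated in an outer loop, consistent with the "Bilevel Optimization" tag above.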

Offline Reinforcement Learning with Value-based Episodic Memory

1 code implementation • ICLR 2022 • Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data.

D4RL • Offline RL +2

Average-Reward Reinforcement Learning with Trust Region Methods

no code implementations • 7 Jun 2021 • Xiaoteng Ma, Xiaohang Tang, Li Xia, Jun Yang, Qianchuan Zhao

Our work provides a unified framework of the trust region approach including both the discounted and average criteria, which may complement the framework of reinforcement learning beyond the discounted objectives.

Continuous Control • reinforcement-learning +1

Efficient Continuous Control with Double Actors and Regularized Critics

1 code implementation • 6 Jun 2021 • Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li

First, we uncover and demonstrate the bias-alleviation property of double actors by building them upon a single critic and upon double critics, to handle the overestimation bias in DDPG and the underestimation bias in TD3, respectively.

Continuous Control • Reinforcement Learning (RL)

Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning

no code implementations • 10 Feb 2021 • Xiaoteng Ma, Yiqin Yang, Chenghao Li, Yiwen Lu, Qianchuan Zhao, Jun Yang

Value-based methods of multi-agent reinforcement learning (MARL), especially the value decomposition methods, have been demonstrated on a range of challenging cooperative tasks.

Continuous Control • Multi-agent Reinforcement Learning +2

SOAC: The Soft Option Actor-Critic Architecture

no code implementations • 25 Jun 2020 • Chenghao Li, Xiaoteng Ma, Chongjie Zhang, Jun Yang, Li Xia, Qianchuan Zhao

In these tasks, our approach learns a diverse set of options, each of which has a strongly coherent state-action space.

Transfer Learning

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

1 code implementation • 5 Jun 2020 • Ming Zhang, Yawei Wang, Xiaoteng Ma, Li Xia, Jun Yang, Zhiheng Li, Xiu Li

Generative adversarial imitation learning (GAIL) provides an adversarial learning framework for imitating an expert policy from demonstrations in high-dimensional continuous tasks.

Continuous Control • Imitation Learning

DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

no code implementations • 30 Apr 2020 • Xiaoteng Ma, Li Xia, Zhengyuan Zhou, Jun Yang, Qianchuan Zhao

In this paper, we present a new reinforcement learning (RL) algorithm called Distributional Soft Actor Critic (DSAC), which exploits the distributional information of accumulated rewards to achieve better performance.

Continuous Control • reinforcement-learning +1
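Once a critic models the full return distribution rather than only its mean, a risk measure such as CVaR can replace the expectation when comparing actions. A small illustration of that idea on sampled returns (names and numbers are hypothetical, not DSAC's implementation):

```python
import numpy as np

def cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns."""
    cutoff = np.quantile(returns, alpha)
    return returns[returns <= cutoff].mean()

rng = np.random.default_rng(0)
safe  = rng.normal(1.0, 0.1, 10_000)   # modest mean, low variance
risky = rng.normal(1.2, 2.0, 10_000)   # higher mean, heavy downside tail

print(safe.mean() < risky.mean())          # True: the mean prefers the risky action
print(cvar(safe, 0.1) > cvar(risky, 0.1))  # True: CVaR at the 10% level prefers the safe one
```

A risk-neutral agent and a risk-averse one rank the same two return distributions in opposite orders, which is exactly the behavior a distributional critic makes tunable.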
