Search Results for author: Chuheng Zhang

Found 19 papers, 5 papers with code

Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

no code implementations • 17 Apr 2024 • Guangran Cheng, Chuheng Zhang, Wenzhe Cai, Li Zhao, Changyin Sun, Jiang Bian

While large language models (LLMs) are successful in completing various language processing tasks, they easily fail to interact with the physical world by generating control sequences properly.

ARO: Large Language Model Supervised Robotics Text2Skill Autonomous Learning

no code implementations • 23 Mar 2024 • YiWen Chen, Yuyao Ye, Ziyi Chen, Chuheng Zhang, Marcelo H. Ang

Robotics learning highly relies on human expertise and efforts, such as demonstrations, design of reward functions in reinforcement learning, performance evaluation using human feedback, etc.

Language Modelling, Large Language Model

Pre-Trained Large Language Models for Industrial Control

no code implementations • 6 Aug 2023 • Lei Song, Chuheng Zhang, Li Zhao, Jiang Bian

2) How well can GPT-4 generalize to different scenarios for HVAC control?

A Versatile Multi-Agent Reinforcement Learning Benchmark for Inventory Management

1 code implementation • 13 Jun 2023 • Xianliang Yang, Zhihao Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Jiang Bian

Multi-agent reinforcement learning (MARL) models multiple agents that interact and learn within a shared environment.

Autonomous Driving, Management, +2

Towards Generalizable Reinforcement Learning for Trade Execution

no code implementations • 12 May 2023 • Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao

To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms.

Offline RL, reinforcement-learning, +1

RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

no code implementations • 3 Mar 2023 • Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang

Inspired by the recent success of sequence modeling in RL and the use of masked language model for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains the encoder combined with transformer blocks to predict the masked states or actions in a trajectory.

Data Augmentation, Language Modelling, +3

Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management

no code implementations • 15 Dec 2022 • Yuandong Ding, Mingxiao Feng, Guozi Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Houqiang Li, Yan Jin, Jiang Bian

In this paper, we consider the inventory management (IM) problem where we need to make replenishment decisions for a large number of stock keeping units (SKUs) to balance their supply and demand.

Management, Multi-agent Reinforcement Learning, +2

A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS

no code implementations • 5 Dec 2022 • Wei Shen, Xiaonan He, Chuheng Zhang, Xuyun Zhang, Jian Xie

Moreover, they are trained and evaluated on the benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system.

Spoken Dialogue Systems

TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets

1 code implementation • 5 Dec 2022 • Yuanying Cai, Chuheng Zhang, Li Zhao, Wei Shen, Xuyun Zhang, Lei Song, Jiang Bian, Tao Qin, TieYan Liu

There are two challenges for this setting: 1) The optimal trade-off between optimizing the RL signal and the behavior cloning (BC) signal changes on different states due to the variation of the action coverage induced by different behavior policies.

D4RL, Offline RL, +2

Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks

no code implementations • 2 Apr 2022 • Ze Wang, Guogang Liao, Xiaowen Shi, Xiaoxu Wu, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang

With the recent prevalence of reinforcement learning (RL), there has been tremendous interest in utilizing RL for ads allocation in recommendation platforms (e.g., e-commerce and news feed sites).

Contrastive Learning, Reinforcement Learning (RL)

Hybrid Transfer in Deep Reinforcement Learning for Ads Allocation

no code implementations • 2 Apr 2022 • Ze Wang, Guogang Liao, Xiaowen Shi, Xiaoxu Wu, Chuheng Zhang, Bingqi Zhu, Yongkang Wang, Xingxing Wang, Dong Wang

Ads allocation, which involves allocating ads and organic items to limited slots in feed with the purpose of maximizing platform revenue, has become a research hotspot.

reinforcement-learning, Reinforcement Learning (RL)

Deep Page-Level Interest Network in Reinforcement Learning for Ads Allocation

no code implementations • 1 Apr 2022 • Guogang Liao, Xiaowen Shi, Ze Wang, Xiaoxu Wu, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang

A mixed list of ads and organic items is usually displayed in feed and how to allocate the limited slots to maximize the overall revenue is a key problem.

Click-Through Rate Prediction, reinforcement-learning, +1

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed

1 code implementation • 9 Sep 2021 • Guogang Liao, Ze Wang, Xiaoxu Wu, Xiaowen Shi, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang

Our model results in higher revenue and better user experience than state-of-the-art baselines in offline experiments.

Inductive Matrix Completion Using Graph Autoencoder

2 code implementations • 25 Aug 2021 • Wei Shen, Chuheng Zhang, Yun Tian, Liang Zeng, Xiaonan He, Wanchun Dou, Xiaolong Xu

However, without node content (i.e., side information) for training, the user (or item) specific representation cannot be learned in the inductive setting, that is, a model trained on one group of users (or items) cannot adapt to new users (or items).

Ranked #3 on Recommendation Systems on MovieLens 1M

Matrix Completion, Recommendation Systems

Return-Based Contrastive Representation Learning for Reinforcement Learning

no code implementations • ICLR 2021 • Guoqing Liu, Chuheng Zhang, Li Zhao, Tao Qin, Jinhua Zhu, Jian Li, Nenghai Yu, Tie-Yan Liu

Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL).

Atari Games, reinforcement-learning, +2

DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis

1 code implementation • 3 Oct 2020 • Chuheng Zhang, Yuanqi Li, Xi Chen, Yifei Jin, Pingzhong Tang, Jian Li

Modern machine learning models (such as deep neural networks and boosting decision tree models) have become increasingly popular in financial market prediction, due to their superior capacity to extract complex non-linear patterns.

BIG-bench Machine Learning, feature selection

Auxiliary-task Based Deep Reinforcement Learning for Participant Selection Problem in Mobile Crowdsourcing

no code implementations • 25 Aug 2020 • Wei Shen, Xiaonan He, Chuheng Zhang, Qiang Ni, Wanchun Dou, Yan Wang

Therefore, it is crucial to design a participant selection algorithm that applies to different MCS systems to achieve multiple goals.

Combinatorial Optimization, Fairness, +2

Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

no code implementations • 11 Jun 2020 • Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment.

Q-Learning, Reinforcement Learning (RL)

Policy Search by Target Distribution Learning for Continuous Control

no code implementations • 27 May 2019 • Chuheng Zhang, Yuanqi Li, Jian Li

We observe that several existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to deterministic (even in some very simple environments), leading to an unstable training process.

Continuous Control, Policy Gradient Methods, +1
