Search Results for author: Yihao Feng

Found 28 papers, 15 papers with code

Accountable Off-Policy Evaluation via a Kernelized Bellman Statistics

no code implementations • ICML 2020 • Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu

In this work, we investigate the statistical properties of the kernel loss, which allows us to find a feasible set that contains the true value function with high probability.

Off-policy evaluation

Paper
Add Code

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

1 code implementation • 1 Apr 2024 • Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

Preference modeling techniques, such as direct preference optimization (DPO), has shown effective in enhancing the generalization abilities of large language model (LLM).

Instruction Following Language Modelling +3

Paper
Code

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

1 code implementation • 28 Feb 2024 • Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, ran Xu, Wenpeng Yin, Caiming Xiong

This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents.

Paper
Code

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

2 code implementations • 23 Feb 2024 • JianGuo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training.

Paper
Code

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

2 code implementations • 11 Aug 2023 • Zhiwei Liu, Weiran Yao, JianGuo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs).

Benchmarking Decision Making

276

Paper
Code

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

no code implementations • 4 Aug 2023 • Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.

Language Modelling

Paper
Add Code

REX: Rapid Exploration and eXploitation for AI Agents

no code implementations • 18 Jul 2023 • Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.

Decision Making Reinforcement Learning (RL)

Paper
Add Code

FAMO: Fast Adaptive Multitask Optimization

1 code implementation • NeurIPS 2023 • Bo Liu, Yihao Feng, Peter Stone, Qiang Liu

One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL).

Computational Efficiency

Paper
Code

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

1 code implementation • NeurIPS 2023 • Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, ran Xu

Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages.

Image Generation

578

Paper
Code

HIVE: Harnessing Human Feedback for Instructional Visual Editing

1 code implementation • 16 Mar 2023 • Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, ran Xu

Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences.

Text-based Image Editing

Paper
Code

Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems

2 code implementations • 20 Feb 2023 • Yihao Feng, Shentao Yang, Shujian Zhang, JianGuo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang

Prior works mainly focus on adopting advanced RL techniques to train the ToD agents, while the design of the reward function is not well studied.

Learning-To-Rank Reinforcement Learning (RL) +2

819

Paper
Code

A Unified Framework for Alternating Offline Model Training and Policy Learning

1 code implementation • 12 Oct 2022 • Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou

In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data, and subsequently utilize the learned model and fixed datasets for policy learning, without further interacting with the environment.

Continuous Control Model-based Reinforcement Learning +2

Paper
Code

Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning

2 code implementations • 17 Aug 2022 • Bo Liu, Yihao Feng, Qiang Liu, Peter Stone

Furthermore, we introduce the metric residual network (MRN) that deliberately decomposes the action-value function Q(s, a, g) into the negated summation of a metric plus a residual asymmetric component.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

1 code implementation • 14 Jun 2022 • Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou

Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets, without interacting with the underlying environment during the learning process.

Continuous Control Offline RL +2

Paper
Code

A Behavior Regularized Implicit Policy for Offline Reinforcement Learning

no code implementations • 19 Feb 2022 • Shentao Yang, Zhendong Wang, Huangjie Zheng, Yihao Feng, Mingyuan Zhou

For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.

D4RL reinforcement-learning +1

Paper
Add Code

Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning

no code implementations • 1 Jan 2022 • Ziyang Tang, Yihao Feng, Qiang Liu

The benefit of learning the operator is that we can incorporate any new reward function as input and attain its corresponding value function in a zero-shot manner.

Q-Learning reinforcement-learning +1

Paper
Add Code

Unsupervised Out-of-Domain Detection via Pre-trained Transformers

1 code implementation • ACL 2021 • Keyang Xu, Tongzheng Ren, Shikun Zhang, Yihao Feng, Caiming Xiong

Deployed real-world machine learning applications are often subject to uncontrolled and even potentially malicious inputs.

Paper
Code

Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System

1 code implementation • NAACL 2021 • Congying Xia, Wenpeng Yin, Yihao Feng, Philip Yu

Two major challenges exist in this new task: (i) For the learning process, the system should incrementally learn new classes round by round without re-training on the examples of preceding classes; (ii) For the performance, the system should perform well on new classes without much loss on preceding classes.

Few-Shot Text Classification General Classification +4

Paper
Code

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

no code implementations • ICLR 2021 • Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy based on offline data previously collected under different policies.

Off-policy evaluation Open-Ended Question Answering +1

Paper
Add Code

Off-Policy Interval Estimation with Lipschitz Value Iteration

no code implementations • NeurIPS 2020 • Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, Qiang Liu

Off-policy evaluation provides an essential tool for evaluating the effects of different policies or treatments using only observed data.

Decision Making Medical Diagnosis +1

Paper
Add Code

Accountable Off-Policy Evaluation With Kernel Bellman Statistics

no code implementations • 15 Aug 2020 • Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu

We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments, without requiring the execution of the new policy.

Medical Diagnosis Off-policy evaluation +1

Paper
Add Code

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

no code implementations • ICLR 2020 • Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu

Our method is doubly robust in that the bias vanishes when either the density ratio or the value function estimation is perfect.

Density Ratio Estimation Off-policy evaluation

Paper
Add Code

A Kernel Loss for Solving the Bellman Equation

1 code implementation • NeurIPS 2019 • Yihao Feng, Lihong Li, Qiang Liu

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms.

Q-Learning Reinforcement Learning (RL)

Paper
Code

Knowledge-guided Semantic Computing Network

no code implementations • 29 Sep 2018 • Guangming Shi, Zhongqiang Zhang, Dahua Gao, Xuemei Xie, Yihao Feng, Xinrui Ma, Danhua Liu

Besides, to enhance the recognition ability of the semantic tree in aspects of the diversity, randomicity and variability, we use the traditional neural network to aid the semantic tree to learn some indescribable features.

Adversarial Robustness Object Recognition

Paper
Add Code

Shrinkage-based Bias-Variance Trade-off for Deep Reinforcement Learning

no code implementations • 27 Sep 2018 • Yihao Feng, Hao liu, Jian Peng, Qiang Liu

Deep reinforcement learning has achieved remarkable successes in solving various challenging artificial intelligence tasks.

Continuous Control reinforcement-learning +1

Paper
Add Code

Action-depedent Control Variates for Policy Optimization via Stein's Identity

2 code implementations • 30 Oct 2017 • Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu

Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems.

Policy Gradient Methods reinforcement-learning +1

Paper
Code

Learning to Draw Samples with Amortized Stein Variational Gradient Descent

no code implementations • 20 Jul 2017 • Yihao Feng, Dilin Wang, Qiang Liu

We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference.

Bayesian Inference

Paper
Add Code

Two Methods For Wild Variational Inference

no code implementations • 30 Nov 2016 • Qiang Liu, Yihao Feng

Variational inference provides a powerful tool for approximate probabilistic in- ference on complex, structured models.

Variational Inference Vocal Bursts Valence Prediction

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.