1 code implementation • 20 Jan 2024 • Haonan Yu, Wei Xu
Unsupervised video object learning seeks to decompose video scenes into structural object representations without any supervision from depth, optical flow, or segmentation.
1 code implementation • 2 Feb 2023 • Haichao Zhang, Wei Xu, Haonan Yu
With this approach, the policy previously learned offline is fully retained during online learning. This mitigates potential issues such as destroying the useful behaviors of the offline policy in the initial stage of online learning, while allowing the offline policy to participate in exploration naturally and adaptively.
1 code implementation • 28 Jan 2022 • Haonan Yu, Wei Xu, Haichao Zhang
On 12 Safety Gym tasks and 2 safe racing tasks, SEditor obtains a much higher overall safety-weighted-utility (SWU) score than the baselines, and demonstrates outstanding utility performance with constraint violation rates as low as once per 2k time steps, even in obstacle-dense environments.
2 code implementations • 28 Jan 2022 • Haonan Yu, Haichao Zhang, Wei Xu
On the other hand, our large-scale empirical study shows that using entropy regularization in policy improvement alone leads to comparable or even better performance and robustness than using it in both policy improvement and policy evaluation.
1 code implementation • ICLR 2022 • Haichao Zhang, Wei Xu, Haonan Yu
GPM can therefore leverage its generated multi-step plans for temporally coordinated exploration toward high-value regions. This is potentially more effective than a sequence of actions generated by perturbing each action at the single-step level, whose consistency of movement decays exponentially with the number of exploration steps.
2 code implementations • NeurIPS 2021 • Haonan Yu, Wei Xu, Haichao Zhang
TAAC has two important features: a) persistent exploration, and b) a new compare-through Q operator for multi-step TD backup, specially tailored to the action repetition scenario.
1 code implementation • ICLR 2021 • Jesse Zhang, Haonan Yu, Wei Xu
We propose a hierarchical reinforcement learning method, HIDIO, that can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
1 code implementation • 22 Jul 2019 • Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston
In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.
3 code implementations • 19 Jul 2019 • Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam
This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions.
no code implementations • 7 Jul 2019 • Yanqi Zhou, Peng Wang, Sercan Arik, Haonan Yu, Syed Zawad, Feng Yan, Greg Diamos
In this paper, we propose Efficient Progressive Neural Architecture Search (EPNAS), a neural architecture search (NAS) method that efficiently handles a large search space through a novel progressive search policy with performance prediction based on REINFORCE (Williams, 1992).
2 code implementations • NeurIPS 2019 • Ari S. Morcos, Haonan Yu, Michela Paganini, Yuandong Tian
Perhaps surprisingly, we found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset.
no code implementations • ICLR 2020 • Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos
The lottery ticket hypothesis proposes that over-parameterization of deep neural networks (DNNs) aids training by increasing the probability of a "lucky" sub-network initialization being present rather than by helping the optimization process (Frankle & Carbin, 2019).
1 code implementation • 22 May 2018 • Haonan Yu, Xiaochen Lian, Haichao Zhang, Wei Xu
Recently there has been a rising interest in training agents, embodied in virtual environments, to perform language-directed tasks by deep reinforcement learning.
1 code implementation • ACL 2018 • Haichao Zhang, Haonan Yu, Wei Xu
Building intelligent agents that can communicate with and learn from humans in natural language is of great value.
2 code implementations • ICLR 2018 • Haonan Yu, Haichao Zhang, Wei Xu
We build a virtual agent for learning language in a 2D maze-like world.
1 code implementation • 28 May 2017 • Haichao Zhang, Haonan Yu, Wei Xu
One of the long-term goals of artificial intelligence is to build an agent that can communicate intelligently with humans in natural language.
no code implementations • 28 Mar 2017 • Haonan Yu, Haichao Zhang, Wei Xu
We believe that our results provide some preliminary insights on how to train an agent with similar abilities in a 3D environment.
no code implementations • 18 Nov 2015 • Daniel Paul Barrett, Ran Xu, Haonan Yu, Jeffrey Mark Siskind
We make available to the community a new dataset to support action-recognition research.
no code implementations • CVPR 2016 • Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, Wei Xu
The sentence generator produces one simple short sentence that describes a specific short video interval.
no code implementations • 25 Aug 2015 • Daniel Paul Barrett, Scott Alan Bronikowski, Haonan Yu, Jeffrey Mark Siskind
We present a unified framework which supports grounding natural-language semantics in robotic driving.
no code implementations • 5 Jun 2015 • Haonan Yu, Jeffrey Mark Siskind
We tackle the problem of video object codetection by leveraging the weak semantic constraint implied by sentences that describe the video content.
no code implementations • 14 Nov 2014 • Haonan Yu, Daniel P. Barrett, Jeffrey Mark Siskind
Prior work presented the sentence tracker, a method for scoring how well a sentence describes a video clip or, alternatively, how well a video clip depicts a sentence.
no code implementations • 21 Jun 2013 • Haonan Yu, Jeffrey Mark Siskind
We present a method for learning word meanings from complex and realistic video clips by discriminatively training (DT) positive sentential labels against negative ones, and then using the trained word models to generate sentential descriptions for new videos.
no code implementations • CVPR 2013 • Yu Cao, Daniel Barrett, Andrei Barbu, Siddharth Narayanaswamy, Haonan Yu, Aaron Michaux, Yuewei Lin, Sven Dickinson, Jeffrey Mark Siskind, Song Wang
In this paper, we propose a new method that can recognize human activities from partially observed videos in the general case.