no code implementations • 5 Feb 2024 • Sugandha Sharma, Guy Davidson, Khimya Khetarpal, Anssi Kanervisto, Udit Arora, Katja Hofmann, Ida Momennejad
First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space.
1 code implementation • NeurIPS 2023 • Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Rohin Shah
Given the completion of two years of BASALT competitions, we offer to the community a formalized benchmark through the BASALT Evaluation and Demonstrations Dataset (BEDD), which serves as a resource for algorithm development and performance assessment.
no code implementations • 4 Dec 2023 • Lukas Schäfer, Logan Jones, Anssi Kanervisto, Yuhan Cao, Tabish Rashid, Raluca Georgescu, Dave Bignell, Siddhartha Sen, Andrea Treviño Gavito, Sam Devlin
Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community.
no code implementations • 23 Mar 2023 • Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou, Bingquan Yu, He Liu, Kai Guan, Yujing Hu, Tangjie Lv, Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik, Shu Ishida, João F. Henriques, Robert Klassert, Walter Laurito, Ellen Novoseller, Vinicius G. Goecks, Nicholas Waytowich, David Watkins, Josh Miller, Rohin Shah
To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022.
1 code implementation • 25 Jan 2023 • Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin
This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments.
1 code implementation • 18 May 2022 • Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa
Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years.
1 code implementation • 14 May 2022 • Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki
Playing games with cheaters is not fun, and in a multi-billion-dollar video game industry with hundreds of millions of players, game developers aim to improve the security and, consequently, the user experience of their games by preventing cheating.
no code implementations • 14 Apr 2022 • Rohin Shah, Steven H. Wang, Cody Wild, Stephanie Milani, Anssi Kanervisto, Vinicius G. Goecks, Nicholas Waytowich, David Watkins-Valls, Bharat Prakash, Edmund Mills, Divyansh Garg, Alexander Fries, Alexandra Souly, Chan Jun Shern, Daniel del Castillo, Tom Lieberum
The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks.
1 code implementation • 22 Mar 2022 • Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, DaeJin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, Mikayel Samvelyan, Dmitry Sorokin, Maciej Sypetkowski, Michał Sypetkowski
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge.
no code implementations • 17 Feb 2022 • Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman
With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers.
no code implementations • 24 Jan 2022 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security.
1 code implementation • 5 Jul 2021 • Shashank Hegde, Anssi Kanervisto, Aleksei Petrenko
We are currently in the process of merging the augmented simulator with the main ViZDoom code repository.
no code implementations • 5 Jul 2021 • Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.
1 code implementation • 1 Jul 2021 • Anssi Kanervisto, Christian Scheller, Yanick Schraner, Ville Hautamäki
Reinforcement learning (RL) research focuses on general solutions that can be applied across different domains.
no code implementations • 7 Jun 2021 • William Hebgen Guss, Stephanie Milani, Nicholay Topin, Brandon Houghton, Sharada Mohanty, Andrew Melnik, Augustin Harter, Benoit Buschmaas, Bjarne Jaster, Christoph Berganski, Dennis Heitkamp, Marko Henning, Helge Ritter, Chengjie WU, Xiaotian Hao, Yiming Lu, Hangyu Mao, Yihuan Mao, Chao Wang, Michal Opanowicz, Anssi Kanervisto, Yanick Schraner, Christian Scheller, Xiren Zhou, Lu Liu, Daichi Nishio, Toi Tsuneda, Karolis Ramanauskas, Gabija Juceviciute
Reinforcement learning competitions have formed the basis for standard research benchmarks, galvanized advances in the state-of-the-art, and shaped the direction of the field.
no code implementations • 21 Apr 2021 • Keishi Ishihara, Anssi Kanervisto, Jun Miura, Ville Hautamäki
This does not only improve the success rate of standard benchmarks, but also the ability to react to traffic lights, which we show with standard benchmarks.
1 code implementation • 1 Apr 2021 • Dylan Ashley, Anssi Kanervisto, Brendan Bennett
We present AlphaChute: a state-of-the-art algorithm that achieves superhuman performance in the ancient game of Chutes and Ladders.
1 code implementation • 2 Dec 2020 • Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki
Behavioural characterizations (BCs) of decision-making agents, or their policies, are used to study outcomes of training algorithms and as part of the algorithms themselves to encourage unique policies, match expert policy or restrict changes to policy per update.
1 code implementation • 7 May 2020 • Anssi Kanervisto, Janne Karttunen, Ville Hautamäki
MineRL 2019 competition challenged participants to train sample-efficient agents to play Minecraft, by using a dataset of human gameplay and a limit number of steps the environment.
1 code implementation • 2 Apr 2020 • Anssi Kanervisto, Joonas Pussinen, Ville Hautamäki
We take a step towards a general approach and study the general applicability of behavioural cloning on twelve video games, including six modern video games (published after 2010), by using human demonstrations as training data.
1 code implementation • 2 Apr 2020 • Anssi Kanervisto, Christian Scheller, Ville Hautamäki
In this work, we aim to gain insight on these action space modifications by conducting extensive experiments in video-game environments.
1 code implementation • 6 Feb 2020 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
The spoofing countermeasure (CM) systems in automatic speaker verification (ASV) are not typically used in isolation of each other.
1 code implementation • 6 Jul 2019 • Bilal Soomro, Anssi Kanervisto, Trung Ngo Trong, Ville Hautamäki
One such debugging method used with image classification DNNs is activation maximization, which generates example-images that are classified as one of the classes.
no code implementations • 10 May 2019 • Abraham Woubie, Anssi Kanervisto, Janne Karttunen, Ville Hautamaki
In this work, we propose the use of audio as complementary information to visual only in state representation.
1 code implementation • 2 May 2019 • Janne Karttunen, Anssi Kanervisto, Ville Kyrki, Ville Hautamäki
Deep reinforcement learning has proven to be successful for learning tasks in simulated environments, but applying same techniques for robots in real-world domain is more challenging, as they require hours of training.
1 code implementation • 8 Nov 2018 • Ville Vestman, Bilal Soomro, Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen
The popularization of science can often be disregarded by scientists as it may be challenging to put highly sophisticated research into words that general public can understand.
Audio and Speech Processing Sound
1 code implementation • 26 Jul 2018 • Anssi Kanervisto, Ville Hautamäki
We present Toribash Learning Environment (ToriLLE), a learning environment for machine learning agents based on the video game Toribash.
14 code implementations • ICML 2017 • Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.