RoomEnv-v0 (The Room environment - v0)

Introduced by Kim et al. in A Machine With Human-Like Memory Systems


There is a newer version, v1

We have released a challenging OpenAI-Gym-compatible environment. The best strategy for this environment is to maintain both episodic and semantic memory systems. See the paper for more information.

Prerequisites

  1. A Unix or Unix-like x86 machine
  2. Python 3.8 or higher
  3. Running in a virtual environment (e.g., conda, virtualenv) is highly recommended so that you don't interfere with the system Python.
  4. The package is available on PyPI. Just run: pip install room-env

Data collection

Data is collected by querying the ConceptNet API. For simplicity, we only collect triples of the form (head, AtLocation, tail), where head is one of the 80 MS COCO object categories. We chose COCO categories so that images can be incorporated later on.

If you want to collect the data manually, run:

python collect_data.py
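As a rough illustration of the kind of processing involved (the function below is a hypothetical sketch, not the actual collect_data.py API), a ConceptNet /query JSON response can be reduced to (head, AtLocation, tail) triples like this:

```python
# Hypothetical sketch of reducing a ConceptNet /query response to
# (head, AtLocation, tail) triples; not the actual collect_data.py code.

def extract_triples(response: dict) -> list:
    """Turn a ConceptNet /query JSON response into (head, rel, tail) triples."""
    triples = []
    for edge in response.get("edges", []):
        head = edge["start"]["label"]
        rel = edge["rel"]["label"]
        tail = edge["end"]["label"]
        triples.append((head, rel, tail))
    return triples

# A minimal mock of what api.conceptnet.io returns for
# /query?start=/c/en/laptop&rel=/r/AtLocation
mock_response = {
    "edges": [
        {"start": {"label": "laptop"},
         "rel": {"label": "AtLocation"},
         "end": {"label": "a desk"}},
    ]
}

print(extract_triples(mock_response))  # [('laptop', 'AtLocation', 'a desk')]
```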

How does this environment work?

The OpenAI-Gym-compatible Room environment is one big room with N<sub>people</sub> people who can freely move around. Each of them selects one of the N<sub>objects</sub> objects and places it in one of the N<sub>locations</sub> locations. N<sub>agents</sub> agent(s) are also in this room. They can only observe one human placing an object at a time; this observation is x<sup>(t)</sup>. At the same time, they are given one question about the location of an object, q<sup>(t)</sup>. x<sup>(t)</sup> is given as a quadruple, (h<sup>(t)</sup>, r<sup>(t)</sup>, t<sup>(t)</sup>, t). For example, <James’s laptop, AtLocation, James’s desk, 42> accounts for an observation where an agent sees James placing his laptop on his desk at t = 42. q<sup>(t)</sup> is given as a double, (h, r). For example, <Karen’s cat, AtLocation> asks where Karen’s cat is located. If the agent answers the question correctly, it gets a reward of +1; if not, it gets 0.
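Concretely, an observation and a question can be thought of as plain tuples (this representation is illustrative; see the repository for the exact types the environment returns):

```python
# Illustrative representation of one observation and one question.
# The exact types returned by the environment may differ.

# x^(t): (head, relation, tail, timestamp) -- an agent sees James
# placing his laptop on his desk at t = 42.
observation = ("James's laptop", "AtLocation", "James's desk", 42)

# q^(t): (head, relation) -- where is Karen's cat?
question = ("Karen's cat", "AtLocation")

head, relation, tail, timestamp = observation
print(f"At t={timestamp}: {head} {relation} {tail}")
```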

The reason the observations and questions are given in an RDF-triple-like format is twofold. First, this structured format is easily readable and writable by both humans and machines. Second, it lets us use existing knowledge graphs, such as ConceptNet.

To simplify the environment, the agents themselves do not actually move; instead, the room continuously changes around them. There are several random factors in this environment to consider:

  1. With probability p<sub>commonsense</sub>, a human places an object in a commonsense location (e.g., a laptop on a desk). The commonsense knowledge we use comes from ConceptNet. With probability 1 − p<sub>commonsense</sub>, the object is placed in a random non-commonsense location (e.g., a laptop in a tree).

  2. With probability p<sub>new_location</sub>, a human moves his/her object to a new location.

  3. With probability p<sub>new_object</sub>, a human swaps his/her object for another one.

  4. With probability p<sub>switch_person</sub>, two people switch locations. This mimics an agent moving around the room.

Each of these four probabilities parameterizes a Bernoulli distribution.
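The chance factors above can be sketched as independent Bernoulli draws per step. The following is a simplified single-person sketch (the parameter defaults, helper names, and lookup tables are illustrative stand-ins, not the environment's internals; the person-switching factor applies to pairs of people and is omitted here):

```python
import random

def simulate_step(person, p_commonsense=0.7, p_new_location=0.1,
                  p_new_object=0.1):
    """One simplified step of the room dynamics for a single person.

    `person` is a dict with "object" and "location" keys. The commonsense
    lookup below is a stand-in for the ConceptNet-derived table.
    (p_switch_person applies to pairs of people, so it is omitted here.)
    """
    commonsense_locations = {"laptop": "desk", "cat": "sofa"}
    random_locations = ["tree", "fridge", "bathtub"]

    if random.random() < p_new_object:        # swap to a different object
        person["object"] = random.choice(["laptop", "cat"])

    if random.random() < p_new_location:      # move the object somewhere new
        if random.random() < p_commonsense:   # usually a commonsense place
            person["location"] = commonsense_locations[person["object"]]
        else:                                 # occasionally a random one
            person["location"] = random.choice(random_locations)

    return person

random.seed(0)
print(simulate_step({"object": "laptop", "location": "desk"}))
```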

Consider the case where there is only one agent. Then this is a POMDP, where S<sub>t</sub> = (x<sup>(t)</sup>, q<sup>(t)</sup>), A<sub>t</sub> = (do something with x<sup>(t)</sup>, answer q<sup>(t)</sup>), and R<sub>t</sub> ∈ {0, 1}.

Currently no RL agent has been trained for this environment; we only provide heuristic agents. Take a look at the paper for more details.

RoomEnv-v0

import gym
import room_env

env = gym.make("RoomEnv-v0")
(observation, question), info = env.reset()
rewards = 0

while True:
    # The action is the agent's answer to the current question.
    (observation, question), reward, done, truncated, info = env.step("This is my answer!")
    rewards += reward
    if done:
        break

print(rewards)

Every time an agent takes an action, the environment gives it a new observation and a question to answer. You can try answering the question directly, e.g., env.step("This is my answer!"), but a better strategy is to store the observations in memory systems and take advantage of both the current observation and the history retained in those systems.
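For instance, even a naive "latest observation wins" memory can answer many questions (a minimal sketch; the actual heuristic agents, with separate episodic and semantic systems, are described in the paper and repository):

```python
# A minimal memory-based answering sketch: store the most recent location
# seen for each (head, relation) pair and answer from that store.
# This is an illustration, not one of the paper's heuristic agents.

memory = {}

def observe(observation):
    head, relation, tail, timestamp = observation
    memory[(head, relation)] = tail  # latest observation overwrites older ones

def answer(question):
    return memory.get(question, "I don't know")

observe(("James's laptop", "AtLocation", "James's desk", 42))
observe(("Karen's cat", "AtLocation", "Karen's sofa", 43))

print(answer(("Karen's cat", "AtLocation")))   # -> Karen's sofa
print(answer(("Tom's phone", "AtLocation")))   # -> I don't know
```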

Take a look at this repo for an actual interaction with this environment to learn a policy.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Run make test && make style && make quality in the root of the repository to ensure code quality.
  4. Commit your Changes (git commit -m 'Add some AmazingFeature')
  5. Push to the Branch (git push origin feature/AmazingFeature)
  6. Open a Pull Request

Cite our paper

@misc{https://doi.org/10.48550/arxiv.2204.01611,
  doi = {10.48550/ARXIV.2204.01611},
  url = {https://arxiv.org/abs/2204.01611},
  author = {Kim, Taewoon and Cochez, Michael and Francois-Lavet, Vincent and Neerincx, Mark and Vossen, Piek},
  keywords = {Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {A Machine With Human-Like Memory Systems},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}


License

MIT
