Environment Base
ebase
ebase ()
Base environment. All environments should inherit from this one.
The ebase class __init__ mostly contains consistency checks.
Core methods
These need to be implemented by a concrete environment.
The transition tensor $T_{s\,ja\,s'}$ gives the probability that the environment transitions to state $s'$, given that it was in state $s$ and the agents chose the joint action $ja$.
ebase.TransitionTensor
ebase.TransitionTensor ()
raises NotImplementedError.
```python
class slf: pass
test_fail(ebase.TransitionTensor, args=slf)
```
The reward tensor $R_{i\,s\,ja\,s'}$ gives the reward agent $i$ receives when the environment is in state $s$, all agents choose the joint action $ja$, and the environment transitions to state $s'$.
ebase.RewardTensor
ebase.RewardTensor ()
raises NotImplementedError.

```python
class slf: pass
test_fail(ebase.RewardTensor, args=slf)
```
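To make the expected shapes concrete, here is a minimal sketch of a concrete environment subclassing ebase, with a single agent, two actions, and two states. The class name, the import path, the exact axis ordering of the tensors, and the assumption that N, M, and Z must be set before calling the parent __init__ are illustrative assumptions, not prescribed by pyCRLD:

```python
import numpy as np
from pyCRLD.Environments.Base import ebase  # import path is an assumption

class TwoStateEnv(ebase):  # hypothetical example environment
    def __init__(self):
        self.N = 1  # number of agents
        self.M = 2  # number of actions per agent
        self.Z = 2  # number of states
        super().__init__()

    def TransitionTensor(self):
        # T[s, a, s']: probability of moving from state s to s' under action a
        T = np.zeros((self.Z, self.M, self.Z))
        T[:, 0, 0] = 1.0  # action 0 always leads to state 0
        T[:, 1, 1] = 1.0  # action 1 always leads to state 1
        return T

    def RewardTensor(self):
        # R[i, s, a, s']: reward of agent i on the transition s -> s' under action a
        R = np.zeros((self.N, self.Z, self.M, self.Z))
        R[0, :, 1, 1] = 1.0  # agent 0 is rewarded for choosing action 1 and reaching state 1
        return R
```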
The following two “core” methods are optional. If the concrete environment class does not implement them, they default to the following:
The observation tensor $O_{i\,s\,o}$ gives the probability that agent $i$ observes observation $o$ when the environment is in state $s$. The default observation tensor assumes perfect observation and sets the number of observations $Q$ to the number of states $Z$.
ebase.ObservationTensor
ebase.ObservationTensor ()
Default observation tensor: perfect observation
```python
class slf: Z = 2; N = 3  # dummy self for demonstration only
ebase.ObservationTensor(slf)
```

```
array([[[1., 0.],
        [0., 1.]],

       [[1., 0.],
        [0., 1.]],

       [[1., 0.],
        [0., 1.]]])
```
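A partially observable environment would override this default. The following is a minimal sketch of a method one might add to a concrete environment class; the noise level and the assumption that Q equals Z are illustrative:

```python
import numpy as np

# Illustrative method for a concrete environment class: each agent identifies
# the true state with probability 0.9 and spreads the remaining 0.1 uniformly
# over the other observations.
def ObservationTensor(self):
    O = np.full((self.N, self.Z, self.Q), 0.1 / (self.Q - 1))
    for i in range(self.N):
        for s in range(self.Z):
            O[i, s, s] = 0.9  # assumes Q == Z, i.e., one observation per state
    return O

class slf: Z = 2; N = 3; Q = 2  # dummy self for demonstration only
ObservationTensor(slf)  # each row sums to one: [[[0.9, 0.1], [0.1, 0.9]], ...]
```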
Final states $F_s$ indicate which states of the environment cause the end of an episode. Their meaning and use within CRLD are not yet fully resolved. If an environment does not implement FinalStates, the default is no final states.
ebase.FinalStates
ebase.FinalStates ()
Default final states: no final states
```python
class slf: Z = 7  # dummy self for demonstration only
ebase.FinalStates(slf)
```

```
array([0, 0, 0, 0, 0, 0, 0])
```
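An environment with a terminal state would override FinalStates. A minimal sketch, assuming the expected return value is a length-Z array of 0/1 flags as in the default above; which state counts as final is an illustrative choice:

```python
import numpy as np

# Illustrative method for a concrete environment class:
# only the last state ends an episode.
def FinalStates(self):
    F = np.zeros(self.Z, dtype=int)
    F[-1] = 1
    return F

class slf: Z = 7  # dummy self for demonstration only
FinalStates(slf)  # array([0, 0, 0, 0, 0, 0, 1])
```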
Default string representations
String representations of actions, states and observations help with interpreting the results of simulation runs. Ideally, an environment class will implement these methods with descriptive values.
To demonstrate these methods, we create a dummy “self” with 2 environmental states, 3 agents, 4 actions per agent, and 5 observations of the environmental states.
```python
# dummy self with 2 states, 3 agents, 4 actions, and 5 observations
class slf: Z = 2; N = 3; M = 4; Q = 5
```

ebase.actions
ebase.actions ()
Default action set representations act_im.
```python
ebase.actions(slf)
```

```
[['0', '1', '2', '3'], ['0', '1', '2', '3'], ['0', '1', '2', '3']]
```
ebase.states
ebase.states ()
Default state set representation state_s.
```python
ebase.states(slf)
```

```
['0', '1']
```
ebase.observations
ebase.observations ()
Default observation set representations obs_io.
```python
ebase.observations(slf)
```

```
[['0', '1', '2', '3', '4'],
 ['0', '1', '2', '3', '4'],
 ['0', '1', '2', '3', '4']]
```
ebase.__repr__
ebase.__repr__ ()
Return repr(self).
ebase.__str__
ebase.__str__ ()
Return str(self).
ebase.id
ebase.id ()
Returns the id string of the environment.
Interactive use
Environments can also be used interactively, e.g., with iterative learning algorithms. For this purpose we provide the OpenAI Gym [step](https://wbarfuss.github.io/pyCRLD/Agents/avaluebase.html#step) interface.
ebase.step
ebase.step (jA:Iterable)
Iterate the environment one step forward.
|  | Type | Details |
|---|---|---|
| jA | Iterable | joint actions |
| Returns | tuple | (observations_i, rewards_i, done, info) |
ebase.observation
ebase.observation ()
Possibly random observation for each agent from the current state.
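Putting step and observation together, here is a hedged sketch of an interactive rollout. It assumes env is an instance of a concrete ebase subclass (for example the TwoStateEnv sketched earlier, provided the base-class consistency checks pass) and that step returns the tuple documented above:

```python
import numpy as np

env = TwoStateEnv()  # hypothetical concrete environment from the sketch above

obs = env.observation()  # possibly random observation for each agent
for t in range(100):
    jA = np.random.randint(env.M, size=env.N)  # uniformly random joint action
    obs, rewards, done, info = env.step(jA)
    if done:  # the episode ended in a final state
        break
```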