Environment Base

Base class for CRLD environments

ebase

 ebase ()

Base environment. All environments should inherit from this one.

The ebase class __init__ mostly contains consistency checks.

Core methods

These need to be implemented by a concrete environment.

The transition tensor $T_{s\,ja\,s'}$ gives the probability that the environment transitions to state $s'$, given that it was in state $s$ and the agents chose the joint action $ja$.


ebase.TransitionTensor

 ebase.TransitionTensor ()
from fastcore.test import test_fail  # asserts that the call raises an exception
class slf: pass  # dummy self for demonstration only
test_fail(ebase.TransitionTensor, args=[slf])

raises NotImplementedError.
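
For illustration, a concrete environment's transition tensor for, say, 2 states and a single agent with 2 actions could look like the following hypothetical sketch (the values are made up; only the indexing T[s, ja, s'] and the row-stochasticity follow the description above):

import numpy as np

# Hypothetical 2-state, 1-agent, 2-action environment (illustration only).
# T[s, a, s'] is the probability of moving from state s to s' under action a.
T = np.array([[[0.9, 0.1],   # state 0, action 0
               [0.2, 0.8]],  # state 0, action 1
              [[0.5, 0.5],   # state 1, action 0
               [0.0, 1.0]]]) # state 1, action 1
assert np.allclose(T.sum(axis=-1), 1.0)  # each (s, a) slice is a distribution over s'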

The reward tensor $R_{i\,s\,ja\,s'}$ gives the reward agent $i$ receives when the environment is in state $s$, all agents choose the joint action $ja$, and the environment transitions to state $s'$.


ebase.RewardTensor

 ebase.RewardTensor ()
class slf: pass  # dummy self for demonstration only
test_fail(ebase.RewardTensor, args=[slf])

raises NotImplementedError.
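
A matching hypothetical reward tensor for the same tiny 2-state, 1-agent, 2-action example might be built as follows (values made up; only the indexing R[i, s, ja, s'] follows the description above):

import numpy as np

# Hypothetical reward tensor for 1 agent, 2 states, 2 actions (illustration only).
# R[i, s, a, s'] is the reward agent i receives for action a in state s with successor s'.
R = np.zeros((1, 2, 2, 2))
R[0, 0, 0, 1] = 1.0   # agent 0 is rewarded for reaching state 1 from state 0 via action 0
R[0, 1, 1, 0] = -1.0  # and penalized for falling back to state 0 from state 1 via action 1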

The following two “core” methods are optional. If a concrete environment class does not implement them, the defaults below are used.

The observation tensor $O_{i\,s\,o}$ gives the probability that agent $i$ observes observation $o$ when the environment is in state $s$. The default observation tensor assumes perfect observation and sets the number of observations Q to the number of states Z.


ebase.ObservationTensor

 ebase.ObservationTensor ()

Default observation tensor: perfect observation

class slf: Z = 2; N = 3  # dummy self for demonstration only
ebase.ObservationTensor(slf)
array([[[1., 0.],
        [0., 1.]],

       [[1., 0.],
        [0., 1.]],

       [[1., 0.],
        [0., 1.]]])
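
A partially observing environment would override ObservationTensor instead. As a hypothetical sketch with 2 agents, 2 states, and 2 observations, one agent might be unable to distinguish the two states:

import numpy as np

# Hypothetical partial-observation tensor O[i, s, o] (illustration only).
O = np.zeros((2, 2, 2))
O[0] = np.eye(2)   # agent 0 observes the state perfectly
O[1, :, 0] = 1.0   # agent 1 always receives observation 0 and cannot tell the states apart
assert np.allclose(O.sum(axis=-1), 1.0)  # each (i, s) slice is a distribution over o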

Final states $F_s$ indicate which states of the environment cause the end of an episode. Their meaning and use within CRLD are not fully resolved yet. If an environment does not implement FinalStates, the default is no final states.


ebase.FinalStates

 ebase.FinalStates ()

Default final states: no final states

class slf: Z = 7 # dummy self for demonstration only
ebase.FinalStates(slf)
array([0, 0, 0, 0, 0, 0, 0])
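
An environment with, say, one absorbing end-of-episode state would instead return something like the following (hypothetical):

import numpy as np

F = np.array([0, 0, 1])  # hypothetical: the third of three states ends the episode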

Default string representations

String representations of actions, states and observations help with interpreting the results of simulation runs. Ideally, an environment class will implement these methods with descriptive values.

To demonstrate these methods, we create a dummy “self” with 2 environmental states, 3 agents, 4 actions, and 5 observations of the environmental states.

# dummy self with 2 environmental states, 3 agents, 4 actions, and 5 observations
class slf: Z = 2; N = 3; M = 4; Q = 5

ebase.actions

 ebase.actions ()

Default action set representations act_im.

ebase.actions(slf)
[['0', '1', '2', '3'], ['0', '1', '2', '3'], ['0', '1', '2', '3']]

ebase.states

 ebase.states ()

Default state set representation state_s.

ebase.states(slf)
['0', '1']

ebase.observations

 ebase.observations ()

Default observation set representations obs_io.

ebase.observations(slf)
[['0', '1', '2', '3', '4'],
 ['0', '1', '2', '3', '4'],
 ['0', '1', '2', '3', '4']]
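
A concrete environment would typically override these defaults with descriptive labels. The following is a hypothetical sketch (the labels are placeholders, not part of the base class); the methods return the same nested-list structures as the defaults above:

# Hypothetical overrides a concrete environment might define (illustration only).
def actions(self):
    return [['cooperate', 'defect'] for _ in range(self.N)]  # act_im: one label list per agent

def states(self):
    return ['fair', 'unfair']  # state_s: one label per environmental state

def observations(self):
    return [self.states() for _ in range(self.N)]  # obs_io: perfect observation of the states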

ebase.__repr__

 ebase.__repr__ ()

Return repr(self).


ebase.__str__

 ebase.__str__ ()

Return str(self).


ebase.id

 ebase.id ()

Returns id string of environment

Interactive use

Environments can also be used interactively, e.g., with iterative learning algorithms. For this purpose we provide the OpenAI Gym [step](https://wbarfuss.github.io/pyCRLD/Agents/avaluebase.html#step) interface.


ebase.step

 ebase.step (jA:Iterable)

Iterate the environment one step forward.

|         | Type     | Details                                  |
|---------|----------|------------------------------------------|
| jA      | Iterable | joint actions                            |
| Returns | tuple    | (observations_i, rewards_i, done, info)  |

ebase.observation

 ebase.observation ()

Possibly random observation for each agent from the current state.
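
As a sketch of how this interface might be used, the following loop assumes env is an instance of a concrete environment inheriting from ebase (the random action choice is a placeholder for an actual learning algorithm):

import numpy as np

# Hypothetical interactive loop (illustration only).
def run_episode(env, max_steps=10):
    obs = env.observation()  # current (possibly random) observation per agent
    for _ in range(max_steps):
        # pick one action index per agent, here uniformly at random
        jA = [np.random.randint(env.M) for _ in range(env.N)]
        obs, rewards, done, info = env.step(jA)
        if done:  # the environment reached a final state
            break
    return obs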