Base

Base class containing the core methods of CRLD agents

The agent base class contains core methods to compute the strategy-average reward-prediction error.


source

abase

 abase (TransitionTensor:numpy.ndarray, RewardTensor:numpy.ndarray,
        DiscountFactors:Iterable[float], use_prefactor=False,
        opteinsum=True)

Base class for deterministic strategy-average independent (multi-agent) temporal-difference reinforcement learning.

|                  | Type     | Default | Details                                |
|------------------|----------|---------|----------------------------------------|
| TransitionTensor | ndarray  |         | transition model of the environment    |
| RewardTensor     | ndarray  |         | reward model of the environment        |
| DiscountFactors  | Iterable |         | the agents’ discount factors           |
| use_prefactor    | bool     | False   | use the (1 - DiscountFactor) prefactor |
| opteinsum        | bool     | True    | optimize einsum functions              |
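
For orientation, here is a minimal sketch of constructing the base class directly from toy tensors. It assumes the index conventions \(T[s, a^1, \dots, a^N, s']\) for the transition model and \(R[i, s, a^1, \dots, a^N, s']\) for the reward model, as well as the import path pyCRLD.Agents.Base; in practice one typically uses a concrete subclass such as stratAC, as in the examples further below.

```python
import numpy as np
from pyCRLD.Agents.Base import abase  # assumed import path

# Toy setting: N=2 agents, Z=1 state, M=2 actions each.
# Assumed index conventions: T[s, a1, a2, s'] and R[i, s, a1, a2, s'].
Z, N, M = 1, 2, 2
T = np.ones((Z, M, M, Z))          # every joint action returns to the single state
R = np.random.rand(N, Z, M, M, Z)  # arbitrary rewards for illustration
gammas = [0.9, 0.9]                # one discount factor per agent

base = abase(T, R, gammas, use_prefactor=True)
```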

Strategy averaging

Core methods to compute the strategy-average reward-prediction error


source

abase.Tss

 abase.Tss (Xisa:jax.Array)

Compute average transition model Tss, given joint strategy Xisa

|         | Type  | Details                   |
|---------|-------|---------------------------|
| Xisa    | Array | Joint strategy            |
| Returns | Array | Average transition matrix |
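
Conceptually, Tss contracts the environment’s transition tensor with every agent’s strategy. A sketch of this contraction for two agents, assuming strategies of shape Xisa[i, s, a] and transitions of shape T[s, a^1, a^2, s'] (the library’s implementation handles arbitrary N):

```python
import jax.numpy as jnp

def Tss_sketch(Xisa, T):
    """Strategy-averaged transition matrix for N=2 agents.

    Tss[s, s'] = sum_{a1, a2} Xisa[0, s, a1] * Xisa[1, s, a2] * T[s, a1, a2, s'].
    """
    return jnp.einsum('sa,sb,sabt->st', Xisa[0], Xisa[1], T)
```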

source

abase.Tisas

 abase.Tisas (Xisa:jax.Array)

Compute average transition model Tisas, given joint strategy Xisa

|         | Type  | Details                  |
|---------|-------|--------------------------|
| Xisa    | Array | Joint strategy           |
| Returns | Array | Average transition Tisas |
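
In contrast to Tss, Tisas keeps the focal agent’s own action explicit and averages only over the other agents’ actions. A two-agent sketch under the same assumed index conventions:

```python
import jax.numpy as jnp

def Tisas_sketch(Xisa, T):
    """Per-agent transition model Tisas[i, s, a, s'] for N=2 agents:
    the focal agent's action stays explicit, the other agent's action
    is averaged out with its strategy."""
    T0 = jnp.einsum('sb,sabt->sat', Xisa[1], T)  # focal agent 0, average out agent 1
    T1 = jnp.einsum('sa,sabt->sbt', Xisa[0], T)  # focal agent 1, average out agent 0
    return jnp.stack([T0, T1])
```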

source

abase.Ris

 abase.Ris (Xisa:jax.Array, Risa:jax.Array=None)

Compute average reward Ris, given joint strategy Xisa

|         | Type  | Default | Details                      |
|---------|-------|---------|------------------------------|
| Xisa    | Array |         | Joint strategy               |
| Risa    | Array | None    | Optional reward for speed-up |
| Returns | Array |         | Average reward               |
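
Ris averages the reward model over all agents’ strategies and over the transition to the next state. A two-agent sketch, additionally assuming the reward convention R[i, s, a^1, a^2, s']:

```python
import jax.numpy as jnp

def Ris_sketch(Xisa, T, R):
    """Strategy-averaged reward per agent and state for N=2 agents.

    Ris[i, s] = sum_{a1, a2, s'} Xisa[0,s,a1] Xisa[1,s,a2] T[s,a1,a2,s'] R[i,s,a1,a2,s'].
    """
    return jnp.einsum('sa,sb,sabt,isabt->is', Xisa[0], Xisa[1], T, R)
```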

source

abase.Risa

 abase.Risa (Xisa:jax.Array)

Compute average reward Risa, given joint strategy Xisa

|         | Type  | Details        |
|---------|-------|----------------|
| Xisa    | Array | Joint strategy |
| Returns | Array | Average reward |
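
Risa keeps the focal agent’s own action explicit while averaging over the other agents’ actions and the next state. A two-agent sketch under the same assumptions:

```python
import jax.numpy as jnp

def Risa_sketch(Xisa, T, R):
    """Strategy-averaged reward per agent, state and own action for N=2 agents."""
    R0 = jnp.einsum('sb,sabt,sabt->sa', Xisa[1], T, R[0])  # focal agent 0 keeps action a
    R1 = jnp.einsum('sa,sabt,sabt->sb', Xisa[0], T, R[1])  # focal agent 1 keeps action b
    return jnp.stack([R0, R1])
```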

source

abase.Vis

 abase.Vis (Xisa:jax.Array, Ris:jax.Array=None, Tss:jax.Array=None,
            Risa:jax.Array=None)

Compute average state values Vis, given joint strategy Xisa

|         | Type  | Default | Details                          |
|---------|-------|---------|----------------------------------|
| Xisa    | Array |         | Joint strategy                   |
| Ris     | Array | None    | Optional reward for speed-up     |
| Tss     | Array | None    | Optional transition for speed-up |
| Risa    | Array | None    | Optional reward for speed-up     |
| Returns | Array |         | Average state values             |
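
The state values follow from the Bellman equation under the strategy-averaged quantities. A sketch of the corresponding linear solve; whether and where the \(1-\gamma_i\) prefactor enters is an assumption tied to the use_prefactor option:

```python
import jax.numpy as jnp

def Vis_sketch(Ris, Tss, gammas, use_prefactor=False):
    """Average state values from V_i = pre_i * R_i + gamma_i * Tss @ V_i,
    i.e. V_i = solve(I - gamma_i * Tss, pre_i * R_i),
    with pre_i = (1 - gamma_i) if use_prefactor else 1."""
    Z = Tss.shape[0]
    Vis = []
    for i, gamma in enumerate(gammas):
        pre = (1 - gamma) if use_prefactor else 1.0
        Vis.append(jnp.linalg.solve(jnp.eye(Z) - gamma * Tss, pre * Ris[i]))
    return jnp.stack(Vis)
```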

source

abase.Qisa

 abase.Qisa (Xisa:jax.Array, Risa:jax.Array=None, Vis:jax.Array=None,
             Tisas:jax.Array=None)

Compute average state-action values Qisa, given joint strategy Xisa

|         | Type  | Default | Details                          |
|---------|-------|---------|----------------------------------|
| Xisa    | Array |         | Joint strategy                   |
| Risa    | Array | None    | Optional reward for speed-up     |
| Vis     | Array | None    | Optional values for speed-up     |
| Tisas   | Array | None    | Optional transition for speed-up |
| Returns | Array |         | Average state-action values      |
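
The state-action values combine the strategy-averaged rewards and transitions with the state values in a one-step lookahead. A sketch under the same prefactor assumption:

```python
import jax.numpy as jnp

def Qisa_sketch(Risa, Vis, Tisas, gammas, use_prefactor=False):
    """Average state-action values from
    Q_i(s, a) = pre_i * R_i(s, a) + gamma_i * sum_{s'} T_i(s, a, s') * V_i(s'),
    with pre_i = (1 - gamma_i) if use_prefactor else 1."""
    gammas = jnp.asarray(gammas)
    pre = (1 - gammas) if use_prefactor else jnp.ones_like(gammas)
    return (pre[:, None, None] * Risa
            + gammas[:, None, None] * jnp.einsum('isat,it->isa', Tisas, Vis))
```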

Helpers


source

abase.Ps

 abase.Ps (Xisa:jax.Array)

Compute stationary state distribution Ps, given joint strategy Xisa.

|         | Type  | Details                       |
|---------|-------|-------------------------------|
| Xisa    | Array | Joint strategy                |
| Returns | Array | Stationary state distribution |

Ps uses the compute_stationarydistribution function.

```python
from pyCRLD.Environments.EcologicalPublicGood import EcologicalPublicGood as EPG
from pyCRLD.Agents.StrategyActorCritic import stratAC

env = EPG(N=2, f=1.2, c=5, m=-5, qc=0.2, qr=0.01, degraded_choice=False)
MAEi = stratAC(env=env, learning_rates=0.1, discount_factors=0.99, use_prefactor=True)

x = MAEi.random_softmax_strategy()

MAEi._numpyPs(x)
# array([0.91309416, 0.08690587], dtype=float32)

MAEi.Ps(x)
# Array([0.91309416, 0.08690587], dtype=float32)
```
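
For intuition (continuing the example above): the stationary distribution is a left eigenvector of the strategy-averaged transition matrix Tss with eigenvalue \(1\). The following cross-check is only a sketch; how compute_stationarydistribution obtains it internally may differ.

```python
import numpy as np

Tss = np.asarray(MAEi.Tss(x))
eigenvalues, eigenvectors = np.linalg.eig(Tss.T)
p = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1))])
p /= p.sum()  # normalise; should match MAEi.Ps(x) up to numerical precision
```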

source

abase.Ri

 abase.Ri (Xisa:jax.Array)

Compute average reward Ri, given joint strategy Xisa.

|         | Type  | Details             |
|---------|-------|---------------------|
| Xisa    | Array | Joint strategy Xisa |
| Returns | Array | Average reward Ri   |
```python
MAEi.Ri(x)
# Array([-4.6322937, -4.5121984], dtype=float32)
```
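
Presumably, Ri weights the per-state rewards Ris by the stationary state distribution Ps. This relation is an assumption, not stated here by the library, but it can be checked against MAEi.Ri(x) by continuing the example:

```python
import jax.numpy as jnp

# Assumed relation (to be verified): Ri[i] = sum_s Ps[s] * Ris[i, s]
jnp.einsum('s,is->i', MAEi.Ps(x), MAEi.Ris(x))
```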

source

abase.trajectory

 abase.trajectory (Xinit:jax.Array, Tmax:int=100, tolerance:float=None,
                   verbose=False, **kwargs)

Compute a joint learning trajectory.

|           | Type  | Default | Details                                             |
|-----------|-------|---------|-----------------------------------------------------|
| Xinit     | Array |         | Initial condition                                   |
| Tmax      | int   | 100     | the maximum number of iteration steps               |
| tolerance | float | None    | to determine whether a fixed point has been reached |
| verbose   | bool  | False   | Say something during computation?                   |
| kwargs    |       |         |                                                     |
| Returns   | tuple |         | (trajectory, fixpointreached)                       |

trajectory is an Array containing the time evolution of the dynamic variable. fixpointreached is a bool indicating whether a fixed point has been reached.
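
A typical call, continuing the example above (the variable names on the left-hand side are illustrative):

```python
# Run the learning dynamics from the random initial strategy for at most Tmax
# steps, stopping early if the tolerance criterion signals a fixed point.
xtraj, fixpreached = MAEi.trajectory(x, Tmax=1000, tolerance=1e-8)
```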


source

abase._OtherAgentsActionsSummationTensor

 abase._OtherAgentsActionsSummationTensor ()

To sum over the other agents and their respective actions using einsum.

To obtain the strategy-average reward-prediction error for agent \(i\), we need to average out the probabilities contained in the strategies of all other agents \(j \neq i\) and the transition function \(T\),

\[ \sum_{a^j} \sum_{s'} \prod_{j\neq i} X^j(s, a^j)\, T(s, \mathbf a, s'). \]

The _OtherAgentsActionsSummationTensor enables this summation to be executed with the efficient einsum function. It contains only \(0\)s and \(1\)s and is of dimension

\[ N \times \underbrace{N \times ... \times N}_{(N-1) \text{ times}} \times M \times \underbrace{M \times ... \times M}_{N \text{ times}} \times \underbrace{M \times ... \times M}_{(N-1) \text{ times}} \]

which represents

\[ \overbrace{N}^{\text{the focal agent}} \times \overbrace{\underbrace{N \times ... \times N}_{(N-1) \text{ times}}}^\text{all other agents} \times \overbrace{M}^\text{focal agent's action} \times \overbrace{\underbrace{M \times ... \times M}_{N \text{ times}}}^\text{all actions} \times \overbrace{\underbrace{M \times ... \times M}_{(N-1) \text{ times}}}^\text{all other agents' actions} \]

It contains a \(1\) only if

  • all agent indices (consisting of the focal agent’s index and all other agents’ indices) are different from each other,
  • the focal agent’s action index matches the focal agent’s action index in the all-actions block,
  • and all other agents’ action indices match their corresponding action indices in the all-actions block.

Otherwise it contains a \(0\).
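
A sketch of how such a tensor could be built for small \(N\) and \(M\) by looping over all index combinations and checking the three conditions above; the helper name is illustrative, and the library’s actual construction is likely vectorised:

```python
import itertools
import numpy as np

def other_agents_actions_summation_tensor(N, M):
    """0/1 tensor of shape
    (N,  N,...,N (N-1 times),  M,  M,...,M (N times),  M,...,M (N-1 times)).
    An entry is 1 iff all agent indices are pairwise distinct, the focal agent's
    action matches its slot in the all-actions block, and every other agent's
    action matches its slot in the all-actions block."""
    shape = (N,) + (N,) * (N - 1) + (M,) + (M,) * N + (M,) * (N - 1)
    tensor = np.zeros(shape, dtype=int)
    for idx in itertools.product(*[range(d) for d in shape]):
        i = idx[0]                       # focal agent
        others = idx[1:N]                # other agents
        a = idx[N]                       # focal agent's action
        all_acts = idx[N + 1:2 * N + 1]  # all agents' actions
        other_acts = idx[2 * N + 1:]     # other agents' actions
        agents = (i,) + others
        if (len(set(agents)) == N
                and all_acts[i] == a
                and all(all_acts[j] == oa for j, oa in zip(others, other_acts))):
            tensor[idx] = 1
    return tensor
```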