from pyCRLD.Environments.EcologicalPublicGood import EcologicalPublicGood as EPG
from pyCRLD.Agents.StrategyActorCritic import stratAC
Base
The agent base class contains core methods to compute the strategy-average reward-prediction error.
abase
abase (TransitionTensor:numpy.ndarray, RewardTensor:numpy.ndarray, DiscountFactors:Iterable[float], use_prefactor=False, opteinsum=True)
Base class for deterministic strategy-average independent (multi-agent) temporal-difference reinforcement learning.
| | Type | Default | Details |
|---|---|---|---|
| TransitionTensor | ndarray | | transition model of the environment |
| RewardTensor | ndarray | | reward model of the environment |
| DiscountFactors | Iterable | | the agents' discount factors |
| use_prefactor | bool | False | use the 1-DiscountFactor prefactor |
| opteinsum | bool | True | optimize einsum functions |
Strategy averaging
Core methods to compute the strategy-average reward-prediction error
abase.Tss
abase.Tss (Xisa:jax.Array)
Compute average transition model Tss, given joint strategy Xisa.

| | Type | Details |
|---|---|---|
| Xisa | Array | Joint strategy |
| Returns | Array | Average transition matrix |
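A minimal numpy sketch of what this averaging computes for N=2 agents, assuming the index conventions Xisa[i, s, a] and T[s, a¹, a², s′] (this illustrates the formula, not the library's implementation):

```python
import numpy as np

Z, M = 2, 2
rng = np.random.default_rng(0)
T = rng.random((Z, M, M, Z)); T /= T.sum(-1, keepdims=True)  # T[s, a1, a2, s']
X = rng.random((2, Z, M));    X /= X.sum(-1, keepdims=True)  # Xisa[i, s, a]

# Tss[s, s'] = sum_{a1, a2} X[0, s, a1] * X[1, s, a2] * T[s, a1, a2, s']
Tss = np.einsum('sj,sk,sjkt->st', X[0], X[1], T)
assert np.allclose(Tss.sum(-1), 1.0)  # rows remain probability distributions
```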
abase.Tisas
abase.Tisas (Xisa:jax.Array)
Compute average transition model Tisas, given joint strategy Xisa.

| | Type | Details |
|---|---|---|
| Xisa | Array | Joint strategy |
| Returns | Array | Average transition Tisas |
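For intuition, a sketch of the agent-wise average Tisas[i, s, a, s′] for N=2 under the same assumed index conventions; here only the other agent's strategy is averaged out (not the library's implementation):

```python
import numpy as np

Z, M = 2, 2
rng = np.random.default_rng(0)
T = rng.random((Z, M, M, Z)); T /= T.sum(-1, keepdims=True)  # T[s, a1, a2, s']
X = rng.random((2, Z, M));    X /= X.sum(-1, keepdims=True)  # Xisa[i, s, a]

# Tisas[0, s, a, s'] = sum_{a2} X[1, s, a2] * T[s, a, a2, s']
# Tisas[1, s, a, s'] = sum_{a1} X[0, s, a1] * T[s, a1, a, s']
Tisas = np.stack([np.einsum('sk,sakt->sat', X[1], T),
                  np.einsum('sj,sjat->sat', X[0], T)])
```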
abase.Ris
abase.Ris (Xisa:jax.Array, Risa:jax.Array=None)
Compute average reward Ris, given joint strategy Xisa.

| | Type | Default | Details |
|---|---|---|---|
| Xisa | Array | | Joint strategy |
| Risa | Array | None | Optional reward for speed-up |
| Returns | Array | | Average reward |
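The optional Risa argument hints at the relationship to the state-action average documented next. A sketch, assuming Ris[i, s] is the strategy-weighted average of a precomputed Risa[i, s, a] (placeholder inputs, not the library's code):

```python
import numpy as np

N, Z, M = 2, 2, 2
rng = np.random.default_rng(0)
X    = rng.random((N, Z, M)); X /= X.sum(-1, keepdims=True)  # Xisa[i, s, a]
Risa = rng.random((N, Z, M))                                 # assumed precomputed

# Ris[i, s] = sum_a X[i, s, a] * Risa[i, s, a]
Ris = np.einsum('isa,isa->is', X, Risa)
```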
abase.Risa
abase.Risa (Xisa:jax.Array)
Compute average reward Risa, given joint strategy Xisa.

| | Type | Details |
|---|---|---|
| Xisa | Array | Joint strategy |
| Returns | Array | Average reward |
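A sketch of the averaging behind Risa for N=2, assuming the index conventions R[i, s, a¹, a², s′] and T[s, a¹, a², s′]; only the other agent's strategy and the next state are averaged out (illustrative, not the library's implementation):

```python
import numpy as np

Z, M = 2, 2
rng = np.random.default_rng(0)
T = rng.random((Z, M, M, Z)); T /= T.sum(-1, keepdims=True)  # T[s, a1, a2, s']
R = rng.random((2, Z, M, M, Z))                              # R[i, s, a1, a2, s']
X = rng.random((2, Z, M));    X /= X.sum(-1, keepdims=True)  # Xisa[i, s, a]

# Risa[0, s, a] = sum_{a2, s'} X[1, s, a2] * T[s, a, a2, s'] * R[0, s, a, a2, s']
Risa0 = np.einsum('sk,sakt,sakt->sa', X[1], T, R[0])
# Risa[1, s, a] = sum_{a1, s'} X[0, s, a1] * T[s, a1, a, s'] * R[1, s, a1, a, s']
Risa1 = np.einsum('sj,sjat,sjat->sa', X[0], T, R[1])
Risa  = np.stack([Risa0, Risa1])
```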
abase.Vis
abase.Vis (Xisa:jax.Array, Ris:jax.Array=None, Tss:jax.Array=None, Risa:jax.Array=None)
Compute average state values Vis, given joint strategy Xisa.

| | Type | Default | Details |
|---|---|---|---|
| Xisa | Array | | Joint strategy |
| Ris | Array | None | Optional reward for speed-up |
| Tss | Array | None | Optional transition for speed-up |
| Risa | Array | None | Optional reward for speed-up |
| Returns | Array | | Average state values |
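Given Ris and Tss, the state values satisfy the Bellman relation V_i = pre · R_i + γ_i · Tss · V_i, where pre = 1 − γ_i if the use_prefactor option is set and 1 otherwise. A sketch of that linear solve with placeholder inputs (not the library's code):

```python
import numpy as np

N, Z = 2, 3
rng = np.random.default_rng(0)
Tss = rng.random((Z, Z)); Tss /= Tss.sum(-1, keepdims=True)  # averaged transitions
Ris = rng.random((N, Z))                                     # averaged rewards
gammas = np.array([0.9, 0.9])
use_prefactor = True
pre = (1 - gammas) if use_prefactor else np.ones(N)

# Vis[i] solves (I - gamma_i * Tss) Vis[i] = pre_i * Ris[i]
Vis = np.stack([np.linalg.solve(np.eye(Z) - g * Tss, p * r)
                for g, r, p in zip(gammas, Ris, pre)])
```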
abase.Qisa
abase.Qisa (Xisa:jax.Array, Risa:jax.Array=None, Vis:jax.Array=None, Tisas:jax.Array=None)
Compute average state-action values Qisa, given joint strategy Xisa
| | Type | Default | Details |
|---|---|---|---|
| Xisa | Array | | Joint strategy |
| Risa | Array | None | Optional reward for speed-up |
| Vis | Array | None | Optional values for speed-up |
| Tisas | Array | None | Optional transition for speed-up |
| Returns | Array | | Average state-action values |
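The state-action values follow from Risa, Tisas, and Vis via Q_i(s, a) = pre · R_i(s, a) + γ_i Σ_{s′} T_i(s, a, s′) V_i(s′). A sketch with placeholder inputs, assuming the prefactor multiplies the reward term when use_prefactor is set (not the library's code):

```python
import numpy as np

N, Z, M = 2, 3, 2
rng = np.random.default_rng(0)
Risa  = rng.random((N, Z, M))                                      # averaged rewards
Tisas = rng.random((N, Z, M, Z)); Tisas /= Tisas.sum(-1, keepdims=True)
Vis   = rng.random((N, Z))                                         # averaged state values
gammas = np.array([0.9, 0.9])
pre = 1 - gammas   # assuming use_prefactor=True; otherwise use 1

# Qisa[i, s, a] = pre_i * Risa[i, s, a] + gamma_i * sum_{s'} Tisas[i, s, a, s'] * Vis[i, s']
Qisa = pre[:, None, None] * Risa \
       + gammas[:, None, None] * np.einsum('isat,it->isa', Tisas, Vis)
```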
Helpers
abase.Ps
abase.Ps (Xisa:jax.Array)
Compute stationary state distribution Ps, given joint strategy Xisa.

| | Type | Details |
|---|---|---|
| Xisa | Array | Joint strategy |
| Returns | Array | Stationary state distribution |
Ps uses the compute_stationarydistribution function.
env = EPG(N=2, f=1.2, c=5, m=-5, qc=0.2, qr=0.01, degraded_choice=False)
MAEi = stratAC(env=env, learning_rates=0.1, discount_factors=0.99, use_prefactor=True)
x = MAEi.random_softmax_strategy()
MAEi._numpyPs(x)
array([0.91309416, 0.08690587], dtype=float32)
MAEi.Ps(x)
Array([0.91309416, 0.08690587], dtype=float32)
abase.Ri
abase.Ri (Xisa:jax.Array)
Compute average reward Ri, given joint strategy Xisa.

| | Type | Details |
|---|---|---|
| Xisa | Array | Joint strategy Xisa |
| Returns | Array | Average reward Ri |
MAEi.Ri(x)
Array([-4.6322937, -4.5121984], dtype=float32)
abase.trajectory
abase.trajectory (Xinit:jax.Array, Tmax:int=100, tolerance:float=None, verbose=False, **kwargs)
Compute a joint learning trajectory.
| | Type | Default | Details |
|---|---|---|---|
| Xinit | Array | | Initial condition |
| Tmax | int | 100 | the maximum number of iteration steps |
| tolerance | float | None | to determine if a fixed point is reached |
| verbose | bool | False | Say something during computation? |
| kwargs | | | |
| Returns | tuple | | (trajectory, fixpointreached) |
trajectory is an Array containing the time-evolution of the dynamic variable. fixpointreached is a bool saying whether or not a fixed point has been reached.
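A usage sketch following the documented signature, reusing the environment and agent construction from the example above; the Tmax and tolerance values are purely illustrative:

```python
from pyCRLD.Environments.EcologicalPublicGood import EcologicalPublicGood as EPG
from pyCRLD.Agents.StrategyActorCritic import stratAC

env = EPG(N=2, f=1.2, c=5, m=-5, qc=0.2, qr=0.01, degraded_choice=False)
MAEi = stratAC(env=env, learning_rates=0.1, discount_factors=0.99, use_prefactor=True)
x = MAEi.random_softmax_strategy()

# Iterate the learning dynamics from the random initial strategy
trj, fixpointreached = MAEi.trajectory(x, Tmax=1000, tolerance=1e-8)
print(trj.shape, fixpointreached)
```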
abase._OtherAgentsActionsSummationTensor
abase._OtherAgentsActionsSummationTensor ()
To sum over the other agents and their respective actions using einsum.
To obtain the strategy-average reward-prediction error for agent \(i\), we need to average out the probabilities contained in the strategies of all other agents \(j \neq i\) and the transition function \(T\),
\[ \sum_{a^j} \sum_{s'} \prod_{j\neq i} X^j(s, a^j) T(s, \mathbf a, s'). \]
The _OtherAgentsActionsSummationTensor enables this summation to be executed with the efficient einsum function. It contains only \(0\)s and \(1\)s and is of dimension
\[ N \times \underbrace{N \times ... \times N}_{(N-1) \text{ times}} \times M \times \underbrace{M \times ... \times M}_{N \text{ times}} \times \underbrace{M \times ... \times M}_{(N-1) \text{ times}} \]
which represents
\[ \overbrace{N}^{\text{the focal agent}} \times \overbrace{\underbrace{N \times ... \times N}_{(N-1) \text{ times}}}^\text{all other agents} \times \overbrace{M}^\text{focal agent's action} \times \overbrace{\underbrace{M \times ... \times M}_{N \text{ times}}}^\text{all actions} \times \overbrace{\underbrace{M \times ... \times M}_{(N-1) \text{ times}}}^\text{all other agents' actions} \]
It contains a \(1\) only if
- all agent indices (comprised of the focal agent index and all other agents' indices) are different from each other,
- the focal agent's action index matches the focal agent's action index in all actions,
- and all other agents' action indices match their corresponding action indices in all actions.
Otherwise it contains a \(0\).
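To make these conditions concrete, here is a minimal sketch that reconstructs such an indicator tensor directly from them with a brute-force loop for small N and M (illustrative only, not the library's implementation):

```python
import itertools
import numpy as np

def other_agents_actions_summation_tensor(N: int, M: int) -> np.ndarray:
    """Indicator tensor of shape N x N^(N-1) x M x M^N x M^(N-1),
    built brute-force from the three conditions stated above."""
    shape = (N,) + (N,) * (N - 1) + (M,) + (M,) * N + (M,) * (N - 1)
    omega = np.zeros(shape, dtype=int)
    for idx in itertools.product(*(range(d) for d in shape)):
        focal = idx[0]                        # focal agent index
        others = idx[1:N]                     # other agents' indices
        a = idx[N]                            # focal agent's action
        all_actions = idx[N + 1:2 * N + 1]    # one action per agent
        other_actions = idx[2 * N + 1:]       # other agents' actions
        if (len(set((focal,) + others)) == N            # all agent indices differ
                and all_actions[focal] == a             # focal agent's action matches
                and all(all_actions[j] == c             # other agents' actions match
                        for j, c in zip(others, other_actions))):
            omega[idx] = 1
    return omega

omega = other_agents_actions_summation_tensor(N=2, M=2)
print(omega.shape)  # (2, 2, 2, 2, 2, 2)
```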