Strategy Actor-Critic

CRLD actor-critic agents in strategy space


stratAC

 stratAC (env, learning_rates:Union[float,Iterable],
          discount_factors:Union[float,Iterable],
          choice_intensities:Union[float,Iterable]=1.0,
          use_prefactor=False, opteinsum=True, **kwargs)

Class for CRLD actor-critic agents in strategy space.

|  | Type | Default | Details |
|---|---|---|---|
| env | | | An environment object |
| learning_rates | Union[float, Iterable] | | agents' learning rates |
| discount_factors | Union[float, Iterable] | | agents' discount factors |
| choice_intensities | Union[float, Iterable] | 1.0 | agents' choice intensities |
| use_prefactor | bool | False | use the (1 - discount factor) prefactor |
| opteinsum | bool | True | optimize einsum functions |
| kwargs | | | |

Note that choice_intensities are not required for actor-critic learning and have no effect other than scaling the learning_rates; hence the default value of 1.0.
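
The following is a minimal usage sketch. The import paths and the `SocialDilemma` environment with its payoff parameters are assumptions about the surrounding pyCRLD package, not something this page confirms; a scalar hyperparameter is presumably applied to all agents, matching the `Union[float, Iterable]` annotations.

```python
# Minimal usage sketch (import paths and the SocialDilemma constructor are assumptions).
from pyCRLD.Environments.SocialDilemma import SocialDilemma
from pyCRLD.Agents.StrategyActorCritic import stratAC

# Two-agent social dilemma; the payoff parameters R, T, S, P are illustrative values.
env = SocialDilemma(R=1.0, T=1.2, S=-0.5, P=0.0)

mae = stratAC(env=env,
              learning_rates=0.05,    # single float applied to all agents
              discount_factors=0.9,   # likewise applied to all agents
              use_prefactor=False,    # do not scale by (1 - discount factor)
              opteinsum=True)         # let einsum optimize contraction order
```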



stratAC.RPEisa

 stratAC.RPEisa (Xisa, norm=False)

Compute reward-prediction/temporal-difference error for strategy actor-critic dynamics, given joint strategy Xisa.

|  | Type | Default | Details |
|---|---|---|---|
| Xisa | | | Joint strategy |
| norm | bool | False | normalize error around actions? |
| Returns | ndarray | | RP/TD error |
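
As an illustration, the reward-prediction error can be evaluated for a hand-built joint strategy. The array shape below (agents × states × actions) and the two-agent, one-state, two-action setting are assumptions made for the sake of the example.

```python
import numpy as np

# Hypothetical joint strategy: 2 agents, 1 state, 2 actions, all uniform.
# The index order follows the name Xisa: i = agent, s = state, a = action.
Xisa = np.ones((2, 1, 2)) / 2

rpe = mae.RPEisa(Xisa)                   # raw reward-prediction / TD error
rpe_norm = mae.RPEisa(Xisa, norm=True)   # error normalized around actions
```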


stratAC.NextVisa

 stratAC.NextVisa (Xisa, Vis=None, Tss=None, Ris=None, Risa=None)

Compute the strategy-average next value for agent i, current state s, and action a, given joint strategy Xisa.

|  | Type | Default | Details |
|---|---|---|---|
| Xisa | | | Joint strategy |
| Vis | NoneType | None | Optional values for speed-up |
| Tss | NoneType | None | Optional transition for speed-up |
| Ris | NoneType | None | Optional reward for speed-up |
| Risa | NoneType | None | Optional reward for speed-up |
| Returns | Array | | Next values |
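
In the simplest case only the joint strategy is passed and all intermediate quantities are recomputed internally; the optional arguments exist so that already-computed strategy-average values, transitions, and rewards can be reused. The commented line below is a hypothetical illustration of that speed-up, not a documented call pattern.

```python
# Strategy-average next values, recomputing everything from Xisa.
next_vals = mae.NextVisa(Xisa)

# If matching precomputed arrays are available from an earlier step,
# they can be passed in to avoid recomputation (names are placeholders):
# next_vals = mae.NextVisa(Xisa, Tss=precomputed_Tss, Ris=precomputed_Ris)
```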