Strategy Actor-Critic
CRLD actor-critic agents in strategy space
stratAC
stratAC (env, learning_rates:Union[float,Iterable], discount_factors:Union[float,Iterable], choice_intensities:Union[float,Iterable]=1.0, use_prefactor=False, opteinsum=True, **kwargs)
Class for CRLD-actor-critic agents in strategy space.
|  | Type | Default | Details |
|---|---|---|---|
| env |  |  | An environment object |
| learning_rates | Union[float, Iterable] |  | agents’ learning rates |
| discount_factors | Union[float, Iterable] |  | agents’ discount factors |
| choice_intensities | Union[float, Iterable] | 1.0 | agents’ choice intensities |
| use_prefactor | bool | False | use the (1 - DiscountFactor) prefactor |
| opteinsum | bool | True | optimize einsum functions |
| kwargs |  |  |  |
Note that choice_intensities are not required for actor-critic learning and have no effect other than scaling the learning_rates; hence the default value of 1.0.
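A minimal usage sketch, assuming a social-dilemma environment from pyCRLD's Environments module (the import paths and the SocialDilemma constructor arguments are assumptions, not part of this page):

```python
from pyCRLD.Agents.StrategyActorCritic import stratAC          # import path assumed
from pyCRLD.Environments.SocialDilemma import SocialDilemma    # import path assumed

# Assumed environment: a two-agent social dilemma parameterized by
# reward R, temptation T, sucker's payoff S, and punishment P.
env = SocialDilemma(R=1.0, T=1.2, S=-0.5, P=0.0)

# Strategy actor-critic agents; scalar hyperparameters are shared by all agents.
mae = stratAC(env=env,
              learning_rates=0.05,     # one learning rate for all agents
              discount_factors=0.9,    # one discount factor for all agents
              use_prefactor=False,     # do not rescale by (1 - DiscountFactor)
              opteinsum=True)          # optimize einsum contraction order
```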
stratAC.RPEisa
stratAC.RPEisa (Xisa, norm=False)
Compute the reward-prediction/temporal-difference error of the strategy actor-critic dynamics, given the joint strategy Xisa.
|  | Type | Default | Details |
|---|---|---|---|
| Xisa |  |  | Joint strategy |
| norm | bool | False | normalize error around actions? |
| Returns | ndarray |  | RP/TD error |
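For illustration, the error can be evaluated at a uniform joint strategy. The index order (agent i, state s, action a) follows the tensor name Xisa; the attribute names N, Z, and M for the numbers of agents, states, and actions are assumptions (a sketch reusing `mae` from above):

```python
import numpy as np

N, Z, M = mae.N, mae.Z, mae.M     # attribute names assumed

# Uniform joint strategy: every agent plays each action with
# probability 1/M in every state (rows sum to one over actions).
Xisa = np.ones((N, Z, M)) / M

RPE = mae.RPEisa(Xisa)            # reward-prediction/TD error per agent, state, action
print(RPE.shape)                  # expected: (N, Z, M)
```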
stratAC.NextVisa
stratAC.NextVisa (Xisa, Vis=None, Tss=None, Ris=None, Risa=None)
Compute the strategy-averaged next value for agent i, current state s, and action a.
|  | Type | Default | Details |
|---|---|---|---|
| Xisa |  |  | Joint strategy |
| Vis | NoneType | None | Optional values for speed-up |
| Tss | NoneType | None | Optional transition for speed-up |
| Ris | NoneType | None | Optional reward for speed-up |
| Risa | NoneType | None | Optional reward for speed-up |
| Returns | Array |  | Next values |
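Analogously, the strategy-averaged next values can be obtained directly from the joint strategy; when the optional arguments are left at None, the required quantities are computed from Xisa internally (a sketch reusing `mae` and `Xisa` from above):

```python
# Strategy-averaged next values; Vis, Tss, Ris, and Risa are omitted,
# so they are recomputed from the joint strategy Xisa.
NextV = mae.NextVisa(Xisa)
print(NextV.shape)                # expected: (N, Z, M)
```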