Strategy Actor-Critic
CRLD actor-critic agents in strategy space
stratAC
stratAC (env, learning_rates:Union[float,Iterable], discount_factors:Union[float,Iterable], choice_intensities:Union[float,Iterable]=1.0, use_prefactor=False, opteinsum=True, **kwargs)
Class for CRLD-actor-critic agents in strategy space.
|  | Type | Default | Details |
|---|---|---|---|
| env |  |  | An environment object |
| learning_rates | Union[float, Iterable] |  | agents’ learning rates |
| discount_factors | Union[float, Iterable] |  | agents’ discount factors |
| choice_intensities | Union[float, Iterable] | 1.0 | agents’ choice intensities |
| use_prefactor | bool | False | use the (1 - DiscountFactor) prefactor |
| opteinsum | bool | True | optimize einsum functions |
| kwargs |  |  |  |
Note that choice_intensities are not required for actor-critic learning and have no effect other than scaling the learning_rates; hence the default value of 1.0.
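A minimal usage sketch, assuming a social-dilemma environment from pyCRLD's Environments module (the import paths and the SocialDilemma constructor arguments are assumptions, not part of this page):

```python
from pyCRLD.Agents.StrategyActorCritic import stratAC          # import path assumed
from pyCRLD.Environments.SocialDilemma import SocialDilemma    # import path assumed

# Assumed environment: a two-agent social dilemma parameterized by
# reward R, temptation T, sucker's payoff S, and punishment P.
env = SocialDilemma(R=1.0, T=1.2, S=-0.5, P=0.0)

# Strategy actor-critic agents; scalar hyperparameters are shared by all agents.
mae = stratAC(env=env,
              learning_rates=0.05,     # one learning rate for all agents
              discount_factors=0.9,    # one discount factor for all agents
              use_prefactor=False,     # do not rescale by (1 - DiscountFactor)
              opteinsum=True)          # optimize einsum contraction order
```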
stratAC.RPEisa
stratAC.RPEisa (Xisa, norm=False)
Compute the reward-prediction/temporal-difference error of the strategy actor-critic dynamics, given the joint strategy Xisa.
|  | Type | Default | Details |
|---|---|---|---|
| Xisa |  |  | Joint strategy |
| norm | bool | False | normalize error around actions? |
| Returns | ndarray |  | RP/TD error |
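For illustration, the error can be evaluated at a uniform joint strategy. The index order (agent i, state s, action a) follows the tensor name Xisa; the attribute names N, Z, and M for the numbers of agents, states, and actions are assumptions (a sketch reusing `mae` from above):

```python
import numpy as np

N, Z, M = mae.N, mae.Z, mae.M     # attribute names assumed

# Uniform joint strategy: every agent plays each action with
# probability 1/M in every state (rows sum to one over actions).
Xisa = np.ones((N, Z, M)) / M

RPE = mae.RPEisa(Xisa)            # reward-prediction/TD error per agent, state, action
print(RPE.shape)                  # expected: (N, Z, M)
```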
stratAC.NextVisa
stratAC.NextVisa (Xisa, Vis=None, Tss=None, Ris=None, Risa=None)
Compute the strategy-averaged next value for agent i, current state s, and action a.
|  | Type | Default | Details |
|---|---|---|---|
| Xisa |  |  | Joint strategy |
| Vis | NoneType | None | Optional values for speed-up |
| Tss | NoneType | None | Optional transition for speed-up |
| Ris | NoneType | None | Optional reward for speed-up |
| Risa | NoneType | None | Optional reward for speed-up |
| Returns | Array |  | Next values |
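Analogously, the strategy-averaged next values can be obtained directly from the joint strategy; when the optional arguments are left at None, the required quantities are computed from Xisa internally (a sketch reusing `mae` and `Xisa` from above):

```python
# Strategy-averaged next values; Vis, Tss, Ris, and Risa are omitted,
# so they are recomputed from the joint strategy Xisa.
NextV = mae.NextVisa(Xisa)
print(NextV.shape)                # expected: (N, Z, M)
```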