Strategy Base
Base class containing the core methods of CRLD agents in strategy space
strategybase
strategybase (env, learning_rates:Union[float,Iterable], discount_factors:Union[float,Iterable], choice_intensities:Union[float,Iterable]=1.0, use_prefactor=False, opteinsum=True, **kwargs)
Base class for deterministic strategy-average independent (multi-agent) temporal-difference reinforcement learning in strategy space.
|  | Type | Default | Details |
|---|---|---|---|
| env |  |  | An environment object |
| learning_rates | Union |  | agents’ learning rates |
| discount_factors | Union |  | agents’ discount factors |
| choice_intensities | Union | 1.0 | agents’ choice intensities |
| use_prefactor | bool | False | use the 1-DiscountFactor prefactor |
| opteinsum | bool | True | optimize einsum functions |
| kwargs |  |  |  |

Further optional parameters inherited from `abase`:

|  | Type | Default | Details |
|---|---|---|---|
| use_prefactor | bool | False | use the 1-DiscountFactor prefactor |
| opteinsum | bool | True | optimize einsum functions |
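A minimal construction sketch follows. The module path is an assumption based on the library layout, and `make_environment()` is a hypothetical placeholder for building a compatible environment object; in practice one would usually instantiate a concrete subclass of this base class.

```python
# Minimal sketch, not a definitive recipe. The module path below is an
# assumption, and make_environment() is a hypothetical placeholder for
# constructing a compatible environment object.
from pyCRLD.Agents.StrategyBase import strategybase  # assumed import path

env = make_environment()                    # hypothetical helper, not part of this API
mae = strategybase(env,
                   learning_rates=0.05,     # scalar -> same learning rate for all agents
                   discount_factors=0.9,    # scalar -> same discount factor for all agents
                   choice_intensities=1.0,  # intensity of choice in the softmax policy
                   use_prefactor=False,     # do not scale updates by (1 - discount factor)
                   opteinsum=True)          # allow einsum calls to be optimized
```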
strategybase.step
strategybase.step (Xisa)
Performs a learning step along the reward-prediction/temporal-difference error in strategy space, given joint strategy `Xisa`.
|  | Type | Details |
|---|---|---|
| Xisa |  | Joint strategy |
| Returns | tuple | (Updated joint strategy, Prediction error) |
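Sketched below, assuming `mae` is an agent object constructed as above: `step` can be iterated from a random initial joint strategy until the strategy stops changing, i.e. until a fixed point of the learning dynamics is reached. The iteration budget and tolerance are illustrative.

```python
import numpy as np

X = mae.random_softmax_strategy()           # random initial joint strategy
for _ in range(1000):                       # illustrative iteration budget
    X_next, TD_error = mae.step(X)          # one strategy-average TD update
    if np.allclose(X, X_next, atol=1e-9):   # converged to a fixed point
        X = X_next
        break
    X = X_next
```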
strategybase.reverse_step
strategybase.reverse_step (Xisa)
Performs a reverse learning step in strategy space, given joint strategy `Xisa`. This is useful to compute the separatrix of a multistable regime.
|  | Type | Details |
|---|---|---|
| Xisa |  | Joint strategy |
| Returns | tuple | (Updated joint strategy, Prediction error) |
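A possible usage sketch, assuming `mae` from above: reversing the dynamics turns attractors into repellers, so a backward trajectory started between two basins tends to settle onto their boundary, which is how the separatrix can be traced out. The starting point and step count below are illustrative.

```python
X = mae.zero_intelligence_strategy()   # illustrative starting point between attractors
separatrix_trace = [X]
for _ in range(250):                   # illustrative number of reverse steps
    X, TD_error = mae.reverse_step(X)
    separatrix_trace.append(X)
```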
strategybase.zero_intelligence_strategy
strategybase.zero_intelligence_strategy ()
Returns strategy `Xisa` with equal action probabilities.
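A small check, assuming (from the index naming `Xisa`) that the first axis indexes agents, the second states, and the last actions:

```python
import numpy as np

X0 = mae.zero_intelligence_strategy()
# Each conditional distribution X0[i, s, :] is uniform over the available
# actions, so the action axis sums to one everywhere.
assert np.allclose(X0.sum(axis=-1), 1.0)
```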
strategybase.random_softmax_strategy
strategybase.random_softmax_strategy ()
Returns softmax strategy `Xisa` with random action probabilities.
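One plausible use, sketched here, is drawing several random initial joint strategies, e.g. to probe which attractor the learning dynamics reach from different starting points:

```python
# Illustrative: ten random initial conditions for the strategy dynamics.
initial_strategies = [mae.random_softmax_strategy() for _ in range(10)]
```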
strategybase.id
strategybase.id ()
Returns an identifier to handle simulation runs.
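For example (the naming scheme below is illustrative), the identifier can label stored results so that runs with different parameter settings do not overwrite each other:

```python
run_id = mae.id()
output_file = f"final_strategy_{run_id}.npy"  # illustrative file-name pattern
```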