Value Base
Strategy Functions
First, we define classes for the different strategy functions that value-based agents require. Then, we define the base class for value-based agents.
multiagent_epsilongreedy_strategy
multiagent_epsilongreedy_strategy (epsilon_greedys=None, N=None)
A multiagent epsilon-greedy strategy in tabular form
action_probabilities
action_probabilities (Qisa)
Transform Q values into an epsilon-greedy policy.
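The transformation can be sketched as follows. This is a minimal illustration, not the library's implementation; the array layout `(N, Z, M)` for agents, states, and actions is an assumption:

```python
import numpy as np

def epsilon_greedy_probs(Qisa, epsilons):
    """Turn joint state-action values Qisa (agents i, states s, actions a)
    into epsilon-greedy action probabilities (hypothetical sketch)."""
    N, Z, M = Qisa.shape
    Xisa = np.zeros_like(Qisa, dtype=float)
    for i in range(N):
        # every action receives the exploration mass epsilon / M
        Xisa[i] = epsilons[i] / M
        for s in range(Z):
            # greedy actions share the remaining (1 - epsilon) mass
            greedy = np.flatnonzero(Qisa[i, s] == Qisa[i, s].max())
            Xisa[i, s, greedy] += (1 - epsilons[i]) / len(greedy)
    return Xisa

Qisa = np.array([[[1.0, 0.0],    # state 0: action 0 is greedy
                  [0.5, 0.5]]])  # state 1: tie between both actions
Xisa = epsilon_greedy_probs(Qisa, epsilons=[0.1])
```

With `epsilon = 0.1` and two actions, the greedy action in state 0 receives `0.9 + 0.05 = 0.95` probability; in the tied state 1, both actions receive `0.5`.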
multiagent_epsilongreedy_strategy.id
multiagent_epsilongreedy_strategy.id ()
Returns an identifier to handle simulation runs.
Value Base Class
Now we define the base class for the value-based CRLD agents.
valuebase
valuebase (env, learning_rates:Union[float,Iterable], discount_factors:Union[float,Iterable], strategy_function, choice_intensities:Union[float,Iterable]=1.0, use_prefactor=False, opteinsum=True, **kwargs)
Base class for deterministic strategy-average independent (multi-agent) reward-prediction temporal-difference reinforcement learning in value space.
| | Type | Default | Details |
|---|---|---|---|
| env | | | An environment object |
| learning_rates | Union | | agents' learning rates |
| discount_factors | Union | | agents' discount factors |
| strategy_function | | | the strategy function object |
| choice_intensities | Union | 1.0 | agents' choice intensities |
| use_prefactor | bool | False | use the 1-DiscountFactor prefactor |
| opteinsum | bool | True | optimize einsum functions |
| kwargs | | | |
step
step (Qisa)
Temporal-difference reward-prediction learning step in value space, given joint state-action values Qisa.
| | Details |
|---|---|
| Qisa | joint state-action values |
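Conceptually, each step moves the joint state-action values toward a reward-prediction target. The sketch below is a heavily simplified stand-in, not the library's actual step: the inputs `Risa` (strategy-average rewards) and `NextVisa` (strategy-average next-state values) are hypothetical precomputed quantities, and the optional `(1 - gamma)` prefactor mirrors the `use_prefactor` option above.

```python
import numpy as np

def td_step(Qisa, Risa, NextVisa, alphas, gammas, use_prefactor=False):
    """One simplified reward-prediction TD update in value space (sketch).

    Qisa:     current joint state-action values, shape (N, Z, M)
    Risa:     strategy-average rewards (hypothetical input)
    NextVisa: strategy-average next-state values (hypothetical input)
    """
    alphas = np.asarray(alphas, dtype=float).reshape(-1, 1, 1)
    gammas = np.asarray(gammas, dtype=float).reshape(-1, 1, 1)
    prefactor = (1 - gammas) if use_prefactor else 1.0
    # TD error: prediction target minus current value estimate
    delta = prefactor * Risa + gammas * NextVisa - Qisa
    return Qisa + alphas * delta

Qnew = td_step(Qisa=np.zeros((1, 1, 1)),
               Risa=np.ones((1, 1, 1)),
               NextVisa=np.zeros((1, 1, 1)),
               alphas=[0.5], gammas=[0.9])
```

Starting from zero values, a unit reward and a learning rate of 0.5 move the value estimate halfway to the target, yielding 0.5.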
valuebase.zero_intelligence_values
valuebase.zero_intelligence_values (value:float=0.0)
*Zero-intelligence causes a behavior where agents choose each action with equal probability. This function returns the state-action values for the zero-intelligence strategy, with each state-action value set to value.*
| | Type | Default | Details |
|---|---|---|---|
| value | float | 0.0 | state-action value |
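A minimal sketch of what such a value table looks like, assuming a `(N, Z, M)` layout for agents, states, and actions:

```python
import numpy as np

def zero_intelligence_values(shape, value=0.0):
    """Uniform state-action values: any policy derived from them
    (e.g., softmax) assigns equal probability to every action."""
    return np.full(shape, value)

Q0 = zero_intelligence_values((2, 3, 4))  # 2 agents, 3 states, 4 actions
```

Because every entry is identical, a softmax over the last axis yields the uniform distribution regardless of the choice intensity.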
valuebase.random_values
valuebase.random_values ()
Returns normally distributed random state-action values.
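A sketch of the idea, again assuming a `(N, Z, M)` layout; the optional `seed` parameter is an illustrative addition for reproducibility:

```python
import numpy as np

def random_values(shape, seed=None):
    """Standard-normally distributed state-action values (sketch)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

Qr = random_values((2, 3, 4), seed=42)  # 2 agents, 3 states, 4 actions
```

Random initial values are useful for breaking the symmetry of the zero-intelligence initialization when exploring the learning dynamics from many starting points.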
id
id ()
Returns an identifier to handle simulation runs.