Value Base

Base class containing the core methods of CRLD agents in value space

Strategy Functions

First, we define classes for the different strategy functions that value-based agents require. Then, we define the base class for value-based agents.


source

multiagent_epsilongreedy_strategy

 multiagent_epsilongreedy_strategy (epsilon_greedys=None, N=None)

A multiagent epsilon-greedy strategy in tabular form


source

action_probabilities

 action_probabilities (Qisa)

Transforms Q values into an epsilon-greedy policy
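The transformation can be sketched in plain numpy. This is an illustrative implementation, not the library's internal one; the function name `epsilon_greedy_probabilities` and the tie-breaking rule (first maximum wins) are assumptions for the sketch. `Qisa` is indexed by agent `i`, state `s`, and action `a`.

```python
import numpy as np

def epsilon_greedy_probabilities(Qisa, epsilon=0.1):
    """Convert joint state-action values Qisa (agents i, states s, actions a)
    into epsilon-greedy action probabilities Xisa (a hypothetical sketch)."""
    Qisa = np.asarray(Qisa, dtype=float)
    N, S, A = Qisa.shape
    # Exploration part: every action receives epsilon / A.
    Xisa = np.full((N, S, A), epsilon / A)
    # The greedy action (ties broken toward the first maximum) gets the rest.
    greedy = Qisa.argmax(axis=-1)          # shape (N, S)
    i, s = np.indices((N, S))
    Xisa[i, s, greedy] += 1.0 - epsilon
    return Xisa

Q = np.array([[[1.0, 0.0],     # one agent, two states, two actions
               [0.0, 2.0]]])
X = epsilon_greedy_probabilities(Q, epsilon=0.1)
# Each row sums to 1; the greedy action carries 1 - epsilon + epsilon/A = 0.95
```

Note that the greedy action receives both the `1 - epsilon` greedy mass and its share of the exploration mass, which is why each row still sums to one.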


source

multiagent_epsilongreedy_strategy.id

 multiagent_epsilongreedy_strategy.id ()

Returns an identifier to handle simulation runs.

Value Base Class

Now we define the base class for the value-based CRLD agents.


source

valuebase

 valuebase (env, learning_rates:Union[float,Iterable],
            discount_factors:Union[float,Iterable], strategy_function,
            choice_intensities:Union[float,Iterable]=1.0,
            use_prefactor=False, opteinsum=True, **kwargs)

Base class for deterministic strategy-average independent (multi-agent) reward-prediction temporal-difference reinforcement learning in value space.

Parameter           Type   Default  Details
env                                  An environment object
learning_rates      Union            agents' learning rates
discount_factors    Union            agents' discount factors
strategy_function                    the strategy function object
choice_intensities  Union  1.0       agents' choice intensities
use_prefactor       bool   False     use the (1 - discount factor) prefactor
opteinsum           bool   True      optimize einsum functions
kwargs

source

step

 step (Qisa)

Temporal-difference reward-prediction learning step in value space, given joint state-action values Qisa.

Parameter  Details
Qisa       joint state-action values
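The deterministic, strategy-average character of the update can be illustrated with a simplified single-agent sketch. This is not the library's multi-agent `step`; the function name `strategy_average_td_step` and the tensor shapes are assumptions chosen for clarity. Instead of sampling transitions, the update uses the expected TD target under the current policy.

```python
import numpy as np

def strategy_average_td_step(Q, X, T, R, alpha=0.1, gamma=0.9):
    """One deterministic, strategy-average TD reward-prediction step for a
    single agent (an illustrative sketch, not the library implementation).

    Q: state-action values, shape (S, A)
    X: current policy,      shape (S, A)
    T: transition tensor,   shape (S, A, S')
    R: expected rewards,    shape (S, A)
    """
    # Strategy-average next-state value: V[s'] = sum_a' X[s',a'] Q[s',a']
    V = np.einsum('sa,sa->s', X, Q)
    # Expected TD target: R[s,a] + gamma * sum_s' T[s,a,s'] V[s']
    target = R + gamma * np.einsum('sap,p->sa', T, V)
    # Deterministic (expected) temporal-difference update
    return Q + alpha * (target - Q)

# Minimal example: one state, two actions, self-transitions only.
Q = np.zeros((1, 2))
X = np.array([[0.5, 0.5]])
T = np.ones((1, 2, 1))
R = np.array([[1.0, 0.0]])
Q_next = strategy_average_td_step(Q, X, T, R, alpha=0.1, gamma=0.9)
# Q_next == [[0.1, 0.0]]: with V = 0 the target reduces to R, scaled by alpha
```

Because expectations replace samples, repeated application of this map traces the learning dynamics deterministically, which is the point of the strategy-average formulation.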

source

valuebase.zero_intelligence_values

 valuebase.zero_intelligence_values (value:float=0.0)

Zero intelligence causes agents to choose each action with equal probability.

This function returns the state-action values for the zero-intelligence strategy, with each state-action value set to value.

Parameter  Type   Default  Details
value      float  0.0      state-action value
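A standalone sketch of the idea: when every state-action value is identical, any policy that depends only on relative Q values (such as the epsilon-greedy strategy above) cannot prefer one action over another. The function name and the explicit `(N, S, A)` shape arguments here are assumptions; the library derives the shape from the environment.

```python
import numpy as np

def zero_intelligence_values(N, S, A, value=0.0):
    """State-action values for the zero-intelligence strategy:
    every entry equals `value`, so value-based action selection
    has no signal to exploit (hypothetical sketch)."""
    return np.full((N, S, A), value)

Q0 = zero_intelligence_values(N=2, S=3, A=4)
# Q0 has shape (2, 3, 4) and every entry equals 0.0
```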

source

valuebase.random_values

 valuebase.random_values ()

Returns normally distributed random state-action values.


source

id

 id ()

Returns an identifier to handle simulation runs.