Value Base
Strategy Functions
First, we define classes for the different strategy functions that value-based agents require. Then, we define the base class for value-based agents.
multiagent_epsilongreedy_strategy
multiagent_epsilongreedy_strategy (epsilon_greedys=None, N=None)
A multiagent epsilon-greedy strategy in tabular form
action_probabilities
action_probabilities (Qisa)
Transform Q values into an epsilon-greedy policy.
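The transformation can be sketched as follows. This is a minimal illustration, not the library's implementation; the array layout `(N, Z, M)` for agents, states, and actions is an assumption:

```python
import numpy as np

def epsilon_greedy_probs(Qisa, epsilons):
    """Turn joint state-action values Qisa (agents i, states s, actions a)
    into epsilon-greedy action probabilities (hypothetical sketch)."""
    N, Z, M = Qisa.shape
    Xisa = np.zeros_like(Qisa, dtype=float)
    for i in range(N):
        # every action receives the exploration mass epsilon / M
        Xisa[i] = epsilons[i] / M
        for s in range(Z):
            # greedy actions share the remaining (1 - epsilon) mass
            greedy = np.flatnonzero(Qisa[i, s] == Qisa[i, s].max())
            Xisa[i, s, greedy] += (1 - epsilons[i]) / len(greedy)
    return Xisa

Qisa = np.array([[[1.0, 0.0],    # state 0: action 0 is greedy
                  [0.5, 0.5]]])  # state 1: tie between both actions
Xisa = epsilon_greedy_probs(Qisa, epsilons=[0.1])
```

With `epsilon = 0.1` and two actions, the greedy action in state 0 receives `0.9 + 0.05 = 0.95` probability; in the tied state 1, both actions receive `0.5`.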
multiagent_epsilongreedy_strategy.id
multiagent_epsilongreedy_strategy.id ()
Returns an identifier to handle simulation runs.
Value Base Class
Now we define the base class for the value-based CRLD agents.
valuebase
valuebase (env, learning_rates:Union[float,Iterable], discount_factors:Union[float,Iterable], strategy_function, choice_intensities:Union[float,Iterable]=1.0, use_prefactor=False, opteinsum=True, **kwargs)
Base class for deterministic strategy-average independent (multi-agent) reward-prediction temporal-difference reinforcement learning in value space.
| | Type | Default | Details |
|---|---|---|---|
| env | | | An environment object |
| learning_rates | Union | | agents' learning rates |
| discount_factors | Union | | agents' discount factors |
| strategy_function | | | the strategy function object |
| choice_intensities | Union | 1.0 | agents' choice intensities |
| use_prefactor | bool | False | use the 1-DiscountFactor prefactor |
| opteinsum | bool | True | optimize einsum functions |
| kwargs | | | |
step
step (Qisa)
Temporal-difference reward-prediction learning step in value space, given joint state-action values Qisa.
| | Details |
|---|---|
| Qisa | joint state-action values |
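Conceptually, each step moves the joint state-action values toward a reward-prediction target. The sketch below is a heavily simplified stand-in, not the library's actual step: the inputs `Risa` (strategy-average rewards) and `NextVisa` (strategy-average next-state values) are hypothetical precomputed quantities, and the optional `(1 - gamma)` prefactor mirrors the `use_prefactor` option above.

```python
import numpy as np

def td_step(Qisa, Risa, NextVisa, alphas, gammas, use_prefactor=False):
    """One simplified reward-prediction TD update in value space (sketch).

    Qisa:     current joint state-action values, shape (N, Z, M)
    Risa:     strategy-average rewards (hypothetical input)
    NextVisa: strategy-average next-state values (hypothetical input)
    """
    alphas = np.asarray(alphas, dtype=float).reshape(-1, 1, 1)
    gammas = np.asarray(gammas, dtype=float).reshape(-1, 1, 1)
    prefactor = (1 - gammas) if use_prefactor else 1.0
    # TD error: prediction target minus current value estimate
    delta = prefactor * Risa + gammas * NextVisa - Qisa
    return Qisa + alphas * delta

Qnew = td_step(Qisa=np.zeros((1, 1, 1)),
               Risa=np.ones((1, 1, 1)),
               NextVisa=np.zeros((1, 1, 1)),
               alphas=[0.5], gammas=[0.9])
```

Starting from zero values, a unit reward and a learning rate of 0.5 move the value estimate halfway to the target, yielding 0.5.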
valuebase.zero_intelligence_values
valuebase.zero_intelligence_values (value:float=0.0)
*Zero-intelligence causes a behavior where agents choose each action with equal probability. This function returns the state-action values for the zero-intelligence strategy, with each state-action value set to value.*
| | Type | Default | Details |
|---|---|---|---|
| value | float | 0.0 | state-action value |
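A minimal sketch of what such a value table looks like, assuming a `(N, Z, M)` layout for agents, states, and actions:

```python
import numpy as np

def zero_intelligence_values(shape, value=0.0):
    """Uniform state-action values: any policy derived from them
    (e.g., softmax) assigns equal probability to every action."""
    return np.full(shape, value)

Q0 = zero_intelligence_values((2, 3, 4))  # 2 agents, 3 states, 4 actions
```

Because every entry is identical, a softmax over the last axis yields the uniform distribution regardless of the choice intensity.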
valuebase.random_values
valuebase.random_values ()
Returns normally distributed random state-action values.
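A sketch of the idea, again assuming a `(N, Z, M)` layout; the optional `seed` parameter is an illustrative addition for reproducibility:

```python
import numpy as np

def random_values(shape, seed=None):
    """Standard-normally distributed state-action values (sketch)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

Qr = random_values((2, 3, 4), seed=42)  # 2 agents, 3 states, 4 actions
```

Random initial values are useful for breaking the symmetry of the zero-intelligence initialization when exploring the learning dynamics from many starting points.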
id
id ()
Returns an identifier to handle simulation runs.