Strategy Base (partially observable)
POstrategybase
POstrategybase (env, learning_rates, discount_factors, choice_intensities=1, **kwargs)
Base class for deterministic policy-average independent (multi-agent) partially observable temporal-difference reinforcement learning in policy space.
POstrategybase.random_softmax_policy
POstrategybase.random_softmax_policy ()
Softmax policy with random probabilities.
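As a rough illustration of what a random softmax policy can look like, the sketch below draws random logits and normalizes them with a softmax over actions. It is not the library's implementation: the function name `random_softmax_policy_sketch`, its arguments, the standard-normal logits, and the nested-list layout indexed as agent, observation, action are all assumptions made for this example.

```python
import math
import random


def random_softmax_policy_sketch(n_agents, n_obs, n_actions, seed=None):
    """Hypothetical stand-in: random logits passed through a softmax over actions."""
    rng = random.Random(seed)
    policy = []
    for _ in range(n_agents):
        agent_policy = []
        for _ in range(n_obs):
            # Assumption: logits are drawn from a standard normal distribution.
            logits = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
            # Subtract the maximum before exponentiating for numerical stability.
            top = max(logits)
            weights = [math.exp(v - top) for v in logits]
            total = sum(weights)
            agent_policy.append([w / total for w in weights])
        policy.append(agent_policy)
    return policy


# Two agents, three observations, four actions (illustrative sizes).
X = random_softmax_policy_sketch(n_agents=2, n_obs=3, n_actions=4, seed=0)
```

Each innermost list is a probability distribution over actions, so its entries are strictly positive and sum to one.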
POstrategybase.zero_intelligence_policy
POstrategybase.zero_intelligence_policy ()
Policy that assigns equal probability to every action.
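A uniform policy of this kind can be sketched in a few lines. The helper name `zero_intelligence_policy_sketch`, its arguments, and the nested-list layout indexed as agent, observation, action are assumptions for illustration, not the library's actual code.

```python
def zero_intelligence_policy_sketch(n_agents, n_obs, n_actions):
    """Hypothetical stand-in: every action gets probability 1 / n_actions."""
    p = 1.0 / n_actions
    # Build a fresh list per observation so rows do not share state.
    return [[[p] * n_actions for _ in range(n_obs)] for _ in range(n_agents)]


# Two agents, three observations, four actions (illustrative sizes).
X0 = zero_intelligence_policy_sketch(n_agents=2, n_obs=3, n_actions=4)
```

With four actions, every entry is 0.25 regardless of agent or observation.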