Strategy Actor-Critic (partial observability)
CRLD actor-critic agents learning under partial observability in strategy space
POstratAC
POstratAC (env, learning_rates, discount_factors, choice_intensities=1, **kwargs)
Class for deterministic, policy-average temporal-difference actor-critic learning of independent agents (multi-agent) in policy space under partial observability.
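A deterministic learning dynamic in policy space moves each agent's policy directly, rather than sampling actions. The following is a minimal sketch of one such step. It assumes an exponential-weights form in which each action's probability is multiplied by exp(alpha_i * delta_i(o, a)) and then renormalised over actions. This form, the helper name policy_step_sketch, and the array layout (agents, observations, actions) are assumptions for illustration, not the class's actual implementation.

```python
import numpy as np

def policy_step_sketch(X, td_error, alpha):
    """One deterministic policy-space learning step (hypothetical helper).

    X        : (N, O, A) array, agent i's probability of action a after observation o
    td_error : (N, O, A) array, TD error per agent, observation and action
    alpha    : (N,) array, per-agent learning rates
    """
    # Reweight every action multiplicatively by exp(alpha_i * delta_i(o, a)).
    weights = X * np.exp(alpha[:, None, None] * td_error)
    # Renormalise over actions so each X_i(o, .) remains a probability distribution.
    return weights / weights.sum(axis=-1, keepdims=True)


# One agent, one observation, two actions, starting from a uniform policy.
X = np.full((1, 1, 2), 0.5)
delta = np.array([[[1.0, 0.0]]])
X_new = policy_step_sketch(X, delta, alpha=np.array([0.1]))
# X_new[0, 0, 0] = e^0.1 / (e^0.1 + 1) ≈ 0.525: the action with the higher TD error gains mass.
```

A zero TD error leaves the policy unchanged, and any term that is constant across actions cancels in the renormalisation.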
POstratAC.RPEioa
POstratAC.RPEioa (X, norm=False)
Temporal-difference (TD) error for the partially observable policy actor-critic dynamics, given the joint policy X.
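To illustrate what an actor-critic TD error can look like in policy space, here is a minimal sketch. It assumes the error for agent i, observation o, and action a has the form beta_i * ((1 - gamma_i) * R_i(o, a) + gamma_i * NextV_i(o, a) - V_i(o)). The (1 - gamma) reward normalisation, the critic baseline V_i(o), and the scaling by the choice intensity beta_i are all assumptions. The function and argument names are hypothetical. The sketch omits the norm option because its exact effect is not documented here. The baseline V_i(o) is constant across actions, so it cancels under any update that renormalises the policy over actions.

```python
import numpy as np

def td_error_sketch(R, next_v, v, gamma, beta):
    """Hypothetical actor-critic TD error in policy space.

    R      : (N, O, A) policy-average reward of agent i for observation o, action a
    next_v : (N, O, A) policy-average next value (the quantity NextVioa describes)
    v      : (N, O)    policy-average value of observation o (critic baseline)
    gamma  : (N,)      discount factors
    beta   : (N,)      choice intensities
    """
    g = gamma[:, None, None]
    # Assumed (1 - gamma) normalisation puts rewards on the same scale as values.
    target = (1.0 - g) * R + g * next_v
    # Subtract the critic's baseline V_i(o), broadcast over actions, then scale by beta_i.
    return beta[:, None, None] * (target - v[:, :, None])


delta = td_error_sketch(
    R=np.array([[[1.0, 0.0]]]),
    next_v=np.array([[[0.5, 0.5]]]),
    v=np.array([[0.2]]),
    gamma=np.array([0.5]),
    beta=np.array([2.0]),
)
# delta ≈ [[[1.1, 0.1]]]
```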
POstratAC.NextVioa
POstratAC.NextVioa (X, Xisa=None, Bios=None, Vio=None, Tioo=None, Rio=None, Rioa=None)
Policy-average next value for agent i, given the current observation o and action a.
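To make this quantity concrete, here is a minimal sketch of a policy-average next value for a single agent. It rests on three assumptions:

- The agent holds a belief B(o, s) = P(s | o). The parameter name Bios hints at such a quantity, but its exact definition is assumed here.
- The other agents' policies have already been averaged into a transition tensor T(s, a, s').
- An observation matrix Obs(s', o') maps next states to next observations, which are then valued by V(o').

The function name next_value_sketch and all argument names are hypothetical and are not part of the library's API.

```python
import numpy as np

def next_value_sketch(B, T, Obs, V):
    """Hypothetical policy-average next value NextV_i(o, a) for one agent i.

    B   : (O, S)     belief P(s | o) over states given agent i's observation
    T   : (S, A, S)  transition P(s' | s, a) for agent i's action a, already
                     averaged over the other agents' policies
    Obs : (S, O)     observation probabilities P(o' | s') for agent i
    V   : (O,)       agent i's value of each next observation o'
    """
    # NextV(o, a) = sum_s B(o,s) sum_s' T(s,a,s') sum_o' Obs(s',o') V(o')
    return np.einsum("os,sat,tp,p->oa", B, T, Obs, V)


# Fully observable toy case: action 0 stays in the current state, action 1 switches.
B = np.eye(2)
Obs = np.eye(2)
T = np.zeros((2, 2, 2))
for s in range(2):
    T[s, 0, s] = 1.0
    T[s, 1, 1 - s] = 1.0
V = np.array([1.0, 3.0])
next_v = next_value_sketch(B, T, Obs, V)
# next_v == [[1., 3.], [3., 1.]]
```

With identity belief and observation matrices, the sketch reduces to the fully observable case, where the next value depends only on the true state.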