Strategy AC (part. Obs.)

CRLD actor-critic agents learning under partial observability in strategy space


POstratAC

 POstratAC (env, learning_rates, discount_factors, choice_intensities=1,
            **kwargs)

Class for deterministic policy-average independent (multi-agent) partially observable temporal-difference actor-critic reinforcement learning in policy space.



POstratAC.RPEioa

 POstratAC.RPEioa (X, norm=False)

Temporal-difference (TD) error for partially observable policy actor-critic (AC) dynamics, given the joint policy X.
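As a rough illustration of what a TD error indexed by agent i, observation o and action a looks like, here is a minimal NumPy sketch. It is not the library's implementation. The function name `td_error`, its array arguments, the `(1 - gamma)` reward prefactor and the reading of `norm` as subtracting the mean over actions are all assumptions made for this example.

```python
import numpy as np

def td_error(Rioa, NextVioa, Vio, gamma, norm=False):
    """Illustrative TD error with shape (agents, observations, actions).

    Hypothetical helper: the arguments are plain arrays, not the
    quantities POstratAC derives from a joint policy X.
    """
    gamma = np.asarray(gamma, dtype=float)
    # Assumed convention: rewards are scaled by (1 - gamma) so that
    # values stay on the same scale as rewards.
    pre = 1.0 - gamma
    E = (pre[:, None, None] * Rioa
         + gamma[:, None, None] * NextVioa
         - Vio[:, :, None])
    if norm:
        # Assumed meaning of `norm`: centre the error over actions.
        E = E - E.mean(axis=2, keepdims=True)
    return E

# Two agents, two observations, two actions (all numbers are made up).
Rioa = np.array([[[1.0, 0.0], [0.0, 1.0]],
                 [[0.5, 0.5], [1.0, 0.0]]])
NextVioa = np.full((2, 2, 2), 0.5)
Vio = np.full((2, 2), 0.5)

E = td_error(Rioa, NextVioa, Vio, gamma=[0.9, 0.9])
E_centred = td_error(Rioa, NextVioa, Vio, gamma=[0.9, 0.9], norm=True)
```

A positive entry means the action did better than the current value estimate for that observation, a negative entry means it did worse.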



POstratAC.NextVioa

 POstratAC.NextVioa (X, Xisa=None, Bios=None, Vio=None, Tioo=None,
                     Rio=None, Rioa=None)

Policy-averaged next value for agent i, given current observation o and action a.
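The idea of a next value under partial observability can be sketched for a single agent with NumPy. This is an assumption-laden toy, not the method above. The helper `next_value` and the arrays `B_os` (belief over hidden states given an observation), `T_sas` (state transitions), `O_so` (observation probabilities) and `V_o` (observation values) are hypothetical. Averaging over the other agents' policies is left out.

```python
import numpy as np

def next_value(B_os, T_sas, O_so, V_o):
    """Expected next-observation value per (observation, action) pair.

    Index letters: o current observation, s hidden state, a action,
    t next hidden state, p next observation.
    """
    return np.einsum("os,sat,tp,p->oa", B_os, T_sas, O_so, V_o)

# Belief over two hidden states for each of two observations.
B_os = np.array([[0.8, 0.2],
                 [0.3, 0.7]])

# Deterministic transitions: action 0 keeps the state, action 1 flips it.
T_sas = np.zeros((2, 2, 2))
T_sas[:, 0, :] = np.eye(2)
T_sas[:, 1, :] = np.eye(2)[::-1]

# Noisy observation of the next state.
O_so = np.array([[0.9, 0.1],
                 [0.1, 0.9]])

# Value attached to each observation.
V_o = np.array([1.0, 0.0])

NextV = next_value(B_os, T_sas, O_so, V_o)
```

The single `einsum` call chains the belief, transition and observation steps, so each entry of `NextV` is an expectation over everything the agent cannot see directly.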