Base (part. Obs.)
aPObase
aPObase (TransitionTensor, RewardTensor, ObservationTensor, DiscountFactors, use_prefactor=False, opteinsum=True, **kwargs)
*Base class for deterministic policy-average (or memory mean-field) independent (multi-agent) temporal-difference reinforcement learning with partial observability.
Intended as a base for both value and policy dynamics.*
Strategy Averaging
Core methods to compute the strategy-average reward-prediction error
aPObase.Xisa
aPObase.Xisa (X)
Compute state-action policy given the current observation-action policy
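A minimal NumPy sketch of this mapping, assuming the observation tensor is indexed as `O[i, s, o]` (probability that agent i observes o in state s) and the policy as `X[i, o, a]`. These index conventions and the helper name `state_action_policy` are illustrative assumptions, not taken from the library source.

```python
import numpy as np

def state_action_policy(O, X):
    """Marginalize observations: Xisa[i, s, a] = sum_o O[i, s, o] * X[i, o, a]."""
    return np.einsum("iso,ioa->isa", O, X)

# Two agents, two states, two observations, two actions.
# Agent 0 observes the state perfectly; agent 1's observations are uninformative.
O = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.5, 0.5]]])
X = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])

Xisa = state_action_policy(O, X)
```

With perfect observation the state-action policy equals the observation-action policy; with uninformative observations the agent effectively plays the same mixed action in every state.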
aPObase.Tss
aPObase.Tss (X)
Compute average transition model Tss given policy X
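As a hedged illustration for the two-agent case, the average transition model can be sketched by weighting each joint action with the agents' state-action policies. The tensor layout `T[s, a, b, s']` and the helper name `average_transitions` are assumptions made for this example only.

```python
import numpy as np

def average_transitions(T, Xisa):
    """Tss[s, t] = sum_{a,b} Xisa[0,s,a] * Xisa[1,s,b] * T[s,a,b,t] (two agents)."""
    return np.einsum("sa,sb,sabt->st", Xisa[0], Xisa[1], T)

rng = np.random.default_rng(0)
T = rng.random((2, 2, 2, 2))
T /= T.sum(axis=-1, keepdims=True)   # normalize over next states

# State-action policies of both agents, indexed [i, s, a].
Xisa = np.array([[[0.9, 0.1], [0.2, 0.8]],
                 [[0.4, 0.6], [0.4, 0.6]]])

Tss = average_transitions(T, Xisa)   # each row is a distribution over next states
```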
aPObase.Bios
aPObase.Bios (X)
Compute the ‘belief’ that the environment is in state s given that agent i observes observation o (Bayes' rule)
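The Bayes-rule step can be sketched as follows, assuming an observation tensor `O[i, s, o]` and a prior over states `p_s`. Here the prior is simply passed in; how the library obtains it is not stated in this section, and the helper name `beliefs` is hypothetical.

```python
import numpy as np

def beliefs(O, p_s):
    """Bios[i, o, s] = O[i, s, o] * p_s[s] / sum_s' O[i, s', o] * p_s[s']."""
    joint = np.einsum("iso,s->ios", O, p_s)            # P(o, s) for each agent
    return joint / joint.sum(axis=-1, keepdims=True)   # condition on o

# One agent with a noisy sensor and a uniform prior over two states.
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
p_s = np.array([0.5, 0.5])

Bios = beliefs(O, p_s)
```

An observation with zero probability under the prior would lead to a division by zero, so a real implementation needs to handle that case.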
aPObase.Tioo
aPObase.Tioo (X, Bios=None, Xisa=None)
Compute average transition model Tioo, given joint policy X
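One plausible way to build an observation-to-observation transition model, sketched under assumed index conventions (`Bios[i, o, s]`, `Tss[s, s']`, `O[i, s', o']`), is to chain belief, state transition, and observation probability. The helper name `observation_transitions` is hypothetical.

```python
import numpy as np

def observation_transitions(Bios, Tss, O):
    """Tioo[i, o, p] = sum_{s,t} Bios[i,o,s] * Tss[s,t] * O[i,t,p]."""
    return np.einsum("ios,st,itp->iop", Bios, Tss, O)

Tss = np.array([[0.9, 0.1],
                [0.5, 0.5]])

# Sanity case: a single agent that observes the state perfectly.
O = np.eye(2)[None]      # O[0, s, o] = 1 if o == s
Bios = np.eye(2)[None]   # belief is certain about the state

Tioo = observation_transitions(Bios, Tss, O)
```

Under perfect observation the result reduces to the state transition model itself, which makes a convenient check.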
aPObase.Tioao
aPObase.Tioao (X, Bios=None, Xisa=None)
Compute average transition model Tioao, given joint policy X
aPObase.Rioa
aPObase.Rioa (X, Bios=None, Xisa=None)
Compute average reward Rioa, given joint policy X
aPObase.Rio
aPObase.Rio (X, Bios=None, Xisa=None, Rioa=None)
Compute average reward Rio, given joint policy X
aPObase.Vio
aPObase.Vio (X, Rio=None, Tioo=None, Bios=None, Xisa=None, Rioa=None, gamma=None)
Compute average observation values Vio, given joint policy X
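For a single agent, average values of this kind can be obtained by solving the linear Bellman equation V = R + γ T V. The sketch below assumes plain discounted values without any additional prefactor, and the helper name `observation_values` is illustrative rather than part of the library.

```python
import numpy as np

def observation_values(Rio, Tioo, gamma):
    """Solve V = R + gamma * T @ V for one agent, i.e. (I - gamma*T) V = R."""
    n = Tioo.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * Tioo, Rio)

Tioo = np.array([[0.9, 0.1],
                 [0.5, 0.5]])   # observation-to-observation transitions
Rio = np.array([1.0, 0.0])      # average reward per observation
gamma = 0.9

Vio = observation_values(Rio, Tioo, gamma)
```

Using `np.linalg.solve` avoids forming the matrix inverse explicitly, which is generally the more numerically stable choice.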
aPObase.Qioa
aPObase.Qioa (X, Rioa=None, Vio=None, Tioao=None, Bios=None, Xisa=None, gamma=None)
Compute average observation-action values Qioa, given joint policy X
aPObase.Ri
aPObase.Ri (X)
Compute average reward Ri, given joint policy X
aPObase.Tisas
aPObase.Tisas (X)
Compute average transition model Tisas, given joint policy X
aPObase.Risa
aPObase.Risa (X)
Compute average reward Risa, given joint policy X
aPObase.Ris
aPObase.Ris (X, Risa=None)
Compute average reward Ris, given joint policy X
aPObase.Vis
aPObase.Vis (X, Ris=None, Tss=None, Risa=None)
Compute average state values Vis, given joint policy X
aPObase.Qisa
aPObase.Qisa (X, Risa=None, Vis=None, Tisas=None)
Compute average state-action values Qisa, given joint policy X
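A minimal sketch of a one-step backup for state-action values, assuming the layouts `Risa[i, s, a]`, `Tisas[i, s, a, s']` and `Vis[i, s]` and no additional prefactor. The helper name `state_action_values` and the numbers below are illustrative only; in particular, `Vis` here is an arbitrary value vector, not a solved fixed point.

```python
import numpy as np

def state_action_values(Risa, Tisas, Vis, gamma):
    """Qisa[i,s,a] = Risa[i,s,a] + gamma * sum_t Tisas[i,s,a,t] * Vis[i,t]."""
    return Risa + gamma * np.einsum("isat,it->isa", Tisas, Vis)

# One agent, two states; action 0 stays, action 1 switches state.
Tisas = np.array([[[[1.0, 0.0], [0.0, 1.0]],
                   [[0.0, 1.0], [1.0, 0.0]]]])
Risa = np.array([[[1.0, 0.0],
                  [0.0, 0.0]]])
Vis = np.array([[2.0, 1.0]])
gamma = 0.5

Qisa = state_action_values(Risa, Tisas, Vis, gamma)
```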