Base (partial observability)

Base class containing the core methods of CRLD agents learning under partial observability

source

aPObase

 aPObase (TransitionTensor, RewardTensor, ObservationTensor,
          DiscountFactors, use_prefactor=False, opteinsum=True, **kwargs)

*Base class for deterministic policy-average (/ memory mean field) independent (multi-agent) temporal-difference reinforcement learning with partial observability.

To be used as a base for both value and policy dynamics.*

Strategy Averaging

Core methods to compute the strategy-average reward-prediction error


source

aPObase.Xisa

 aPObase.Xisa (X)

Compute state-action policy given the current observation-action policy
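As a rough illustration of this mapping, the sketch below marginalizes an observation-action policy over observations. It assumes an observation tensor `O[i, s, o]` (probability that agent `i` observes `o` in state `s`) and a policy `X[i, o, a]`. The helper name `state_action_policy`, the index order, and the numbers are assumptions made for the example, not the library's implementation.

```python
import numpy as np

def state_action_policy(O, X):
    """Hypothetical helper: X_isa = sum_o O[i, s, o] * X[i, o, a]."""
    return np.einsum('iso,ioa->isa', O, X)

# Two agents, two states, two observations, two actions (made-up numbers).
# Agent 0 observes the state noisily; agent 1 observes it perfectly.
O = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[1.0, 0.0], [0.0, 1.0]]])
X = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.5, 0.5], [0.1, 0.9]]])

Xisa = state_action_policy(O, X)
```

For the agent with perfect observations the state-action policy coincides with its observation-action policy, while the noisy agent's policy is blurred across states.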


source

aPObase.Tss

 aPObase.Tss (X)

Compute average transition model Tss given policy X
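The averaging can be sketched for two agents as follows. The sketch assumes a transition tensor `T[s, a, b, s']` and a state-action policy `Xisa[i, s, a]`; the helper name `average_transitions` and the restriction to two agents are assumptions of this example only.

```python
import numpy as np

def average_transitions(T, Xisa):
    """Hypothetical two-agent helper:
    Tss[s, t] = sum_{a, b} Xisa[0, s, a] * Xisa[1, s, b] * T[s, a, b, t]."""
    return np.einsum('sabt,sa,sb->st', T, Xisa[0], Xisa[1])

# Random but properly normalized transition tensor (illustrative only).
rng = np.random.default_rng(0)
T = rng.random((2, 2, 2, 2))
T /= T.sum(axis=-1, keepdims=True)

# In state 0 both agents act deterministically; in state 1 both act uniformly.
Xisa = np.array([[[1.0, 0.0], [0.5, 0.5]],
                 [[0.0, 1.0], [0.5, 0.5]]])

Tss = average_transitions(T, Xisa)
```

Each row of `Tss` is again a probability distribution over next states, because every row of `T` and of the policies is normalized.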


source

aPObase.Bios

 aPObase.Bios (X)

Compute the ‘belief’ that the environment is in state s given that agent i observes observation o (Bayes' rule)
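A minimal sketch of such a Bayes update is given below. It assumes an observation tensor `O[i, s, o]` and a state distribution `p[s]` that is simply passed in; the method itself takes the policy `X`, presumably because the state distribution depends on it. The helper name `belief` and the numbers are hypothetical.

```python
import numpy as np

def belief(O, p):
    """Hypothetical helper: B[i, o, s] = O[i, s, o] * p[s] / sum_s' O[i, s', o] * p[s']."""
    joint = np.einsum('iso,s->ios', O, p)  # joint probability of (o, s) per agent
    # Note: an observation with zero total probability would divide by zero here.
    return joint / joint.sum(axis=-1, keepdims=True)

# One agent, two states, two observations (made-up numbers).
O = np.array([[[0.9, 0.1], [0.2, 0.8]]])
p = np.array([0.5, 0.5])

B = belief(O, p)
```

With these numbers, observing `o = 0` shifts the belief toward state 0 (9/11), and observing `o = 1` shifts it toward state 1 (8/9).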


source

aPObase.Tioo

 aPObase.Tioo (X, Bios=None, Xisa=None)

Compute average transition model Tioo, given joint policy X
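One way to picture this quantity: start from the belief over states given the current observation, propagate it through the average state transitions, and then map the next state to the next observation. The sketch below assumes tensors `B[i, o, s]`, `Tss[s, s']` and `O[i, s, o]`; the helper name `observation_transitions` and the index conventions are assumptions for illustration.

```python
import numpy as np

def observation_transitions(B, Tss, O):
    """Hypothetical helper:
    Tioo[i, o, p] = sum_{s, t} B[i, o, s] * Tss[s, t] * O[i, t, p]."""
    return np.einsum('ios,st,itp->iop', B, Tss, O)

# One agent with noisy observations (made-up numbers).
O = np.array([[[0.9, 0.1], [0.2, 0.8]]])
p = np.array([0.5, 0.5])                      # assumed state distribution
joint = np.einsum('iso,s->ios', O, p)
B = joint / joint.sum(axis=-1, keepdims=True)  # Bayes' rule
Tss = np.array([[0.9, 0.1], [0.3, 0.7]])

Tioo = observation_transitions(B, Tss, O)
```

If observations reveal the state exactly (identity observation tensor and identity beliefs), this reduces to `Tss` itself.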


source

aPObase.Tioao

 aPObase.Tioao (X, Bios=None, Xisa=None)

Compute average transition model Tioao, given joint policy X


source

aPObase.Rioa

 aPObase.Rioa (X, Bios=None, Xisa=None)

Compute average reward Rioa, given joint policy X


source

aPObase.Rio

 aPObase.Rio (X, Bios=None, Xisa=None, Rioa=None)

Compute average reward Rio, given joint policy X


source

aPObase.Vio

 aPObase.Vio (X, Rio=None, Tioo=None, Bios=None, Xisa=None, Rioa=None,
              gamma=None)

Compute average observation values Vio, given joint policy X
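Values of this kind can be obtained by solving a linear Bellman equation, V = R + gamma T V, per agent. The sketch below assumes `Rio[i, o]`, `Tioo[i, o, o']` and a scalar discount factor; the helper name `observation_values` is hypothetical, and applying a `(1 - gamma)` prefactor when `use_prefactor` is set is an assumption about what that constructor flag means.

```python
import numpy as np

def observation_values(Rio, Tioo, gamma, use_prefactor=False):
    """Hypothetical helper: solve (I - gamma * Tioo[i]) V[i] = Rio[i] for each agent."""
    n_agents, n_obs = Rio.shape
    pre = 1.0 - gamma if use_prefactor else 1.0  # assumed meaning of the flag
    V = np.empty((n_agents, n_obs), dtype=float)
    for i in range(n_agents):
        A = np.eye(n_obs) - gamma * Tioo[i]
        V[i] = pre * np.linalg.solve(A, Rio[i])
    return V

# One agent, two observations, uniform observation transitions (made-up numbers).
Rio = np.array([[1.0, 0.0]])
Tioo = np.array([[[0.5, 0.5], [0.5, 0.5]]])

Vio = observation_values(Rio, Tioo, gamma=0.9)
Vio_scaled = observation_values(Rio, Tioo, gamma=0.9, use_prefactor=True)
```

Solving the linear system directly avoids forming the matrix inverse; with the prefactor the values stay on the same scale as the rewards.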


source

aPObase.Qioa

 aPObase.Qioa (X, Rioa=None, Vio=None, Tioao=None, Bios=None, Xisa=None,
               gamma=None)

Compute average observation-action values Qioa, given joint policy X

source

aPObase.Ri

 aPObase.Ri (X)

Compute average reward Ri, given joint policy X

source

aPObase.Tisas

 aPObase.Tisas (X)

Compute average transition model Tisas, given joint policy X


source

aPObase.Risa

 aPObase.Risa (X)

Compute average reward Risa, given joint policy X


source

aPObase.Ris

 aPObase.Ris (X, Risa=None)

Compute average reward Ris, given joint policy X


source

aPObase.Vis

 aPObase.Vis (X, Ris=None, Tss=None, Risa=None)

Compute average state values Vis, given joint policy X


source

aPObase.Qisa

 aPObase.Qisa (X, Risa=None, Vis=None, Tisas=None)

Compute average state-action values Qisa, given joint policy X
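A state-action value of this form combines the immediate average reward with the discounted value of the next state. The sketch below assumes `Risa[i, s, a]`, `Tisas[i, s, a, s']`, `Vis[i, s]` and a scalar discount factor; the helper name `state_action_values` is hypothetical, and any prefactor the class may apply to the reward term is left out.

```python
import numpy as np

def state_action_values(Risa, Tisas, Vis, gamma):
    """Hypothetical helper:
    Q[i, s, a] = Risa[i, s, a] + gamma * sum_t Tisas[i, s, a, t] * Vis[i, t]."""
    next_values = np.einsum('isat,it->isa', Tisas, Vis)
    return Risa + gamma * next_values

# One agent, two states, two actions (made-up numbers).
# Action 0 keeps the current state, action 1 switches to the other state.
Risa = np.array([[[1.0, 0.0], [0.0, 2.0]]])
Tisas = np.array([[[[1.0, 0.0], [0.0, 1.0]],
                   [[0.0, 1.0], [1.0, 0.0]]]])
Vis = np.array([[10.0, 20.0]])

Qisa = state_action_values(Risa, Tisas, Vis, gamma=0.5)
```

In state 0, switching (action 1) earns no immediate reward but leads to the higher-valued state, so its value exceeds that of staying.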