```python
import numpy as np

# We will use the parameters from POLD on GitHub; this helper function is taken from there.
def random_behavior(selfi, method="norm"):
    """Behavior profile with random probabilities."""
    if method == "norm":
        X = np.random.rand(selfi.N, selfi.Q, selfi.M)
        X = X / X.sum(axis=2).repeat(selfi.M).reshape(selfi.N, selfi.Q, selfi.M)
    elif method == "diff":
        X = np.random.rand(selfi.N, selfi.Q, selfi.M-1)
        X = np.concatenate((np.zeros((selfi.N, selfi.Q, 1)),
                            np.sort(X, axis=-1),
                            np.ones((selfi.N, selfi.Q, 1))), axis=-1)
        X = X[:, :, 1:] - X[:, :, :-1]
    return X
```
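For orientation, here is a small usage sketch (not from the original page): `random_behavior` only needs an object exposing the attributes `N` (agents), `Q` (observations/states) and `M` (actions), and returns an array whose last axis sums to one. The stand-in object below is a hypothetical placeholder, not a real environment.

```python
from types import SimpleNamespace

# Hypothetical stand-in for an environment/agent object with N, Q, M attributes.
dummy = SimpleNamespace(N=2, Q=3, M=4)

X = random_behavior(dummy)                # shape (N, Q, M) = (2, 3, 4)
print(X.shape)
print(np.allclose(X.sum(axis=-1), 1.0))   # True: each (agent, observation) row is a probability distribution
```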
```python
def random_reward(env, test):
    # Draw a random behavior profile, run the learning dynamics,
    # and return the state-weighted reward of the first agent.
    X = np.array(random_behavior(env))
    xtraj, fixedpointreached = test.trajectory(X)   # learning trajectory starting from X
    States = test.Ps(X)                             # state distribution
    Rewards = test.Rio(xtraj[-1])[0]                # rewards of agent 0 at the final profile
    n = len(States)
    reward = sum([States[k] * Rewards[k] for k in range(n)])
    return reward
```
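Since `States` and `Rewards` are equal-length 1-D arrays here, the explicit loop is simply a weighted sum; for reference (not part of the original code), it could be written as a dot product:

```python
# Equivalent vectorized form of the loop above (assuming equal-length 1-D arrays).
reward = np.dot(np.asarray(States), np.asarray(Rewards))
```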
## Renewable Ressources

Class for environment with Renewable Ressources.

### Implementation

#### RenewableRessources

    RenewableRessources (r, C, pR=0.1, obs=None, deltaE=0.2, sig=1.0)

Environment with Renewable Ressources.

#### RenewableRessources.actions

    RenewableRessources.actions ()

Default action set representations `act_im`.

#### RenewableRessources.states

    RenewableRessources.states ()

Default state set representation `state_s`.

#### RenewableRessources.obs_action_space

    RenewableRessources.obs_action_space ()
#### RenewableRessources.TransitionTensor

    RenewableRessources.TransitionTensor ()

Get the Transition Tensor.

The TransitionTensor is obtained with the help of the `_transition_probability` method.

#### RenewableRessources._transition_probability

    RenewableRessources._transition_probability (s, jA, sprim)
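The page only lists these signatures. Purely as a hypothetical sketch of the pattern described above (filling T[s, a1, …, aN, s'] by querying the per-transition probability), and assuming attribute names such as `Z` for the number of states, `N` for the number of agents, and `M` for the number of actions (not guaranteed to match the library), the assembly could look like this:

```python
import itertools
import numpy as np

def transition_tensor_sketch(env):
    # Illustrative sketch only: assemble T[s, a1, ..., aN, s'] from
    # env._transition_probability(s, jA, sprim). Attribute names are assumed.
    dims = [env.Z] + [env.M] * env.N + [env.Z]
    T = np.zeros(dims)
    for s in range(env.Z):
        for jA in itertools.product(range(env.M), repeat=env.N):
            for sprim in range(env.Z):
                T[(s, *jA, sprim)] = env._transition_probability(s, jA, sprim)
    return T
```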
#### RenewableRessources.RewardTensor

    RenewableRessources.RewardTensor ()

Get the Reward Tensor R[i,s,a1,…,aN,s'].

The RewardTensor is obtained with the help of the `_reward` method.

#### RenewableRessources.ObservationTensor

    RenewableRessources.ObservationTensor ()

Default observation tensor: perfect observation.

#### RenewableRessources._reward

    RenewableRessources._reward (i, s, jA, sprim)
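The `obs` argument used in the example below partitions the states into observation groups. As a hypothetical sketch (the function below is not the library's API) of how a perfect-observation tensor and a grouped-observation tensor O[i, s, o] differ:

```python
import numpy as np

def observation_tensor_sketch(Z, N, obs=None):
    # Illustrative sketch only: O[i, s, o] = probability that agent i
    # observes o while the environment is in state s.
    if obs is None:
        obs = [[s] for s in range(Z)]   # perfect observation: each state is its own observation
    Q = len(obs)                        # number of distinct observations
    O = np.zeros((N, Z, Q))
    for o, group in enumerate(obs):
        for s in group:
            O[:, s, o] = 1.0            # deterministic: every agent sees group index o in state s
    return O

# With Z=8 states, obs=[[0,1],[2,3,4],[5],[6],[7]] yields 5 observations,
# so agents cannot distinguish states 0/1 or states 2/3/4.
```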
#### RenewableRessources.id

    RenewableRessources.id ()

Returns id string of environment TODO
## Example

We will show the effect reported in the article: limited information can lead to better strategies.

### Same environment with different observability for the agents

In the first environment, `obs=None` is a shortcut meaning that all environment states are perfectly observable. In the other two, the observations are specified explicitly as groupings of states. We can see that limited observation can lead to a better reward.
```python
env = RenewableRessources(r=0.8, C=8, pR=0.1, obs=None, deltaE=0.2, sig=0.5)
test = POstratAC(env=env, learning_rates=0.02, discount_factors=0.9, choice_intensities=250)

L = []
for k in range(100):
    L.append(random_reward(env, test))
print(np.mean(L))
```

```
1.2267892
```
```python
env = RenewableRessources(r=0.8, C=8, pR=0.1, obs=[[0,1],[2,3,4],[5],[6],[7]], deltaE=0.2, sig=0.5)
test = POstratAC(env=env, learning_rates=0.02, discount_factors=0.9, choice_intensities=250)

L = []
for k in range(100):
    L.append(random_reward(env, test))
print(np.mean(L))
```

```
1.2735934
```
```python
env = RenewableRessources(r=0.8, C=8, pR=0.1, obs=[[0,1,2,3,4],[5,6,7]], deltaE=0.2, sig=0.5)
test = POstratAC(env=env, learning_rates=0.02, discount_factors=0.9, choice_intensities=250)

L = []
for k in range(100):
    L.append(random_reward(env, test))
print(np.mean(L))
```

```
0.30748004
```