Renewable Resources

Class for an environment with renewable resources

Implementation


source

RenewableRessources

 RenewableRessources (r, C, pR=0.1, obs=None, deltaE=0.2, sig=1.0)

Environment with Renewable Resources.
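
For instance, the environment used in the example at the end of this page is created as follows (parameter values copied from that example):

env = RenewableRessources(r=0.8, C=8, pR=0.1, obs=None, deltaE=0.2, sig=0.5)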


source

RenewableRessources.actions

 RenewableRessources.actions ()

Default action set representations act_im.


source

RenewableRessources.states

 RenewableRessources.states ()

Default state set representation state_s.
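
Assuming an environment instance env as constructed above, the default state and action representations can be inspected directly; this is only a quick usage sketch:

print(env.states())   # default state set representation
print(env.actions())  # default action set representations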


source

RenewableRessources.obs_action_space

 RenewableRessources.obs_action_space ()

source

RenewableRessources.TransitionTensor

 RenewableRessources.TransitionTensor ()

Get the Transition Tensor.

The TransitionTensor is obtained with the help of the _transition_probability method.
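
A minimal sketch of how such a tensor could be assembled from the per-transition method _transition_probability(s, jA, sprim). The attribute names env.Z (number of states), env.N (number of agents) and env.M (number of actions per agent) are assumptions, not taken from this page:

import itertools
import numpy as np

def build_transition_tensor(env):
    # T[s, a1, ..., aN, s']: probability of moving from state s to s'
    # under the joint action (a1, ..., aN).
    dims = [env.Z] + [env.M] * env.N + [env.Z]
    T = np.zeros(dims)
    for s in range(env.Z):
        for jA in itertools.product(range(env.M), repeat=env.N):
            for sprim in range(env.Z):
                T[(s, *jA, sprim)] = env._transition_probability(s, jA, sprim)
    return T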


source

RenewableRessources._transition_probability

 RenewableRessources._transition_probability (s, jA, sprim)

source

RenewableRessources.RewardTensor

 RenewableRessources.RewardTensor ()

Get the Reward Tensor R[i,s,a1,…,aN,s’].

The RewardTensor is obtained with the help of the _reward method.
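
Analogously to the transition tensor above, a hedged sketch of how R[i,s,a1,…,aN,s’] could be filled from _reward(i, s, jA, sprim), using the same assumed attribute names:

import itertools
import numpy as np

def build_reward_tensor(env):
    # R[i, s, a1, ..., aN, s']: reward of agent i for the transition s -> s'
    # under the joint action (a1, ..., aN).
    dims = [env.N, env.Z] + [env.M] * env.N + [env.Z]
    R = np.zeros(dims)
    for i in range(env.N):
        for s in range(env.Z):
            for jA in itertools.product(range(env.M), repeat=env.N):
                for sprim in range(env.Z):
                    R[(i, s, *jA, sprim)] = env._reward(i, s, jA, sprim)
    return R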


source

RenewableRessources.ObservationTensor

 RenewableRessources.ObservationTensor ()

Default observation tensor: perfect observation
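
With obs=None, i.e. perfect observation, the observation tensor reduces to an identity mapping from states to observations. A hedged sketch of that special case, with O[i, s, o] read as the probability that agent i observes o in state s and the same assumed attribute names as above:

import numpy as np

def perfect_observation_tensor(env):
    # Every agent observes the true state with probability 1.
    O = np.zeros((env.N, env.Z, env.Z))
    for s in range(env.Z):
        O[:, s, s] = 1.0
    return O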


source

RenewableRessources._reward

 RenewableRessources._reward (i, s, jA, sprim)

source

RenewableRessources.id

 RenewableRessources.id ()

Returns the id string of the environment (TODO).

Example

We will demonstrate the effect reported in the article: limited information can lead to better strategies.

# We will use the parameters from POLD on GitHub; the following helpers draw a
# behavior profile with random probabilities and evaluate its reward.
import numpy as np

def random_behavior(selfi, method="norm"):
    """Behavior profile with random probabilities."""
    if method == "norm":
        # Draw uniform random values and normalise over the action axis.
        X = np.random.rand(selfi.N, selfi.Q, selfi.M)
        X = X / X.sum(axis=2, keepdims=True)
    elif method == "diff":
        # Sample M-1 cut points in [0, 1]; their consecutive differences sum to 1.
        X = np.random.rand(selfi.N, selfi.Q, selfi.M-1)
        X = np.concatenate((np.zeros((selfi.N, selfi.Q, 1)),
                            np.sort(X, axis=-1),
                            np.ones((selfi.N, selfi.Q, 1))), axis=-1)
        X = X[:, :, 1:] - X[:, :, :-1]
    else:
        raise ValueError(f"Unknown method: {method}")
    return X

def random_reward(env, test):
    # Draw a random behavior profile, run the learning dynamics, and return
    # the first agent's reward weighted by the state probabilities.
    X = np.array(random_behavior(env))
    xtraj, fixedpointreached = test.trajectory(X)
    States = test.Ps(X)               # state probabilities
    Rewards = test.Rio(xtraj[-1])[0]  # first agent's rewards at the final point
    n = len(States)
    reward = sum([States[k]*Rewards[k] for k in range(n)])
    return reward

The same environment with different observability for the agents

In the first environment, obs=None is a shortcut meaning that all environment states are perfectly observable. In the other two, the observation sets are specified explicitly: for example, obs=[[0,1],[2,3,4],[5],[6],[7]] groups the eight environment states into five observations, so states 0 and 1 (and likewise states 2, 3 and 4) are indistinguishable to the agents. We can see that limited observation can lead to a higher average reward.

env = RenewableRessources(r=0.8, C=8, pR=0.1, obs=None, deltaE=0.2, sig=0.5)
test = POstratAC(env=env, learning_rates=0.02, discount_factors=0.9, choice_intensities=250)
L = [] 
for k in range(100):
    L.append(random_reward(env,test))
print(np.mean(L))
1.2267892
env = RenewableRessources(r=0.8, C=8, pR=0.1, obs=[[0,1],[2,3,4],[5],[6],[7]], deltaE=0.2, sig=0.5)
test = POstratAC(env=env, learning_rates=0.02, discount_factors=0.9, choice_intensities=250)
L = [] 
for k in range(100):
    L.append(random_reward(env,test))
print(np.mean(L))
1.2735934
env = RenewableRessources(r=0.8, C=8, pR=0.1, obs=[[0,1,2,3,4],[5,6,7]], deltaE=0.2, sig=0.5)
test = POstratAC(env=env, learning_rates=0.02, discount_factors=0.9, choice_intensities=250)
L = [] 
for k in range(100):
    L.append(random_reward(env,test))
print(np.mean(L))
0.30748004