The only difference: the epsilon value used to avoid division by zero in the optimizer (one run uses `eps=1e-7`, the other `eps=1e-5`).
Credits: Rishabh Mehrotra (@erishabh)
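In Stable-Baselines3, this value can be controlled through `policy_kwargs`; below is a minimal sketch of two otherwise identical runs. PPO and the `Pendulum-v1` environment are placeholders, not the setup from the original anecdote.

```python
from stable_baselines3 import PPO

# Two models that differ only in the Adam epsilon (values from the text above).
model_small_eps = PPO(
    "MlpPolicy",
    "Pendulum-v1",  # placeholder env id
    policy_kwargs=dict(optimizer_kwargs=dict(eps=1e-7)),
)
model_large_eps = PPO(
    "MlpPolicy",
    "Pendulum-v1",
    policy_kwargs=dict(optimizer_kwargs=dict(eps=1e-5)),
)
```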
from gym import spaces

n_actions = 4  # number of action dimensions (example value)
# Unnormalized action spaces only work with algorithms
# that don't directly rely on a Gaussian distribution to define the policy
# (e.g. DDPG or SAC, where their output is rescaled to fit the action space limits)
# LIMITS TOO BIG: in that case, the sampled actions will only have values
# around zero, far away from the limits of the space
action_space = spaces.Box(low=-1000, high=1000, shape=(n_actions,), dtype="float32")
# LIMITS TOO SMALL: in that case, the sampled actions will almost
# always saturate (be greater than the limits)
action_space = spaces.Box(low=-0.02, high=0.02, shape=(n_actions,), dtype="float32")
# BEST PRACTICE: action space is normalized, symmetric
# and has an interval range of two,
# which is usually the same magnitude as the initial standard deviation
# of the Gaussian used to sample actions (unit initial std in SB3)
action_space = spaces.Box(low=-1, high=1, shape=(n_actions,), dtype="float32")
Credits: Nathan Lambert (@natolambert)
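When an existing environment does not follow this best practice, the actions can be rescaled with a wrapper instead of editing the environment itself; a small sketch using gym's `RescaleAction` wrapper (the environment id is just a placeholder):

```python
import gym
from gym.wrappers import RescaleAction

env = gym.make("Pendulum-v1")  # placeholder env, original action space is Box(-2, 2, (1,))
# Expose a normalized, symmetric [-1, 1] action space to the agent;
# the wrapper maps actions back to the original limits before stepping the env.
env = RescaleAction(env, -1.0, 1.0)
print(env.action_space)  # Box(-1.0, 1.0, (1,), float32)
```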
| Observation Space | tendon forces, desired pose, current pose |
|---|---|
| Action Space | desired forces (4D) |
| Reward Function | distance to target / continuity |
| Terminations | success / timeout |
| Algorithm | SAC + gSDE |
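A rough sketch of the last row: in Stable-Baselines3, SAC with generalized State-Dependent Exploration (gSDE) is enabled via the `use_sde` flag. The environment below is only a placeholder, not the tendon-driven setup from the table.

```python
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    "Pendulum-v1",       # placeholder env id
    use_sde=True,        # generalized State-Dependent Exploration (gSDE)
    sde_sample_freq=-1,  # resample the exploration noise only at the start of each rollout
    verbose=1,
)
model.learn(total_timesteps=10_000)
```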
| Observation Space | latent vector / current speed + history |
|---|---|
| Action Space | steering angle / throttle |
| Reward Function | speed + smoothness |
| Terminations | crash / timeout |
| Algorithm | SAC / TQC + gSDE |
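The "+ history" part of the observation is commonly implemented by stacking the last few observations; a hedged sketch with SB3's `VecFrameStack` wrapper (placeholder environment again):

```python
import gym
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

# Placeholder env; the actual car environment from the table is not included here.
env = DummyVecEnv([lambda: gym.make("Pendulum-v1")])
# Stack the last 4 observations so the policy sees a short history.
env = VecFrameStack(env, n_stack=4)
```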
| Observation Space | joint positions / torques / IMU / gyro + history |
|---|---|
| Action Space | motor positions (6D) |
| Reward Function | forward distance / walk straight / continuity |
| Terminations | fall / timeout |
| Algorithm | TQC + gSDE |
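All three tables follow the same recipe: define the observation space, the action space, the reward terms, and the termination conditions, then pick the algorithm. The skeleton below is purely illustrative (spaces, reward terms, and dimensions are made up for the sketch) and uses the pre-gymnasium gym API from the code above; note that TQC lives in the `sb3_contrib` package, not in core Stable-Baselines3.

```python
import numpy as np
import gym
from gym import spaces
from sb3_contrib import TQC


class WalkerEnvSketch(gym.Env):
    """Illustrative skeleton following the table structure (not the real robot env)."""

    def __init__(self):
        super().__init__()
        # Observation: joint positions / torques / IMU / gyro (history would be stacked on top)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(24,), dtype="float32")
        # Action: desired motor positions (6D), normalized as recommended above
        self.action_space = spaces.Box(low=-1, high=1, shape=(6,), dtype="float32")

    def reset(self):
        return np.zeros(24, dtype="float32")

    def step(self, action):
        obs = np.zeros(24, dtype="float32")
        # Reward: forward distance + walk straight + continuity (placeholder value)
        reward = 0.0
        # Termination: fall or timeout (placeholder condition)
        done = False
        return obs, reward, done, {}


model = TQC("MlpPolicy", WalkerEnvSketch(), use_sde=True, verbose=1)
model.learn(total_timesteps=10_000)
```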
Notebook repo: https://github.com/araffin/rl-handson-rlvs21