Github repo: https://github.com/araffin/rl-tutorial-jnrr19
Source: Deep Mimic (Jason Peng)
Source: Outsider Tour RL by Ben Recht
Source: David Silver Course
Source: Lilian Weng blog
Credit: L.M Tenkes
| | Reinforcement Learning | Classical Control |
|---|---|---|
| State | $s_t$ | $x_t$ |
| Action | $a_t$ | $u_t$ |
| Reward | $r_t$ | $-c_t$ |
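The sign flip on the reward makes the two formulations equivalent; spelling out what follows directly from $r_t = -c_t$ (this identity is not in the slides, but is a standard consequence): maximizing the discounted return in RL is the same as minimizing the cumulative cost in classical control:

$$\max_{\pi} \; \mathbb{E}\Big[\sum_{t} \gamma^t \, r_t\Big] \;=\; -\,\min_{\pi} \; \mathbb{E}\Big[\sum_{t} \gamma^t \, c_t\Big] \qquad \text{since } r_t = -c_t$$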
An RL algorithm may include one or more of these components: a policy (the agent's behavior function), a value function (an estimate of how good each state or action is), and/or a model (the agent's representation of the environment).
Source: BAIR blog
Exploration: try a new beer
Exploitation: drink your favorite beer
Exploration: gather more information about the environment
Exploitation: use the best known strategy to maximize reward
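A classic way to trade these off is $\epsilon$-greedy action selection. Below is a minimal illustrative sketch (not from the tutorial; `q_values` is a hypothetical array of action-value estimates):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1, rng=None):
    """With probability epsilon, explore (random action);
    otherwise exploit (action with the highest value estimate)."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniformly random action
    return int(np.argmax(q_values))              # exploit: best known action
```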
Markov property: the next state depends only on the current state and action, not on the complete history
Fully observable: the agent directly observes the environment state ($o_t = s_t$). Ex: Chess (fully observable) vs. Poker (partially observable)
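Formally, the Markov property states that the next state is conditionally independent of the past given the present state and action:

$$P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \ldots, s_t, a_t)$$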
```yaml
HalfCheetahBulletEnv-v0:
  env_wrapper: utils.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 2e6
  policy: 'MlpPolicy'
  gamma: 0.99
  buffer_size: 1000000
  noise_type: 'normal'
  noise_std: 0.1
  learning_starts: 10000
  batch_size: 100
  learning_rate: !!float 1e-3
  train_freq: 1000
  gradient_steps: 1000
  policy_kwargs: 'dict(layers=[400, 300])'
```
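For intuition, here is roughly what this configuration maps to when calling Stable Baselines directly. This is a sketch, assuming Stable Baselines v2 and `pybullet_envs`; the zoo's `utils.wrappers.TimeFeatureWrapper` and other training-loop details are omitted:

```python
import gym
import numpy as np
import pybullet_envs  # noqa: F401 -- registers HalfCheetahBulletEnv-v0

from stable_baselines import TD3
from stable_baselines.common.noise import NormalActionNoise

env = gym.make('HalfCheetahBulletEnv-v0')

# noise_type: 'normal', noise_std: 0.1
n_actions = env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.1 * np.ones(n_actions))

model = TD3(
    'MlpPolicy', env,
    gamma=0.99,
    buffer_size=1000000,
    learning_starts=10000,
    batch_size=100,
    learning_rate=1e-3,
    train_freq=1000,
    gradient_steps=1000,
    action_noise=action_noise,
    policy_kwargs=dict(layers=[400, 300]),
    verbose=1,
)
model.learn(total_timesteps=int(2e6))  # n_timesteps: 2e6
```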
```bash
# Train TD3 on HalfCheetahBulletEnv-v0 (uses the hyperparameters above)
python train.py --algo td3 --env HalfCheetahBulletEnv-v0
# Visualize the trained agent
python enjoy.py --algo td3 --env HalfCheetahBulletEnv-v0
# Record a video of the trained agent (1000 steps)
python -m utils.record_video --algo td3 --env HalfCheetahBulletEnv-v0 -n 1000
```
```bash
# Optimize PPO2 hyperparameters on MountainCar-v0:
# TPE sampler, median pruner, 1000 trials across 2 parallel jobs
python train.py --algo ppo2 --env MountainCar-v0 \
    --optimize --n-trials 1000 --n-jobs 2 \
    --sampler tpe --pruner median
```
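Under the hood this relies on Optuna. A minimal sketch of the same sampler/pruner combination (illustrative only, with a dummy objective standing in for the zoo's actual training-and-evaluation code):

```python
import optuna

def objective(trial):
    # In the zoo, this would sample hyperparameters (e.g. the learning rate),
    # train the agent, and return its mean evaluation reward.
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True)
    return -learning_rate  # dummy score standing in for the evaluation reward

study = optuna.create_study(
    sampler=optuna.samplers.TPESampler(),  # --sampler tpe
    pruner=optuna.pruners.MedianPruner(),  # --pruner median
    direction='maximize',
)
study.optimize(objective, n_trials=1000, n_jobs=2)  # --n-trials 1000 --n-jobs 2
```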