A complex system that works is invariably found to have evolved from a simple system that worked.
Learning directly on real robots
In the papers...
...in reality.
Credit: ESA/NASA
After, with the 1kg arm
After, new arm position + magnet
Truncation vs termination
Timeout: max_episode_steps=4
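Why the distinction matters, as a minimal sketch: when an episode is cut by a timeout (truncation), the task itself has not ended, so a one-step TD target should arguably still bootstrap from the next state's value. The function names, the `gamma` value and the numbers below are illustrative assumptions, not taken from the slides or from any library.

```python
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """One-step TD target that stops bootstrapping only on true termination.

    `truncated` is accepted to mirror the Gymnasium step signature, but it is
    deliberately ignored here: a timeout does not make the future value zero.
    """
    if terminated:
        return reward
    return reward + gamma * next_value


def naive_td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """Common mistake: merging both flags into a single `done` flag."""
    done = terminated or truncated
    if done:
        return reward
    return reward + gamma * next_value


# Last step of an episode cut by the time limit (hypothetical numbers)
correct = td_target(1.0, 10.0, terminated=False, truncated=True, gamma=0.9)
naive = naive_td_target(1.0, 10.0, terminated=False, truncated=True, gamma=0.9)
print(correct, naive)
```

With a very short time limit such as the one above, most transitions stored in the replay buffer end in a timeout, so the naive target would underestimate the value of almost every state.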
An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks
Up to 20x faster!
Stable-Baselines3 (PyTorch) vs SBX (JAX)
More gradient steps: 4x more sample efficient!
Also have a look at TQC, TD7 and CrossQ.
Using SB3 + JAX = SBX: https://github.com/araffin/sbx
Ex: Controlling tendon forces instead of motor positions
David (aka HASy)
German Aerospace Center (DLR)
python -m rl_zoo3.cli all_plots -a sac -e HalfCheetah Ant -f logs/ -o sac_results
python -m rl_zoo3.cli plot_from_file -i sac_results.pkl -latex -l SAC --rliable
# Train an SAC agent on Pendulum using tuned hyperparameters,
# evaluate the agent every 1k steps and save a checkpoint every 10k steps
# Pass custom hyperparams to the algo/env
python -m rl_zoo3.train --algo sac --env Pendulum-v1 --eval-freq 1000 \
--save-freq 10000 -params train_freq:2 --env-kwargs g:9.8
└── Pendulum-v1_1 # One folder per experiment
    ├── 0.monitor.csv # episodic return
    ├── best_model.zip # best model according to evaluation
    ├── evaluations.npz # evaluation results
    ├── Pendulum-v1
    │   ├── args.yml # custom cli arguments
    │   ├── config.yml # hyperparameters
    │   └── vecnormalize.pkl # normalization
    ├── Pendulum-v1.zip # final model
    └── rl_model_10000_steps.zip # checkpoint
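To inspect the episodic returns stored in 0.monitor.csv without extra dependencies, a small parser along these lines may be enough. It assumes the usual layout written by the SB3 Monitor wrapper: a first line starting with `#` followed by JSON metadata, then a CSV table with columns `r` (return), `l` (episode length) and `t` (elapsed time). The helper name `parse_monitor` and the sample values are made up for illustration.

```python
import csv
import json


def parse_monitor(text):
    """Parse the content of a monitor file into (metadata, episodes)."""
    lines = text.splitlines()
    # First line: '#' followed by a JSON dict (assumed layout, see above)
    metadata = json.loads(lines[0][1:])
    # Remaining lines: CSV header 'r,l,t' then one row per episode
    reader = csv.DictReader(lines[1:])
    episodes = [
        {"r": float(row["r"]), "l": int(row["l"]), "t": float(row["t"])}
        for row in reader
    ]
    return metadata, episodes


# Illustrative sample, not real training data
SAMPLE = (
    '#{"t_start": 0.0, "env_id": "Pendulum-v1"}\n'
    "r,l,t\n"
    "-1200.5,200,1.5\n"
    "-900.25,200,3.0\n"
)

metadata, episodes = parse_monitor(SAMPLE)
mean_return = sum(ep["r"] for ep in episodes) / len(episodes)
print(metadata["env_id"], len(episodes), mean_return)
```

For a real run, the text would come from reading the file inside the experiment folder, for instance with `open(path).read()`.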