
Antonin Raffin

Research Engineer in Robotics and Machine Learning

German Aerospace Center (DLR)

Bio

Antonin Raffin is a research engineer at the German Aerospace Center (DLR) who specializes in reinforcement learning (RL). He is the lead developer of Stable-Baselines3 (SB3), an open-source library that implements Deep RL algorithms. His main focus is on learning controllers directly on real robots and improving the reproducibility of RL.

Interests

  • Robotics
  • Reinforcement Learning
  • State Representation Learning
  • Machine Learning

Projects

SBX: Stable Baselines Jax

A proof-of-concept version of Stable-Baselines3 in JAX.

Datasaurust

A blazingly fast Rust implementation of the Datasaurus paper, "Same Stats, Different Graphs".

Stable Baselines3

A set of improved implementations of reinforcement learning algorithms in PyTorch.

Learning to Drive Smoothly in Minutes

Learning to drive smoothly in minutes using reinforcement learning on a Donkey Car.

RL Baselines Zoo

A collection of 70+ pre-trained RL agents using Stable Baselines.

S-RL Toolbox

A toolbox for Reinforcement Learning (RL) and State Representation Learning (SRL) for robotics.

Stable Baselines

A fork of OpenAI Baselines with improved implementations of reinforcement learning algorithms.

Racing Robot

An autonomous racing robot built with an Arduino, a Raspberry Pi, and a Pi Camera.

Arduino Robust Serial

A simple and robust serial communication protocol, with implementations in C (Arduino), C++, Python, and Rust.

Recent & Upcoming Talks

Recent Advances in RL for Continuous Control

A presentation on recent advances in RL, in terms of algorithms, software, and simulators.

Enabling Reinforcement Learning on Real Robots

Invited talk while visiting the INRIA Willow team in Paris.

Ingredients for Learning Locomotion Directly on Real Hardware

Invited talk for the Soccer Robots workshop at the Humanoids 2024 conference.

Designing and Running Real-World RL Experiments

Talk at the Reinforcement Learning for Autonomous Accelerators (RL4AA) workshop. The talk walks through the different steps of RL experimentation (task design, choosing the right algorithm, implementing safety layers) and provides practical advice on how to run experiments and troubleshoot common problems.

Experience

Researcher

German Aerospace Center (DLR)

October 2018 – Present · Munich
Machine Learning for Robots.

Research Engineer

ENSTA ParisTech - U2IS robotics lab

October 2017 – October 2018 · Palaiseau
Working on Reinforcement Learning and State Representation Learning for the DREAM project.

Research Intern

Riminder

April 2017 – September 2017 · Paris
Deep Learning for Human Resources.

Research Intern

TU Berlin - RBO lab

May 2016 – August 2016 · Berlin
Research internship in representation and reinforcement learning.

Recent Posts

Automatic Hyperparameter Tuning - In Practice (Part 2)

This is the second (and last) post on automatic hyperparameter optimization. In the first part, I introduced the challenges and main components of hyperparameter tuning (samplers, pruners, objective function, …). This second part is about the practical application of this technique with the Optuna library, in a reinforcement learning setting (using the Stable-Baselines3 (SB3) library).
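
As a quick taste of that workflow, here is a minimal sketch of tuning a single Stable-Baselines3 hyperparameter with Optuna. It is not taken from the post; the environment, search space, and training budget are illustrative assumptions.

```python
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    # Sample a candidate learning rate (log-uniform search space, assumed for illustration)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)

    # Train a small PPO agent with the sampled hyperparameter
    model = PPO("MlpPolicy", "CartPole-v1", learning_rate=learning_rate, verbose=0)
    model.learn(total_timesteps=20_000)

    # Report the mean episodic return so Optuna can compare trials
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best hyperparameters:", study.best_params)
```

In practice, a pruner can additionally stop unpromising trials early; samplers and pruners are covered in the first part of the series.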

Getting SAC to Work on a Massive Parallel Simulator: An RL Journey With Off-Policy Algorithms (Part I)

This post details how I managed to get Soft Actor-Critic (SAC) and other off-policy reinforcement learning algorithms to work on massively parallel simulators (think Isaac Sim with thousands of robots simulated in parallel).

Automatic Hyperparameter Tuning - A Visual Guide (Part 1)

When you’re building a machine learning model, you want to find the best hyperparameters to make it shine. But who has the luxury of trying out every possible combination? The good news is that automatic hyperparameter tuning can help you.

Rliable: Better Evaluation for Reinforcement Learning - A Visual Explanation

It is critical for Reinforcement Learning (RL) practitioners to properly evaluate and compare results. Reporting results based on poor comparisons gives an illusion of progress and may underestimate the stochasticity of the results.
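
To illustrate the kind of evaluation the post advocates, here is a minimal sketch using the rliable library to compute the interquartile mean (IQM) with bootstrap confidence intervals; the algorithm names and score matrices are made up for the example.

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

# Hypothetical normalized scores, one matrix of shape (n_runs, n_tasks) per algorithm
scores = {
    "AlgoA": np.random.uniform(0.0, 1.0, size=(10, 5)),
    "AlgoB": np.random.uniform(0.0, 1.0, size=(10, 5)),
}

def aggregate(score_matrix):
    # Interquartile mean: more robust than the mean, less pessimistic than the median
    return np.array([metrics.aggregate_iqm(score_matrix)])

# Point estimates and bootstrap confidence intervals for each algorithm
point_estimates, interval_estimates = rly.get_interval_estimates(scores, aggregate, reps=2000)
print(point_estimates)
print(interval_estimates)
```

Reporting such interval estimates over multiple runs, rather than a single best seed, is the kind of comparison the post argues for.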