The 37 Implementation Details of Proximal Policy Optimization

Shengyi Huang, Rousslan Fernand Julien Dossa, Antonin Raffin, Anssi Kanervisto, Weixun Wang

April 2022

Abstract

Proximal policy optimization (PPO) has become one of the most popular deep reinforcement learning (DRL) algorithms. Yet reproducing PPO’s results has proven challenging for the community. While recent works have conducted ablation studies to provide insight into PPO’s implementation details, these works are not structured as tutorials and focus only on details concerning robotics tasks. As a result, reproducing PPO from scratch can be a daunting experience. Instead of introducing additional improvements or conducting further ablation studies, this blog post takes a step back and focuses on delivering a thorough reproduction of PPO in all respects, as well as aggregating, documenting, and cataloging its most salient implementation details. This blog post also points out software engineering challenges in PPO and further efficiency improvements via accelerated vectorized environments. With these, we believe this blog post will help people understand PPO faster and better, facilitating customization of and research on this versatile RL algorithm.

Type

Conference paper

Publication

10th International Conference on Learning Representations

Reinforcement Learning, Robotics

Antonin Raffin

Research Engineer in Robotics and Machine Learning

Robots. Machine Learning. Blues Dance.
