$a_t = \mu(s_t; \theta)$, $\quad$ deterministic controller
$a_t \sim \pi_\theta(a_t | s_t) = f(\mu, \textcolor{#6741d9}{\epsilon}, \ldots)$, stochastic controller
$a_t = \textcolor{#1864ab}{K_p} \textcolor{#a61e4d}{e_t} + \textcolor{#1864ab}{K_d} \textcolor{#a61e4d}{\frac{e_t - e_{t -1}}{\Delta t}}$
$a_t = \begin{bmatrix} \textcolor{#1864ab}{K_p} & \textcolor{#1864ab}{K_d} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{#a61e4d}{e_t} \\ \textcolor{#a61e4d}{\frac{e_t - e_{t -1}}{\Delta t}} \end{bmatrix}$
$a_t = \textcolor{#1864ab}{\theta}^\top \textcolor{#a61e4d}{s_t}$
$\mu_{\textcolor{#1864ab}{\theta}}(\textcolor{#a61e4d}{s_t}) = \textcolor{#1864ab}{\theta}^\top \textcolor{#a61e4d}{s_t}$ a linear policy!
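The PD law and the linear policy compute the same action once the state is defined as the error and its finite-difference derivative. Below is a minimal Python sketch of that equivalence. The gains `K_P`, `K_D`, the control period `DT`, and the helper names are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical gains and control period, for illustration only
K_P, K_D = 2.0, 0.5
DT = 0.01  # seconds

def pd_action(e_t: float, e_prev: float) -> float:
    """PD law: a_t = K_p * e_t + K_d * (e_t - e_{t-1}) / dt."""
    return K_P * e_t + K_D * (e_t - e_prev) / DT

def pd_state(e_t: float, e_prev: float) -> np.ndarray:
    """State s_t = [e_t, (e_t - e_{t-1}) / dt] fed to the linear policy."""
    return np.array([e_t, (e_t - e_prev) / DT])

def linear_policy(theta: np.ndarray, s_t: np.ndarray) -> float:
    """Linear policy mu_theta(s_t) = theta^T s_t."""
    return float(theta @ s_t)

theta = np.array([K_P, K_D])  # theta = [K_p, K_d]
e_t, e_prev = 0.3, 0.25
print(pd_action(e_t, e_prev), linear_policy(theta, pd_state(e_t, e_prev)))
```

Both calls return the same value, up to floating-point rounding. Tuning the PD gains is therefore the same as choosing the two entries of θ.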
$a_t = \mu(s_t; \theta_{\mu} + \epsilon)$, $\quad \epsilon \sim \mathcal{N}(0, \sigma)$
$\epsilon$ is sampled once per episode
Example:
$\tilde{\theta} = \begin{bmatrix} K_p \\ K_d \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix}$
\[ a_t = (\theta_{\mu} + \epsilon)^{\top} s_t = \tilde{\theta}^{\top} s_t \]
Raffin, Antonin. "Enabling Reinforcement Learning on Real Robots." Diss. TUM, 2024.
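Parameter-space noise can be sketched as a short rollout loop. The following pieces are assumptions, not from the slides: a toy 1-D integrator plant `x_{t+1} = x_t + a_t * DT` with target 0 (so `e_t = -x_t`), the PD state `[e_t, (e_t - e_{t-1}) / DT]`, and σ read as a standard deviation. The helper name is hypothetical. The essential detail is that `eps` is drawn once, before the loop, so the same perturbed gains θ̃ act for the whole episode.

```python
import numpy as np

DT = 0.01  # control period (assumed)

def run_episode_param_noise(theta_mu, sigma, n_steps, rng, x0=1.0):
    """Roll out a linear PD policy with parameter-space noise.

    eps ~ N(0, sigma^2 I) is drawn ONCE, before the episode, and the same
    perturbed gains theta_mu + eps are used at every step.
    Toy plant (assumption): x_{t+1} = x_t + a_t * DT, target 0, e_t = -x_t.
    """
    eps = rng.normal(0.0, sigma, size=theta_mu.shape)  # sampled once per episode
    theta_tilde = theta_mu + eps
    x, e_prev = x0, -x0
    actions = []
    for _ in range(n_steps):
        e_t = -x
        s_t = np.array([e_t, (e_t - e_prev) / DT])
        a_t = float(theta_tilde @ s_t)  # a_t = (theta_mu + eps)^T s_t
        actions.append(a_t)
        x += a_t * DT  # integrate the toy plant
        e_prev = e_t
    return np.array(actions), theta_tilde

rng = np.random.default_rng(0)
actions, theta_tilde = run_episode_param_noise(np.array([2.0, 0.5]), 0.1, 200, rng)
```

Because θ̃ is fixed within an episode, the perturbed controller is itself an ordinary (smooth) PD controller. Exploration happens across episodes rather than from one step to the next.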
$ a_t = \mu(s_t; \theta_{\mu}) + \epsilon_t$, $\quad \epsilon_t \sim \mathcal{N}(0, \sigma)$
$\epsilon_t$ is sampled at every step
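For contrast, here is a sketch of per-step action noise on the same hypothetical toy integrator. The plant, gains, helper name, and σ-as-standard-deviation are assumptions. The nominal gains θ_μ stay fixed, and a fresh ε_t is added to every action.

```python
import numpy as np

DT = 0.01  # control period (assumed)

def run_episode_step_noise(theta_mu, sigma, n_steps, rng, x0=1.0):
    """Roll out a linear PD policy with action-space noise.

    A new eps_t ~ N(0, sigma^2) is drawn at EVERY step and added to the
    deterministic action mu(s_t; theta_mu) = theta_mu^T s_t.
    Toy plant (assumption): x_{t+1} = x_t + a_t * DT, target 0, e_t = -x_t.
    """
    x, e_prev = x0, -x0
    actions, noises = [], []
    for _ in range(n_steps):
        e_t = -x
        s_t = np.array([e_t, (e_t - e_prev) / DT])
        eps_t = rng.normal(0.0, sigma)       # sampled at every step
        a_t = float(theta_mu @ s_t) + eps_t  # a_t = mu(s_t; theta_mu) + eps_t
        actions.append(a_t)
        noises.append(eps_t)
        x += a_t * DT  # integrate the toy plant
        e_prev = e_t
    return np.array(actions), np.array(noises)

rng = np.random.default_rng(0)
actions, noises = run_episode_step_noise(np.array([2.0, 0.5]), 0.5, 200, rng)
```

Consecutive ε_t are independent, so the actions jitter from step to step. This is white noise (β = 0 in the spectrum below), the kind of jerky command signal that smoother exploration schemes aim to avoid on real hardware.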
Raffin, Antonin, Jens Kober, and Freek Stulp. "Smooth exploration for robotic reinforcement learning." CoRL, 2022.
Rückstieß, Thomas, et al. "State-dependent exploration for policy gradient methods." ECML, 2008.
$a_t = \mu(s_t) + \sigma(s_t)\, \varepsilon_t$, $\quad \dot{\varepsilon}_t = \ldots$
Example:
$|\hat{\varepsilon}(f)|^{2} \propto f^{-\beta}, \quad \text{where} \quad \hat{\varepsilon}(f) = \mathcal{F}[\varepsilon(t)](f)$
White Noise ($\beta = 0$), Pink Noise ($\beta = 1$), Red/Brownian Noise ($\beta = 2$)
Eberhard, Onno, et al. "Pink noise is all you need: Colored noise exploration in deep reinforcement learning." ICLR, 2023.
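One common way to sample noise with a prescribed power law is spectral shaping. You draw white Gaussian noise in the frequency domain, scale each component by $f^{-\beta/2}$ so that the power scales as $f^{-\beta}$, and transform back with an inverse FFT. Below is a minimal NumPy sketch. The function name `colored_noise`, dropping the DC term, and the unit-variance normalization are illustrative choices, not necessarily how Eberhard et al. implement it.

```python
import numpy as np

def colored_noise(beta: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample a length-n sequence with power spectrum |eps_hat(f)|^2 ∝ f^(-beta).

    White Gaussian noise is drawn directly in the frequency domain, each
    frequency component is scaled by f^(-beta/2), and the result is mapped
    back to the time domain with an inverse real FFT.
    """
    freqs = np.fft.rfftfreq(n)                  # 0, 1/n, ..., up to 0.5
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-beta / 2.0)      # DC term (f = 0) set to 0 -> zero mean
    white = rng.normal(size=freqs.shape) + 1j * rng.normal(size=freqs.shape)
    eps = np.fft.irfft(scale * white, n=n)      # back to the time domain
    return eps / eps.std()                      # unit variance, like the white baseline

rng = np.random.default_rng(0)
# Assumed usage: one sequence per action dimension, one entry per time step
eps_pink = colored_noise(beta=1.0, n=1000, rng=rng)
```

For exploration, one such sequence (per action dimension) would be drawn at the start of an episode, with ε_t read off at step t. Setting β = 0 recovers per-step white noise. Larger β gives slower, more temporally correlated perturbations.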

Padalkar, Abhishek, et al. "Towards safe and efficient learning in the wild: Guiding RL with constrained uncertainty-aware movement primitives." RA-L, 2025.