obs_high = np.array(
    [
        self.off_track_threshold,  # lateral error
        np.inf,  # lateral error derivative
    ],
    dtype=np.float32,
)
self.observation_space = spaces.Box(low=-obs_high, high=obs_high)
# Later: [lateral_error, heading_error, forward_velocity,
# angular_velocity, left_wheel_speed, right_wheel_speed,
# curvature, lookahead_lat_2, lookahead_lat_4, lookahead_lat_6]
# Full action: [left_wheel_speed, right_wheel_speed]
# Simplification: derive both from a base speed and one steering command
left_wheel_speed = base_speed + steering
right_wheel_speed = base_speed - steering
# 1-D action → steering ∈ [-1, 1]
self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,))
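The mapping from the single steering command to the two wheel speeds can be sketched as a small helper. This is a minimal sketch, not the environment's actual code: the function name `steering_to_wheel_speeds`, the default `base_speed=0.5`, and the clipping of the wheel speeds to `[-max_speed, max_speed]` are assumptions made for illustration.

```python
import numpy as np


def steering_to_wheel_speeds(steering, base_speed=0.5, max_speed=1.0):
    """Map a 1-D steering command in [-1, 1] to (left, right) wheel speeds."""
    # Clip the raw action first: a policy may output values outside the Box.
    steering = float(np.clip(steering, -1.0, 1.0))
    left_wheel_speed = base_speed + steering
    right_wheel_speed = base_speed - steering
    # Assumed actuator limit: keep both wheel speeds within [-max_speed, max_speed].
    left, right = np.clip(
        [left_wheel_speed, right_wheel_speed], -max_speed, max_speed
    )
    return float(left), float(right)
```

With zero steering both wheels run at the base speed; any nonzero steering speeds up one wheel and slows down the other by the same amount, until the assumed limit is reached.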
Stay close to the line while moving forward
# Note: always normalize! Dividing by the threshold keeps the penalty in [-1, 0] while on track
lateral_penalty = -((lateral_error / self.off_track_threshold) ** 2)
alive_bonus = 1.0  # otherwise the agent might learn to terminate early
reward = alive_bonus + lateral_penalty + forward_velocity
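To see how the three terms trade off, the reward can be wrapped in a standalone function and evaluated at a few points. This is only a sketch: the name `compute_reward` and the threshold value of 0.2 used in the examples are assumptions, not values from the environment.

```python
def compute_reward(lateral_error, forward_velocity, off_track_threshold, alive_bonus=1.0):
    """Reward = alive bonus + normalized lateral penalty + forward velocity."""
    # Normalized squared lateral error: in [-1, 0] while the robot is on track.
    lateral_penalty = -((lateral_error / off_track_threshold) ** 2)
    return alive_bonus + lateral_penalty + forward_velocity


# Hypothetical threshold of 0.2, robot standing still:
on_line = compute_reward(0.0, 0.0, off_track_threshold=0.2)  # centered on the line
at_edge = compute_reward(0.2, 0.0, off_track_threshold=0.2)  # right at the threshold
```

Centered on the line the robot collects the full alive bonus, while at the threshold the penalty cancels it completely, so a stationary robot at the edge of the track earns nothing.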
What can be changed for racing?
Move forward and stay on the track
off_track = abs(lateral_error) > self.off_track_threshold
terminated = off_track or going_reverse
truncated = self.step_count >= self.max_episode_steps # timeout
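Note: timeout/truncation needs special handling in the algorithm: a truncated episode has not reached a true terminal state, so value-based updates should still bootstrap from the next state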
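One common way to handle the difference in a value-based algorithm is to zero the bootstrapped value only on true termination. The sketch below assumes a one-step TD target; the function name `td_target` and the default discount `gamma=0.99` are illustrative choices, not part of the environment.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step bootstrapped target: r + gamma * V(s')."""
    # Only a true termination (e.g. going off track) zeroes the future value.
    # A truncated (timed-out) transition still bootstraps from V(s'),
    # so `truncated` is deliberately not an argument here.
    return reward + gamma * (1.0 - float(terminated)) * next_value
```

Treating a timeout like a termination would tell the agent that the future is worth nothing after `max_episode_steps`, even though the robot was still driving fine.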
if lateral_error > 0:
    action = STEER_LEFT
else:
    action = STEER_RIGHT
$a_t = \textcolor{#1864ab}{K_p} \textcolor{#a61e4d}{e_t} + \textcolor{#1864ab}{K_d} \textcolor{#a61e4d}{\frac{e_t - e_{t -1}}{\Delta t}}$
$a_t = \begin{bmatrix} \textcolor{#1864ab}{K_p} & \textcolor{#1864ab}{K_d} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{#a61e4d}{e_t} \\ \textcolor{#a61e4d}{\frac{e_t - e_{t -1}}{\Delta t}} \end{bmatrix}$
$a_t = \textcolor{#1864ab}{\theta}^\top \textcolor{#a61e4d}{s_t}$
$\pi_{\textcolor{#1864ab}{\theta}}(\textcolor{#a61e4d}{s_t}) = \textcolor{#1864ab}{\theta}^\top \textcolor{#a61e4d}{s_t}$ a linear policy!
$\pi_{\textcolor{#1864ab}{\theta}}(\textcolor{#a61e4d}{s_t}) = \begin{bmatrix} \textcolor{#1864ab}{K_p} \\ \textcolor{#1864ab}{K_d} \end{bmatrix}^\top \textcolor{#a61e4d}{s_t} = \textcolor{#1864ab}{\theta}^\top \textcolor{#a61e4d}{s_t}$
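The equivalence between the PD controller and a linear policy can be checked numerically. In this sketch the gains, the time step, and the two error samples are made-up values, and `linear_policy` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical gains and time step, chosen only for illustration.
K_p, K_d = 2.0, 0.5
dt = 0.1

theta = np.array([K_p, K_d])  # policy parameters


def linear_policy(theta, state):
    """pi_theta(s) = theta^T s."""
    return float(theta @ state)


e_prev, e_t = 0.3, 0.2  # two consecutive (made-up) lateral errors
state = np.array([e_t, (e_t - e_prev) / dt])  # s_t = [error, error derivative]

action = linear_policy(theta, state)
pd_action = K_p * e_t + K_d * (e_t - e_prev) / dt  # classic PD formula
```

Both expressions give the same action, so tuning the PD gains is the same problem as choosing the parameters of a two-dimensional linear policy.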
$\theta^* = \text{argmin}_{\theta}{J(\theta)}$
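As a rough illustration of searching for the parameters directly, the sketch below evaluates a whole episode per candidate and keeps a candidate only if its cost is lower. Everything here is an assumption made for the example: the toy double-integrator dynamics, the squared-error cost, the function names, and the use of plain random search as a stand-in for whatever optimizer is actually used.

```python
import numpy as np


def episode_cost(theta, dt=0.05, n_steps=200):
    """J(theta): accumulated squared error on a toy double integrator (assumed dynamics)."""
    e, e_prev = 1.0, 1.0  # start one unit away from the line, at rest
    velocity = 0.0
    cost = 0.0
    for _ in range(n_steps):
        state = np.array([e, (e - e_prev) / dt])
        action = float(np.clip(theta @ state, -1.0, 1.0))  # linear policy, clipped
        e_prev = e
        velocity -= action * dt  # assumed: a positive action pushes the error down
        e += velocity * dt
        cost += e**2 * dt
    return cost


def random_search(n_iters=200, step_size=0.5, seed=0):
    """Perturb theta at random and keep only the improvements."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    best_cost = episode_cost(theta)
    for _ in range(n_iters):
        candidate = theta + step_size * rng.standard_normal(2)
        cost = episode_cost(candidate)
        if cost < best_cost:
            theta, best_cost = candidate, cost
    return theta, best_cost


initial_cost = episode_cost(np.zeros(2))  # zero gains: the error never shrinks
theta_star, final_cost = random_search()
```

The search only ever sees the total cost of an episode, never individual transitions, and by construction the returned cost is never worse than the starting one.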
Episodic RL?