CROP-leaderboard

A standardized benchmark for certified robustness of RL algorithms

The goal of **CROP-leaderboard** is to systematically certify the robustness of different RL algorithms under certification criteria such as per-state action (certified radius) and lower bounds on cumulative reward. The related paper can be found here.


Leaderboard: CartPole-v0 (LoAct)

Robustness certification for per-state action in terms of certified radius *r* at all time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. In each figure, the x-axis is the time step *t* and the y-axis is the certified radius *r_t*. The shaded area represents the standard deviation.
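
A certified radius for per-state action can be sketched with Gaussian randomized smoothing over Q-values. The sketch below assumes the general pattern rather than reproducing the leaderboard's exact procedure. It clips Q-values to a known range [v_min, v_max] and estimates the smoothed Q-function by Monte Carlo with noise N(0, σ²I). A Hoeffding margin then turns those estimates into confidence bounds. Finally, it derives the radius from the fact that Φ⁻¹ of a smoothed [0, 1]-valued function is (1/σ)-Lipschitz (Salman et al., 2019). The names `q_fn`, `v_min`, `v_max`, `n` and `alpha` are illustrative and hypothetical, not the leaderboard's API.

```python
import math
import random
from statistics import NormalDist


def smoothed_q(q_fn, state, sigma, n, rng):
    """Monte Carlo estimate of E[Q(s + delta, a)] for every action, delta ~ N(0, sigma^2 I)."""
    totals = None
    for _ in range(n):
        noisy = [x + rng.gauss(0.0, sigma) for x in state]
        q = q_fn(noisy)
        totals = list(q) if totals is None else [t + v for t, v in zip(totals, q)]
    return [t / n for t in totals]


def certified_radius(q_fn, state, sigma, v_min, v_max, n=1000, alpha=0.05, seed=0):
    """Return (greedy action, certified l2 radius) of the smoothed Q-function at `state`.

    Assumes (hypothetically) that q_fn(state) returns Q-values clipped to [v_min, v_max].
    """
    rng = random.Random(seed)
    q_tilde = smoothed_q(q_fn, state, sigma, n, rng)
    # Hoeffding margin for n i.i.d. samples bounded in [v_min, v_max]
    margin = (v_max - v_min) * math.sqrt(math.log(1.0 / alpha) / (2 * n))
    ranked = sorted(range(len(q_tilde)), key=lambda a: q_tilde[a], reverse=True)
    top, runner_up = ranked[0], ranked[1]

    def to_unit(v):
        # normalize to [0, 1] and keep away from 0 and 1 so inv_cdf stays finite
        p = (v - v_min) / (v_max - v_min)
        return min(max(p, 1e-6), 1.0 - 1e-6)

    phi_inv = NormalDist().inv_cdf
    gap = phi_inv(to_unit(q_tilde[top] - margin)) - phi_inv(to_unit(q_tilde[runner_up] + margin))
    # Phi^{-1} of each normalized smoothed Q-value is (1/sigma)-Lipschitz, so the
    # greedy action cannot change for perturbations with l2 norm below sigma/2 * gap
    return top, max(sigma / 2.0 * gap, 0.0)
```

For example, with a toy two-action Q-function `lambda s: [max(-1.0, min(1.0, s[0])), max(-1.0, min(1.0, -s[0]))]` at state `[0.5]` and `sigma=0.1`, the sketch returns action 0 with a small positive radius. Note that the Hoeffding margin here is not corrected for comparing several actions at once.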


Leaderboard: CartPole-v0 (GRe-mean)

Robustness certification of cumulative reward in terms of the *expectation bound* *J_E*. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.
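
An expectation bound *J_E* can be sketched the same way by treating the whole episode return as the smoothed function. This is a sketch under stated assumptions, not the leaderboard's implementation:

- returns lie in a known range [j_min, j_max];
- each of the `horizon` observations is perturbed with ℓ2 norm at most `eps`, so the concatenated perturbation has norm at most eps·√horizon;
- `returns` holds episode returns sampled with Gaussian noise of standard deviation `sigma` added to every observation.

All argument names are hypothetical.

```python
import math
from statistics import NormalDist


def expectation_bound(returns, j_min, j_max, sigma, eps, horizon, alpha=0.05):
    """Lower bound on the smoothed expected return under a per-step l2 budget `eps`.

    Assumes every entry of `returns` lies in [j_min, j_max] and was obtained with
    N(0, sigma^2 I) noise added to each observation.
    """
    n = len(returns)
    mean = sum(returns) / n
    # Hoeffding lower confidence bound on the smoothed (noise-averaged) return
    mean_lower = mean - (j_max - j_min) * math.sqrt(math.log(1.0 / alpha) / (2 * n))
    p = (mean_lower - j_min) / (j_max - j_min)
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    # per-step l2 norm <= eps over `horizon` steps gives total l2 norm <= eps * sqrt(horizon)
    shift = eps * math.sqrt(horizon) / sigma
    nd = NormalDist()
    return j_min + (j_max - j_min) * nd.cdf(nd.inv_cdf(p) - shift)
```

With `eps=0` the bound reduces to the Hoeffding lower confidence bound on the mean return. It then decreases as the budget eps·√horizon grows relative to σ.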


Leaderboard: CartPole-v0 (GRe-median)

Robustness certification of cumulative reward in terms of the *percentile bound* *J_P* (*p* = 50%). Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.
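
A percentile bound *J_P* can be sketched with percentile (median) smoothing, under the same assumptions as an expectation-bound sketch: a per-step ℓ2 budget `eps` over `horizon` steps and Gaussian noise of standard deviation `sigma`. The idea is that the p-th percentile of the perturbed return distribution is at least the p'-th percentile of the clean one, where p' = Φ(Φ⁻¹(p) − eps·√horizon / σ). An order statistic chosen via a binomial tail then turns finite samples into a high-confidence lower bound. The function and argument names are illustrative, and this is not presented as the leaderboard's exact estimator.

```python
import math
from statistics import NormalDist


def percentile_bound(returns, p, sigma, eps, horizon, alpha=0.05):
    """High-confidence lower bound on the p-th percentile of the perturbed return.

    Returns None when too few samples are available to certify anything.
    """
    nd = NormalDist()
    # percentile level that the perturbation can push the p-th percentile down to
    p_shift = nd.cdf(nd.inv_cdf(p) - eps * math.sqrt(horizon) / sigma)
    if p_shift <= 0.0:
        return None
    xs = sorted(returns)
    n = len(xs)
    log_p, log_q = math.log(p_shift), math.log1p(-p_shift)

    def pmf(i):
        # Binomial(n, p_shift) probability mass at i, computed in log space
        return math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * log_p + (n - i) * log_q)

    # the k-th smallest sample lies below the p_shift-quantile with probability
    # P(Binomial(n, p_shift) >= k); keep the largest k with probability >= 1 - alpha
    tail = 1.0  # P(Binomial >= 0)
    best = None
    for k in range(1, n + 1):
        tail -= pmf(k - 1)  # now tail = P(Binomial >= k)
        if tail < 1.0 - alpha:
            break
        best = k
    return None if best is None else xs[best - 1]
```

For *J_P* with *p* = 50%, call `percentile_bound(returns, 0.5, sigma, eps, horizon)`. The result is one of the sampled returns, which keeps the bound on the original reward scale.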


Leaderboard: CartPole-v0 (LoRe)

Robustness certification of cumulative reward in terms of the *absolute lower bound* *J*. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.
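
Unlike the statistical bounds, an absolute lower bound *J* is a worst case over trajectories. The following is a toy illustration only. It assumes deterministic dynamics and a hypothetical `possible_actions(state)` that returns every action the smoothed policy might select under an admissible perturbation, for instance derived from per-state certification. Under those assumptions, the lower bound is the minimum return over all trajectories those action sets allow. This exhaustive search is exponential in the horizon and is not the leaderboard's search procedure.

```python
from typing import Callable, Hashable, Iterable, Tuple


def absolute_lower_bound(
    state: Hashable,
    horizon: int,
    possible_actions: Callable[[Hashable], Iterable[int]],
    step: Callable[[Hashable, int], Tuple[Hashable, float, bool]],
) -> float:
    """Minimum cumulative reward over every trajectory the perturbed policy could follow.

    Assumes deterministic `step(state, action) -> (next_state, reward, done)` and a
    non-empty `possible_actions(state)` for every reachable state.
    """
    if horizon == 0:
        return 0.0
    worst = float("inf")
    for action in possible_actions(state):
        next_state, reward, done = step(state, action)
        # a terminated episode contributes no further reward
        future = 0.0 if done else absolute_lower_bound(next_state, horizon - 1, possible_actions, step)
        worst = min(worst, reward + future)
    return worst
```

The more actions a state's set admits, the lower the bound can fall. A policy whose certified radius exceeds the attack budget at every step keeps singleton action sets, and the bound then equals its clean return.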


PongNoFrameskip-v4

Leaderboard: PongNoFrameskip-v4 (LoAct)

Robustness certification for per-state action in terms of certified radius *r* over the first 500 time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. In each figure, the x-axis is the time step *t* and the y-axis is the certified radius *r_t*. The shaded area represents the standard deviation.


Leaderboard: PongNoFrameskip-v4 (GRe-mean)

Robustness certification of cumulative reward in terms of the *expectation bound* *J_E* over a horizon of 500 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.
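
The dashed lines report returns under a PGD attack. The exact attack configuration is not stated here, so the following is only a sketch of an ℓ2-bounded PGD observation attack on a hypothetical linear Q-function. It repeatedly steps against the margin between the clean greedy action and its strongest rival, then projects the perturbation back onto the ℓ2 ball of radius `eps`. With a neural Q-network, the gradient would come from automatic differentiation instead of the closed form used here.

```python
import math


def q_values(weights, bias, state):
    """Q(s, a) = w_a . s + b_a for a hypothetical linear Q-function."""
    return [sum(w * x for w, x in zip(row, state)) + b for row, b in zip(weights, bias)]


def pgd_l2_attack(weights, bias, state, eps, step_size, steps):
    """Search for a perturbation with l2 norm <= eps that changes the greedy action."""
    q_clean = q_values(weights, bias, state)
    target = max(range(len(q_clean)), key=lambda a: q_clean[a])  # clean greedy action
    delta = [0.0] * len(state)
    for _ in range(steps):
        q = q_values(weights, bias, [x + d for x, d in zip(state, delta)])
        rival = max((a for a in range(len(q)) if a != target), key=lambda a: q[a])
        # gradient of the margin Q(x, target) - Q(x, rival) w.r.t. x (constant for linear Q)
        grad = [wt - wr for wt, wr in zip(weights[target], weights[rival])]
        g_norm = math.sqrt(sum(g * g for g in grad))
        if g_norm == 0.0:
            break
        # normalized gradient step that decreases the margin
        delta = [d - step_size * g / g_norm for d, g in zip(delta, grad)]
        # project back onto the l2 ball of radius eps
        d_norm = math.sqrt(sum(d * d for d in delta))
        if d_norm > eps:
            delta = [d * eps / d_norm for d in delta]
    return [x + d for x, d in zip(state, delta)]
```

Running the attack on every observation of an episode and summing the rewards gives an empirical return. A certified lower bound should stay at or below that empirical return at the same budget.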


Leaderboard: PongNoFrameskip-v4 (GRe-median)

Robustness certification of cumulative reward in terms of the *percentile bound* *J_P* (*p* = 50%) over a horizon of 500 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.


Leaderboard: PongNoFrameskip-v4 (LoRe)

Robustness certification of cumulative reward in terms of the *absolute lower bound* *J* over a horizon of 200 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.


FreewayNoFrameskip-v4

Leaderboard: FreewayNoFrameskip-v4 (LoAct)

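Robustness certification for per-state action in terms of certified radius *r*. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. In each figure, the x-axis is the time step *t* and the y-axis is the certified radius *r_t*. The shaded area represents the standard deviation.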


Leaderboard: FreewayNoFrameskip-v4 (GRe-mean)

Robustness certification of cumulative reward in terms of the *expectation bound* *J_E* over a horizon of 500 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.


Leaderboard: FreewayNoFrameskip-v4 (GRe-median)

Robustness certification of cumulative reward in terms of the *percentile bound* *J_P* (*p* = 50%) over a horizon of 500 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.


Leaderboard: FreewayNoFrameskip-v4 (LoRe)

Robustness certification of cumulative reward in terms of the *absolute lower bound* *J* over a horizon of 200 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.


highway-fast-v0

Leaderboard: highway-fast-v0 (LoAct)

Robustness certification for per-state action in terms of certified radius *r* over the first 30 time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. In each figure, the x-axis is the time step *t* and the y-axis is the certified radius *r_t*. The shaded area represents the standard deviation.


Leaderboard: highway-fast-v0 (GRe-mean)

Robustness certification of cumulative reward in terms of the *expectation bound* *J_E* over a horizon of 30 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.


Leaderboard: highway-fast-v0 (GRe-median)

Robustness certification of cumulative reward in terms of the *percentile bound* *J_P* (*p* = 50%) over a horizon of 30 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.


Leaderboard: highway-fast-v0 (LoRe)

Robustness certification of cumulative reward in terms of the *absolute lower bound* *J* over a horizon of 30 time steps. Each column corresponds to one smoothing variance σ. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.