CROP-leaderboard
A standardized benchmark for certified robustness of RL algorithms

The goal of CROP-leaderboard is to systematically certify the robustness of different RL algorithms under certification criteria such as the robustness of per-state actions and lower bounds on the cumulative reward. The related paper can be found here.

Available Leaderboards
CartPole-v0
Leaderboard: CartPole-v0 (LoAct)

Robustness certification for per-state action in terms of the certified radius r at all time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step t, and the y-axis is the certified radius r_t. The shaded area represents the standard deviation.
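To make the plotted quantity concrete, here is a minimal sketch of how a per-state certified radius might be computed from a Gaussian-smoothed Q-function. It assumes the usual randomized-smoothing form, r_t = (σ/2)·(Φ⁻¹(p_top) − Φ⁻¹(p_runner_up)), where p is a smoothed Q-value normalized to [0, 1] by assumed value bounds v_min and v_max. This form, the function name `certified_radius`, and all of its parameters are assumptions made for illustration and are not taken from this page; consult the paper for the exact statement.

```python
from statistics import NormalDist


def certified_radius(q_top, q_runner_up, sigma, v_min, v_max, eps=1e-6):
    """Hypothetical helper: certified radius for one state.

    q_top / q_runner_up are the smoothed Q-values of the best and
    second-best actions; v_min / v_max are assumed bounds on the Q-values.
    """
    if not v_min < v_max:
        raise ValueError("v_min must be strictly smaller than v_max")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    std_normal = NormalDist()  # standard normal; inv_cdf is its quantile function

    def normalise(q):
        # Map the Q-value to (0, 1) and clip away from the endpoints,
        # because inv_cdf is undefined at exactly 0 and 1.
        p = (q - v_min) / (v_max - v_min)
        return min(max(p, eps), 1.0 - eps)

    gap = std_normal.inv_cdf(normalise(q_top)) - std_normal.inv_cdf(normalise(q_runner_up))
    # A non-positive gap means nothing can be certified for this state.
    return max(0.0, 0.5 * sigma * gap)
```

Under this assumed form, the radius grows linearly with σ for a fixed Q-value gap, which is one reason the leaderboard reports a separate column per smoothing variance.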

Leaderboard: CartPole-v0 (GRe-mean)

Robustness certification as cumulative reward in terms of the expectation bound J_E. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: CartPole-v0 (GRe-median)

Robustness certification as cumulative reward in terms of the percentile bound J_P (p = 50%). Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: CartPole-v0 (LoRe)

Robustness certification as cumulative reward in terms of the absolute lower bound J. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.
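When reading these plots, a useful sanity check is that a certified lower bound should not exceed the reward actually observed under an attack of the same magnitude. The sketch below flags points where that ordering fails. It assumes the two curves are sampled at the same attack magnitudes; the function name `bound_violations` and its arguments are hypothetical and not part of the leaderboard code.

```python
def bound_violations(epsilons, certified_lower, empirical_attacked, tol=1e-9):
    """Hypothetical helper: return the attack magnitudes at which the
    empirical reward under attack falls below the certified lower bound.

    All three sequences are assumed to be aligned point by point.
    """
    if not (len(epsilons) == len(certified_lower) == len(empirical_attacked)):
        raise ValueError("all three sequences must have the same length")
    return [
        eps
        for eps, lower, empirical in zip(epsilons, certified_lower, empirical_attacked)
        if empirical < lower - tol  # tol absorbs floating-point noise
    ]
```

An empty result is consistent with a sound certificate. A non-empty result is worth investigating, although bounds that hold only with high probability can occasionally be crossed by chance.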

PongNoFrameskip-v4
Leaderboard: PongNoFrameskip-v4 (LoAct)

Robustness certification for per-state action in terms of the certified radius r over 500 time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step t, and the y-axis is the certified radius r_t. The shaded area represents the standard deviation.

Leaderboard: PongNoFrameskip-v4 (GRe-mean)

Robustness certification as cumulative reward in terms of the expectation bound J_E over 500 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: PongNoFrameskip-v4 (GRe-median)

Robustness certification as cumulative reward in terms of the percentile bound J_P (p = 50%) over 500 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: PongNoFrameskip-v4 (LoRe)

Robustness certification as cumulative reward in terms of the absolute lower bound J over 200 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

FreewayNoFrameskip-v4
Leaderboard: FreewayNoFrameskip-v4 (LoAct)

Robustness certification for per-state action in terms of the certified radius r over 500 time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step t, and the y-axis is the certified radius r_t. The shaded area represents the standard deviation.

Leaderboard: FreewayNoFrameskip-v4 (GRe-mean)

Robustness certification as cumulative reward in terms of the expectation bound J_E over 500 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: FreewayNoFrameskip-v4 (GRe-median)

Robustness certification as cumulative reward in terms of the percentile bound J_P (p = 50%) over 500 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: FreewayNoFrameskip-v4 (LoRe)

Robustness certification as cumulative reward in terms of the absolute lower bound J over 200 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

highway-fast-v0
Leaderboard: highway-fast-v0 (LoAct)

Robustness certification for per-state action in terms of the certified radius r over 30 time steps. Each column corresponds to one smoothing variance σ and each row corresponds to one RL algorithm. For each figure, the x-axis is the time step t, and the y-axis is the certified radius r_t. The shaded area represents the standard deviation.

Leaderboard: highway-fast-v0 (GRe-mean)

Robustness certification as cumulative reward in terms of the expectation bound J_E over 30 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: highway-fast-v0 (GRe-median)

Robustness certification as cumulative reward in terms of the percentile bound J_P (p = 50%) over 30 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.

Leaderboard: highway-fast-v0 (LoRe)

Robustness certification as cumulative reward in terms of the absolute lower bound J over 30 time steps. Each column corresponds to one smoothing variance. Solid lines represent the certified reward bounds of different methods, and dashed lines show the empirical performance under PGD attack.