You’ve shown us that the top 10 out of 1,389 models with one year’s worth of out-of-sample (OOS) returns performed brilliantly.
A more meaningful figure is the median across all 1,389 models, which returned 136.5%.
How many models are missing from the leaderboard because they’re not “staked” or were pulled after poor OOS performance? Do these 1,389 models represent every model created, or only those that met certain performance criteria?