Make Rank Performance Tests Use Equal Bucket Sizes via Jittering

Jrinne · June 17, 2025, 11:03am

Hi P123 Team and Members,

Killer app for this:
→ Smooth rank performance test results with an equal number of stocks in each bucket (achievable with mild additional programming that I will not cover here).

I’d like to propose adding a random_state parameter to P123’s random() function. This small enhancement could support multiple valuable use cases:

Consistent tie-breaking across model runs (when desired)
Controlled randomness for reproducible random subsets (e.g., replacing current uses of Mod())
Debugging and backtesting repeatability, especially in workflows with random elements
More consistent simulation results: for example, it could help address variability in cases like the thread "Backtest Consistency: Different Results from Same Test"
Cleaner performance testing with equal-sized rank buckets (achievable with mild additional programming that I will not cover here)

Most machine learning libraries expose a parameter like this, for example random_state on scikit-learn’s RandomForestRegressor, because it gives users control over randomness when needed and consistency when preferred.

In practice, this could be implemented as:
Random(seed=123)

…or similar, with the default behavior remaining unchanged for backward compatibility.
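To illustrate the behavior being requested, here is a minimal Python sketch using the standard library's random.Random. It is only an analogy: the function name random_values and its seed parameter are hypothetical, and nothing here reflects how P123 implements Random() internally.

```python
import random


def random_values(n, seed=None):
    """Return n pseudo-random floats in [0, 1).

    Hypothetical helper: with seed=None the generator is seeded from the
    system (the unchanged default); with an integer seed the sequence is
    reproducible across runs.
    """
    rng = random.Random(seed)  # private generator, global state untouched
    return [rng.random() for _ in range(n)]


# Same seed, same sequence: useful for repeatable tie-breaking or subsets.
run_a = random_values(5, seed=123)
run_b = random_values(5, seed=123)
print(run_a == run_b)

# No seed: results may differ from call to call, as they do today.
print(random_values(5) == random_values(5))
```

Keeping the unseeded call as the default is what would preserve backward compatibility.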

Given the relatively small development effort and wide applicability, I believe this would be a high-impact addition for many users.

Thanks for considering it!

benhorvath · June 17, 2025, 4:42pm

Seconded.

Jrinne · June 17, 2025, 10:03pm

Here’s how jittering—or adding a small amount of randomization—can dramatically improve the clarity of a rank performance test by ensuring the same number of stocks in each bucket.

Without jittering (uneven bucket sizes):

With a small amount of jittering/randomization (equal-sized buckets):

This can be done now using non-deterministic methods. But it would be much better if P123 increased internal precision (more decimal places)—so that applying a very small jitter preserves the true rank order while enabling smoother, more interpretable quantile results.

More accurate. More consistent. A more usable slope for anyone running rank performance tests.

And just as important, the top bucket contains the same number of stocks across all features—enabling clearer, more consistent feature comparisons.

Rank performance tests are a core strength of P123. This fix should be relatively simple to implement.
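For anyone who wants to see the mechanics, here is a small Python sketch of the idea. It rests on assumptions of mine, not on P123's actual code: tied scores share one rank, buckets are cut by rank range, and the jitter is kept under half the smallest gap between distinct scores so that the order of non-tied stocks does not change. The names bucket_sizes and jitter_scores are made up for this example.

```python
import random
from bisect import bisect_left
from collections import Counter


def bucket_sizes(scores, num_buckets):
    """Count stocks per bucket when tied scores share the same rank.

    Assumed rule: a stock's rank is the number of scores strictly below
    it, and buckets are equal-width slices of that rank range.
    """
    ordered = sorted(scores)
    n = len(scores)
    counts = Counter(bisect_left(ordered, s) * num_buckets // n for s in scores)
    return [counts.get(b, 0) for b in range(num_buckets)]


def jitter_scores(scores, seed=None):
    """Add a small random amount to each score to break ties.

    The jitter stays below half the smallest gap between distinct
    scores, so only tied stocks are reordered among themselves.
    """
    rng = random.Random(seed)
    distinct = sorted(set(scores))
    gaps = [hi - lo for lo, hi in zip(distinct, distinct[1:])]
    width = min(gaps) / 2 if gaps else 1.0
    return [s + rng.random() * width for s in scores]


# Twelve stocks, six of them tied at the lowest score.
scores = [1.0] * 6 + [2.0] * 2 + [3.0, 4.0, 5.0, 6.0]

print(bucket_sizes(scores, 4))                           # uneven, one bucket empty
print(bucket_sizes(jitter_scores(scores, seed=123), 4))  # three stocks per bucket
```

The sketch also shows why precision matters: the jitter has to fit inside the gap between neighboring scores, so the more decimal places the ranks carry, the smaller the jitter can be while still separating ties.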

Now showing the effect with 30 buckets:

No jittering:

With jittering (equal bucket sizes):

Conclusion: Visually, this one’s easy to see—no words should be necessary.