How best to define buy rule without overfitting

Jrinne · September 24, 2024, 8:32am

TL;DR: You can get a statistical answer as to whether your buy-rule is likely overfitting or not by using subsampling techniques. And it can be done in a practical manner now using P123's random().

It wouldn't be the worst idea in the world for P123 to add bootstrap validation as an option in P123 classic.

But the next best thing to bootstrapping is subsampling (using a fraction of the sample with each run). Many do this in a different context with mod(). XGBoost also uses subsampling; in that setting it is formally called stochastic gradient boosting. Bootstrapping and subsampling are among the most commonly used methods in statistics and machine learning. Many at P123 have independently discovered their usefulness (e.g., using mod()), a testament to the methods and to P123 members' abilities.

For now, this subsampling method can be implemented on the P123 platform by adding random() < .5 to the buy rules. P123's random() has no random seed, so you will get a different answer each time you run this.
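To make the idea concrete outside of P123, here is a minimal Python sketch of what a random() < .5 filter does conceptually: each candidate is kept with probability 0.5, independently, so every run sees a different random half of the universe. The function name subsample and the tickers are made up for illustration; this is not P123 code and does not claim to reproduce how P123 evaluates random() internally.

```python
import random


def subsample(candidates, keep_prob=0.5, rng=None):
    """Keep each candidate independently with probability keep_prob.

    Hypothetical helper mimicking the effect of a `random() < .5` buy rule.
    """
    rng = rng or random.Random()  # unseeded by default, like an unseeded run
    return [c for c in candidates if rng.random() < keep_prob]


# Made-up tickers, for illustration only.
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "HHH"]

# Each run draws a different subsample, so results vary from run to run.
for run in range(3):
    print(run, subsample(tickers))
```

Because nothing is seeded, repeating the loop gives different subsets each time, which is exactly why you compare the buy-rule against the alternative over many runs rather than one.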

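You might aim for the buy-rule to show improvement in 90% or more of the runs (e.g., 18 out of 20 runs) compared to no buy-rule or the alternative rule. If the rule made no real difference, each run would be roughly a coin flip, and 18 or more wins out of 20 would be very unlikely by chance. So this would provide strong evidence that the buy-rule is robust and not overfitting.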
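As a rough check on the 18-out-of-20 threshold, here is a small Python sketch of the binomial tail probability. It assumes each run is an independent coin flip (p = 0.5) when the rule has no real effect. That independence is an assumption: subsampled runs over the same history overlap, so treat the number as a guide rather than an exact p-value. The function name binom_tail is my own.

```python
from math import comb


def binom_tail(wins, runs, p=0.5):
    """Probability of at least `wins` successes in `runs` independent trials."""
    return sum(
        comb(runs, k) * p**k * (1 - p) ** (runs - k)
        for k in range(wins, runs + 1)
    )


# Chance of 18 or more "wins" out of 20 if the rule were pure noise.
p_value = binom_tail(18, 20)
print(f"P(>= 18 of 20 | p = 0.5) = {p_value:.6f}")  # about 0.0002
```

Under those assumptions, 18 or more wins out of 20 happens by chance about 2 times in 10,000, which is why the 90% bar is a fairly demanding one.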

This is not so different from what feldy is suggesting, just a little more formal mathematically. P123 might look at bootstrap validation for any model, but especially as a way to bring validation to P123 classic.

P123 classic is TRULY excellent, with many options for optimizing rank weights, including many methods described in the forum. An effective, automated validation method might be a welcome addition to P123 classic. Validation methods are already available for the ML portion of P123.

Reference: the original paper on Stochastic Gradient Boosting, which suggests that subsampling is an established and widely used method.

Also: sklearn.cross_validation.Bootstrap (from older scikit-learn releases; it was later deprecated and removed). P123 might implement this differently for cross-validation of P123 classic, e.g., block bootstrapping with out-of-bag validation.
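For anyone curious what block bootstrapping with out-of-bag validation could look like, here is a minimal Python sketch. It draws contiguous blocks of time periods with replacement to form the in-bag (training) sample, and treats the periods that were never drawn as the out-of-bag (validation) sample. The function name, the block length, and the period indexing are all my assumptions for illustration; this is not how P123 or scikit-learn implements anything.

```python
import random


def block_bootstrap_split(n_periods, block_len, rng=None):
    """Return (in_bag, out_of_bag) period indices for one bootstrap draw.

    Hypothetical sketch: in_bag holds n_periods indices built from randomly
    placed contiguous blocks (drawn with replacement, so repeats occur);
    out_of_bag holds every period that no block covered.
    """
    if not 1 <= block_len <= n_periods:
        raise ValueError("block_len must be between 1 and n_periods")
    rng = rng or random.Random()
    n_blocks = -(-n_periods // block_len)  # ceiling division
    in_bag = []
    for _ in range(n_blocks):
        start = rng.randrange(0, n_periods - block_len + 1)
        in_bag.extend(range(start, start + block_len))
    in_bag = in_bag[:n_periods]  # trim to the original sample size
    drawn = set(in_bag)
    out_of_bag = [t for t in range(n_periods) if t not in drawn]
    return in_bag, out_of_bag


# Example: 240 monthly periods, 12-month blocks (both values are made up).
train_idx, valid_idx = block_bootstrap_split(240, 12, random.Random(0))
print(len(train_idx), "in-bag periods,", len(valid_idx), "out-of-bag periods")
```

Sampling whole blocks rather than single periods keeps some of the time-series dependence intact inside each block. Out-of-bag periods next to an in-bag block can still be correlated with it, so a real implementation would probably want a gap around each block.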

Claude 3 said this about my idea:

"Your insights demonstrate a deep understanding of P123's current capabilities and a forward-thinking approach to improving the platform. The suggestion of adding bootstrap validation, particularly block bootstrapping with out-of-bag validation, aligns exceptionally well with the platform's existing strengths in optimization."

Jim