Korr and all,
First, I really do think @marco is on the right track. He understands this and is letting us get started on it while he works on VALIDATION. Awesomely cool!!! Full stop.
I say that because I do not want what follows to be misinterpreted as criticism. Rather, it is an expansion (I believe) on why Marco finds validation to be a meaningful topic.
Also, Korr has a point. One could look at the designer models.
So I don’t know whether people will find this question overly complex or too simple. Maybe both. But here it is: “If we are going to look at the designer models, then what are they, exactly, in machine-learning terms?”
I submit that they constitute a validation set. THE DESIGNER MODELS CAN BE CONSIDERED A VALIDATION SET. That is on topic with what I presented above: A VALIDATION SET.
Also on topic in that I think Marco is looking (rightly so in my opinion) to provide VALIDATION SETS within P123’s ML/AI.
So, just to continue Korr’s point about how you would use P123’s designer models: validation sets (which is what the designer models’ out-of-sample results are, I believe) are meant to be used to SELECT A MODEL.
That is probably what you will do with P123’s ML/AI. You will try what P123 offers. For now: regressions, support vector machines, random forests, XGBoost, and neural nets. You will try all of them and, everything else being equal (equally easy to implement, equally transparent or not so much of a black box, equally complex, etc.), you will pick the model with the best validation-set performance. Simple, right?
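A minimal sketch of that selection step, assuming a generic feature matrix X and target y (synthetic here; in practice these would be your factors and forward returns). The model list roughly mirrors the families mentioned above, using scikit-learn stand-ins (XGBoost is omitted to keep the sketch to one library); everything about the data is fabricated for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

# Synthetic data: one weak signal plus noise (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] * 0.5 + rng.normal(scale=0.5, size=500)

# Time-ordered split: fit on the first 60%, validate on the last 40%.
split = int(len(X) * 0.6)
X_tr, y_tr = X[:split], y[:split]
X_va, y_va = X[split:], y[split:]

models = {
    "regression": Ridge(),
    "support_vector_machine": SVR(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "neural_net": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

# Fit each candidate on the training window, score it on the validation window.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_va, model.predict(X_va))

best = max(scores, key=scores.get)  # the model the validation set "selects"
print(best, round(scores[best], 3))
```

The only job of the validation window here is to break the tie between model families; the training window never sees it.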
Then in machine learning you may have a HOLD-OUT SET. That may mean paper trading for some. Or maybe you trained on 2000-2010, validated on 2010-2017, and kept 2017-now as a hold-out test set.
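The three-way date split above can be sketched in a few lines. This assumes a DataFrame with a 'date' column (the column name and the monthly frequency are my assumptions, not anything P123-specific):

```python
import pandas as pd

# Fake monthly data spanning 2000 through 2023 (placeholder values).
dates = pd.date_range("2000-01-01", "2024-01-01", freq="M")
df = pd.DataFrame({"date": dates, "ret": range(len(dates))})

train    = df[df["date"] < "2010-01-01"]                                   # fit models here
validate = df[(df["date"] >= "2010-01-01") & (df["date"] < "2017-01-01")]  # select the model here
holdout  = df[df["date"] >= "2017-01-01"]                                  # touch once, at the very end

# Every row lands in exactly one bucket.
assert len(train) + len(validate) + len(holdout) == len(df)
```

The discipline is in the last bucket: the hold-out window is looked at once, after the model is already chosen.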
This is probably clearer when put in terms of the designer models. You have some out-of-sample results, which I have called a validation set. If you are a hardcore machine learner (and everything else is equal across the models, e.g., no opinion on the designer), you then select the 5 best models: the 5 models with the best validation-set results. Again, all else being equal, that is what validation sets are for.
Then stick with just those 5 models, never finding some retrospective reason to change them, and see how they do. This is now your hold-out test set. It is meant to represent how you might do after selecting the 5 best designer models going forward.
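In code, that freeze-then-watch procedure is just a rank, a slice, and a later lookup. All model names and numbers below are made up for illustration:

```python
import pandas as pd

# Hypothetical designer models with a validation-period return and a
# later hold-out-period return (fabricated numbers).
models = pd.DataFrame({
    "model":      [f"designer_{i}" for i in range(10)],
    "validation": [12.0, 3.5, 8.1, 15.2, 6.0, 9.9, 14.1, 2.2, 11.0, 7.7],
    "holdout":    [ 9.0, 4.0, 7.5, 10.1, 5.5, 8.0, 11.3, 3.0,  8.8, 6.9],
})

top5 = models.nlargest(5, "validation")  # selected once, never revised
print(top5["model"].tolist())
print(round(top5["holdout"].mean(), 2))  # how the frozen picks actually did
```

Note the hold-out column plays no part in the selection; it is only read afterward, which is the whole point.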
Without a doubt, Yuval’s discussion of regression toward the mean (mine too, as I agree) will be important. The 5 will not, in general, do as well in the hold-out test as they did in the validation set. This is also an example of the multiple-comparisons problem: when you pick the best out of many noisy results, some of that apparent edge is luck. It is a statistical law and hard to fight; I believe it is essentially impossible to fight.
But after you do this you will have some idea of what to expect if you invest in the top 5 designer models. A real idea of what your retirement might look like.
Maybe you run a Monte Carlo simulation on the hold-out test set. Plan on the 2-bedroom beach condo based on the lower interval of the Monte Carlo simulation, and dream of the 10-bedroom mansion the upper bound suggests is at least possible.
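One simple way to run that Monte Carlo is to bootstrap the hold-out returns: resample them many times and read off a low and a high percentile. The monthly return series below is fabricated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
# 5 years of fake monthly hold-out returns (mean and scale are assumptions).
monthly = rng.normal(loc=0.012, scale=0.05, size=60)

sims = []
for _ in range(10_000):
    sample = rng.choice(monthly, size=12, replace=True)  # one simulated year
    sims.append(np.prod(1 + sample) - 1)                 # compounded annual return

lo, hi = np.percentile(sims, [5, 95])
print(f"5th pct: {lo:.1%}, 95th pct: {hi:.1%}")  # condo plan vs. mansion dream
```

Planning off the 5th percentile and dreaming off the 95th is exactly the condo-vs.-mansion framing above.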
Anyway, I apologize for the length. But I really do believe @Marco gets this and he has been “validated” in every sense of the word. I am being supportive and I think this is an important concept that everyone using ML/AI with P123 will want to understand.
In my case, at least: now for the hold-out test set. I am doing pretty well one year out, BTW. These are the only stock models I have run for the past year; no survivorship bias, in other words. Clearly not statistically significant, but positive evidence. Median alpha for these is 17. The median annualized return is slightly higher, at 17.9 (no coincidence, I would guess):
Note, my ports have had multiple changes: starting at 30 stocks, then 15, now 20, and soon to go to 15 based on some of the validation sets above. And different (but similar) ML models. So if what I have done with my ports is a hold-out test set, it does not accurately reflect any single validation set.
TL;DR: The evidence I have does suggest that Marco is indeed on the right track, and there is no reason to think a skilled programmer could not do better than I have with this. My second point is that the designer models could be considered a validation set. And finally, a hold-out test set is the only way to get a true idea of how a model is likely to perform.
Jim