Yuval,
I had not realized that the universe would change at each rebalance for the sim, and I appreciate your pointing this out.
All,
I understand that there are times when one would want a static universe: for example, when training and validating over the same time frame. P123's even/odd universe is another way of doing this; it is equivalent to using Mod(StockID,2), I think.
Also, I could imagine situations where I might use a rule that combines both methods for validation. For example, this rule could be used for the validation data over the same time frame: Mod(StockID,2)=0 & Random < 0.5. The training universe would use the rule Mod(StockID,2)=1, keeping the training and validation data separate. It might be best to use the screener with no buy or sell rules, or the rank performance test, when using this method.
One could also consider using the rule Mod(StockID,2)=1 & Random < 0.5 on the training universe when developing the system.
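Here is a minimal sketch of that split in Python, outside of P123; the stock_id and ret columns and all the numbers are made up for illustration. The point is just that the Mod split keeps the two universes disjoint while the random draw subsamples within each:

```python
import numpy as np
import pandas as pd

# Made-up universe: stock IDs and a return column, for illustration only.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "stock_id": np.arange(1, 1001),
    "ret": rng.normal(0.01, 0.05, 1000),
})

# Training universe: odd StockIDs, randomly subsampled at 50%.
train = df[(df["stock_id"] % 2 == 1) & (rng.random(len(df)) < 0.5)]

# Validation universe: even StockIDs, with a fresh random draw.
val = df[(df["stock_id"] % 2 == 0) & (rng.random(len(df)) < 0.5)]

# Mod keeps the two universes disjoint no matter what the draws do.
assert set(train["stock_id"]).isdisjoint(set(val["stock_id"]))
```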
If you look at Friedman's paper, or at how XGBoost or a random forest handles this, it is clear that they are NOT subsampling half of the universe and THEN sticking with that half throughout.
XGBoost has a lot of fans, including Marco it seems, as he and his AI specialist will be making XGBoost available to P123 members, according to a recent post. XGBoost clearly does not stick with half of the universe throughout; rather, the universe is reshuffled frequently. Here is a quote from XGBoost's documentation:
“Subsampling will occur once in every boosting iteration.” That is, frequently, since XGBoost runs many iterations in its algorithm.
This may also be interesting:
“Typically set subsample >= 0.5 for good results.” In other words, you may not want to use less than half of the universe unless you are forced to.
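For anyone who wants to see the parameter in action, here is a minimal sketch using XGBoost's Python API; subsample is the real parameter the docs quote above refers to, while the data and the other settings are arbitrary:

```python
import numpy as np
import xgboost as xgb

# Toy regression data; not P123 data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=1000)

# subsample=0.5: each boosting iteration trains on a fresh random
# half of the rows, so the training "universe" is re-drawn per tree.
model = xgb.XGBRegressor(n_estimators=200, subsample=0.5, learning_rate=0.05)
model.fit(X, y)
print(model.predict(X[:5]))
```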
So random (and frequently re-randomized) subsampling has multiple established uses, and some will be using it on P123 if the full functionality of XGBoost is made available. When you do consider using it, there are papers and mathematical proofs establishing the best ways to use it, such as subsampling half or more of the universe when possible.
Again, people should use any (and all) methods that suit their needs. But you may find that Random < 0.5 is adequate, or even preferable, for some situations; maybe you would use both methods in others.
I look forward to the release of boosting on the P123 platform, and possibly a discussion from the AI specialist on the best uses of subsampling, perhaps generalized to cover subsampling and bootstrapping outside of XGBoost as well.
For me personally, here is one situation where I would use Random < 0.5: I would train on all of the data up until a certain date. Then, to test (or validate), I would start on a date after the last date used for the training data and apply Random < 0.5 to this test data, perhaps recording each result in a spreadsheet.
This would give you a range of results that you could expect to see out-of-sample. It is pretty similar to what you would think of with Monte Carlo simulations, but actually much better: it is like bootstrapping the results, which has a long history (with multiple books and papers recommending the method) and, being non-parametric, avoids the assumption of normality in the resulting confidence interval. It is well studied and widely accepted as having advantages at this point.
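Here is a rough sketch, in Python, of what I mean, with made-up out-of-sample returns standing in for the exported results; the repeated random halves play the role of Random < 0.5, and the percentile interval is the non-parametric part:

```python
import numpy as np

# Made-up per-stock out-of-sample returns from 2015 on; in practice
# these would be exported from the screener or rank performance test.
rng = np.random.default_rng(1)
oos_returns = rng.normal(0.08, 0.20, 500)

# Repeatedly evaluate on a random half of the universe, recording the
# average each time, as Random < 0.5 would do at the universe level.
draws = [oos_returns[rng.random(len(oos_returns)) < 0.5].mean()
         for _ in range(1000)]

# Non-parametric percentile interval: no normality assumption needed.
lo, hi = np.percentile(draws, [5, 95])
print(f"90% interval for the mean return: [{lo:.3f}, {hi:.3f}]")
```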
This does address the specific question in the original post. In fact, to address it completely, I would use the above method and take the lower bound of the confidence interval as the likely result of each strategy, then adopt the strategy with the better lower bound. This is an expansion of what I said above about shrinkage.
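Continuing the sketch above, that decision rule is just a comparison of lower percentiles; draws_a and draws_b are hypothetical result sets built with the loop above, with the numbers invented for illustration:

```python
import numpy as np

# Hypothetical resampled results for two strategies, each built with
# the loop in the previous sketch (numbers invented for illustration).
rng = np.random.default_rng(2)
draws_a = rng.normal(0.07, 0.02, 1000)
draws_b = rng.normal(0.09, 0.05, 1000)

# Compare lower bounds of the intervals rather than the averages.
lower_a, lower_b = np.percentile(draws_a, 5), np.percentile(draws_b, 5)
print("Adopt strategy", "A" if lower_a > lower_b else "B")
```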
Anyway, I would probably use both Mod() and Random() if I were using either of them for my studies at the present time.
Edit: I think I have convinced myself to use this method, as detailed above, to test the out-of-sample results of a ranking system I developed using data up until 2015, running the test from 2015 on.
And first impression: Wow!!! It definitely works for validation. One strategy held up. The other strategy, which worked well on the entire universe, declined significantly on the random universe, in a way I might have expected but had not been able to prove.
And it answered the question of the original post for my strategies: the strategy with the lesser amount of data declined more (in both the average and the lower bound).
Jim