Investing system validation and robustness testing

Hi all,

I am working out what due diligence looks like for testing a ranking-and-simulation “investing system” before I fully fund it (vs. play money). In other words, how do I validate and robustness-test my investing system using as many of the tools enabled by the API download as possible?

If other folks have thoughts or feedback I would appreciate hearing it! Also I apologize if I am duplicating existing threads! I know this is a fairly well discussed topic.

Here are the steps I plan on using (big thanks to Yuval’s and Jim’s posts); the last three steps are my robustness indicators.

  1. Define an investing system. Optimize it, or not, as you desire. If you are doing ML, don’t train it on the following test data!
    – I have optimized on the same data I am using for validation when the ranking system is not ML-based. That may invalidate what I describe below…
  2. Take the universe and divide it up into multiple parts
    – Edit: Currently I am doing 5 splits using the StockID. Other potential options are k-fold cross-validation, walk-forward (a kind of rolling test), time-series splits… Some of these are better suited to ML methods due to the need for training data.
  3. Run the simulation on each of the universe parts and calculate weekly returns
  4. Trim the weekly returns to remove outliers
    – Edit: A few options here as well: remove the top and bottom X%, or compare to the universe return and trim with an ellipse of confidence…
  5. Calculate your desired performance metric using the trimmed data. I personally recommend alpha calculated using the sub-universe returns, as it will help account for the variation in potential returns across the splits.
  6. Calculate a mean (and std) of your metric across the various splits. If your metric disappears, your system is not very robust!
  7. Run a rolling backtest on the entire universe. Once again, if you see long periods of no alpha, it may not be a robust investing system.
  8. Run Monte Carlo simulations or bootstrap confidence intervals. Look at the bounds and see if they are acceptable. I don’t know how to do this yet, but from what I know it is a good idea…
  9. Edit: added with suggestions from other posts: perturb your input variables by, say, 10% and run the same tests again to see how the results change. Input variable examples: ranking weights, buy and sell rules, universe splits…
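Steps 2–6 above can be sketched in Python. This is only a sketch under assumptions: the column names (`StockID`, `Date`, `Ret`, `UnivRet`) are hypothetical placeholders for whatever the API download provides, splits are by `StockID` modulo 5, and trimming drops the top and bottom 5% of weekly returns.

```python
# Sketch of steps 2-6, assuming weekly per-stock returns are already
# downloaded into a DataFrame. Column names are hypothetical placeholders.
import numpy as np
import pandas as pd

N_SPLITS = 5
TRIM_PCT = 0.05  # step 4: drop top/bottom 5% of weekly returns per split

def split_alpha_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Mean weekly excess return ("alpha") per StockID split."""
    df = df.copy()
    df["split"] = df["StockID"] % N_SPLITS          # step 2: split by StockID
    rows = []
    for s, grp in df.groupby("split"):
        # step 3: weekly return per split (equal-weighted here)
        weekly = grp.groupby("Date")[["Ret", "UnivRet"]].mean()
        # step 4: trim outlier weeks
        lo, hi = weekly["Ret"].quantile([TRIM_PCT, 1 - TRIM_PCT])
        trimmed = weekly[(weekly["Ret"] >= lo) & (weekly["Ret"] <= hi)]
        # step 5: alpha measured against the sub-universe return
        alpha = (trimmed["Ret"] - trimmed["UnivRet"]).mean()
        rows.append({"split": s, "alpha": alpha})
    out = pd.DataFrame(rows)
    # step 6: mean and std across splits -- a large std suggests fragility
    print(f"mean alpha {out['alpha'].mean():.4%}, std {out['alpha'].std():.4%}")
    return out
```

The equal-weighted weekly average is just one choice; position-weighted returns from the simulation output would slot in the same place.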

Edit: Since this is a lot of tests, it probably requires Python or other scripting methods to implement.
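For step 8, a percentile bootstrap is one way to get a confidence interval without new simulations. A hedged sketch, assuming `weekly_excess` is a hypothetical array of the simulation’s weekly returns minus the universe’s weekly returns:

```python
# Sketch of step 8: percentile bootstrap CI for mean weekly excess return.
import numpy as np

def bootstrap_ci(weekly_excess, n_boot=10_000, ci=0.95, seed=0):
    """Bootstrap confidence interval for the mean weekly excess return."""
    rng = np.random.default_rng(seed)
    x = np.asarray(weekly_excess)
    # resample weeks with replacement and recompute the mean each time
    means = rng.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
    lo, hi = np.quantile(means, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return lo, hi
```

If the lower bound sits below zero, the measured alpha may just be noise. Note that resampling weeks independently ignores autocorrelation; a block bootstrap (resampling runs of consecutive weeks) is a common refinement.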



I like that.

Just to show there is more than one way to do a thing, Walter mentioned time-series validation.

You need a model that runs fast and/or a fast computer for this.

I had not done one with my present system, mostly because I had not figured out how to handle the different sample sizes. My funded model is fast, however. I did have a holdout test sample, but just one.

If I train on the first 10 years of data and test the next year out of sample, something that is not significant at 10 years could become significant with 20 years of data. Getting more data can make a result become significant.

For walk-forward I would then train on 11 years and test year 12, etc. But the sample size changes each time, and I was not sure how to handle that.

I thought of a way to adjust the significance according to the sample size, and ChatGPT convinced me it has some rationality (it is based on the standard-error calculation).
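One way to sketch that adjustment: judge each expanding training window by its t-statistic (mean over standard error, std / sqrt(n)), which is comparable across sample sizes. This is my reading of the idea, not Portfolio123 code; `weekly_excess` is a hypothetical 1-D array of weekly excess returns with 52 weeks per “year”.

```python
# Sketch: expanding-window walk-forward where each step's in-sample result
# is summarized by a t-statistic, so the growing sample size is handled by
# the standard error (std / sqrt(n)).
import numpy as np

WEEKS_PER_YEAR = 52

def walk_forward_tstats(weekly_excess, first_train_years=10):
    x = np.asarray(weekly_excess)
    n_years = x.size // WEEKS_PER_YEAR
    results = []
    for train_years in range(first_train_years, n_years):
        train = x[: train_years * WEEKS_PER_YEAR]
        test = x[train_years * WEEKS_PER_YEAR:(train_years + 1) * WEEKS_PER_YEAR]
        # t-stat of the in-sample mean: comparable across sample sizes
        t = train.mean() / (train.std(ddof=1) / np.sqrt(train.size))
        results.append({"train_years": train_years,
                        "train_t": t,
                        "oos_mean": test.mean()})
    return results
```

A rough read of the output: the in-sample t-stat should stay healthy as the window grows, and the out-of-sample means should not be systematically worse than what the in-sample numbers promise.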

So I think I will do a walk-forward validation of my present system and some of the ones I develop with DataMiner downloads…

Just another way to do it. I like more than one way. They each have advantages and disadvantages. The walk-forward is resource intensive and has other disadvantages.


There are a number of suggestions here: Break Your Strategy: How to Stress Test Your Quantitative Models - Portfolio123 Blog. Also see the very first webinar on this page:
