Investing system validation and robustness testing

Hi all,

I am working out what due diligence looks like for testing a ranking-and-simulation “investing system” before I fully fund it (vs. play money). In other words, how do I validate and robustness-test my investing system using as many of the tools enabled by the API download as possible?

If other folks have thoughts or feedback I would appreciate hearing it! Also I apologize if I am duplicating existing threads! I know this is a fairly well discussed topic.

Here are the steps I plan on using (big thanks to Yuval’s and Jim’s posts); the last three steps are my robustness indicators.

  1. Define an investing system. Optimize it, or not, as you desire. If you are doing ML, don’t train it on the following test data!
    – I have optimized on the same data I am using for validation when the ranking system is not ML-based. That may invalidate what I describe below…
  2. Take the universe and divide it up into multiple parts
    – Edit: Currently I am doing 5 splits using the StockID. Other potential options are k-fold cross-validation, walk-forward (a kind of rolling test), time-series splits… Some of these are better suited to ML methods due to the need for training data.
  3. Run the simulation on each of the universe parts and calculate weekly returns
  4. Trim the weekly returns to remove outliers
    – Edit: A few options here as well: remove the top and bottom X%, or compare to the universe return and trim with an ellipse of confidence…
  5. Calculate your desired performance metric using the trimmed data. I personally recommend alpha calculated using the sub-universe returns, as it will help account for the variation in potential returns across the splits.
  6. Calculate a mean (and std) of your metric across the various splits. If your metric disappears, your system is not very robust!
  7. Run a rolling backtest on the entire universe. Once again, if you see long periods of no alpha, it may not be a robust investing system.
  8. Run Monte Carlo simulations or bootstrap confidence intervals. Look at the bounds and see if they are acceptable. I don’t know how to do this yet, but from what I know it is a good idea…
  9. Edit: added with suggestions from other posts: perturb your input variables by, say, 10% and run the same tests again to see how the results change. Input variable examples: ranking weights, buy and sell rules, universe splits…
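Steps 2–6 above can be sketched in Python. This is only a sketch under assumptions: the column names (`StockID`, `Date`, `Ret`, `UnivRet`) are hypothetical placeholders for whatever the API download provides, splits are by `StockID` modulo 5, and trimming drops the top and bottom 5% of weekly returns.

```python
# Sketch of steps 2-6, assuming weekly per-stock returns are already
# downloaded into a DataFrame. Column names are hypothetical placeholders.
import numpy as np
import pandas as pd

N_SPLITS = 5
TRIM_PCT = 0.05  # step 4: drop top/bottom 5% of weekly returns per split

def split_alpha_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Mean weekly excess return ("alpha") per StockID split."""
    df = df.copy()
    df["split"] = df["StockID"] % N_SPLITS          # step 2: split by StockID
    rows = []
    for s, grp in df.groupby("split"):
        # step 3: weekly return per split (equal-weighted here)
        weekly = grp.groupby("Date")[["Ret", "UnivRet"]].mean()
        # step 4: trim outlier weeks
        lo, hi = weekly["Ret"].quantile([TRIM_PCT, 1 - TRIM_PCT])
        trimmed = weekly[(weekly["Ret"] >= lo) & (weekly["Ret"] <= hi)]
        # step 5: alpha measured against the sub-universe return
        alpha = (trimmed["Ret"] - trimmed["UnivRet"]).mean()
        rows.append({"split": s, "alpha": alpha})
    out = pd.DataFrame(rows)
    # step 6: mean and std across splits -- a large std suggests fragility
    print(f"mean alpha {out['alpha'].mean():.4%}, std {out['alpha'].std():.4%}")
    return out
```

The equal-weighted weekly average is just one choice; position-weighted returns from the simulation output would slot in the same place.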

Edit: Since this is a lot of tests, it probably requires Python or other scripting methods to implement.
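For step 8, a percentile bootstrap is one way to get a confidence interval without new simulations. A hedged sketch, assuming `weekly_excess` is a hypothetical array of the simulation’s weekly returns minus the universe’s weekly returns:

```python
# Sketch of step 8: percentile bootstrap CI for mean weekly excess return.
import numpy as np

def bootstrap_ci(weekly_excess, n_boot=10_000, ci=0.95, seed=0):
    """Bootstrap confidence interval for the mean weekly excess return."""
    rng = np.random.default_rng(seed)
    x = np.asarray(weekly_excess)
    # resample weeks with replacement and recompute the mean each time
    means = rng.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
    lo, hi = np.quantile(means, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return lo, hi
```

If the lower bound sits below zero, the measured alpha may just be noise. Note that resampling weeks independently ignores autocorrelation; a block bootstrap (resampling runs of consecutive weeks) is a common refinement.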



I like that.

Just to show there is more than one way to do a thing, Walter mentioned time-series validation.

You need a model that runs fast and/or a fast computer for this.

I had not done one with my present system, mostly because I had not figured out how to handle the different sample sizes. My funded model is fast, however. I did have a holdout test sample, but just one.

If I train on the first 10 years of data and test the next year out of sample, something that is not significant at 10 years could become significant with 20 years of data. Getting more data can make a result become significant.

For walk-forward I would then train on 11 years and test year 12, etc. But the sample size changes each time, and I was not sure how to handle that.

I thought of a way to adjust the significance according to the sample size, and ChatGPT convinced me it has some rationality (it is based on the standard-error calculation).
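One way to sketch that adjustment: judge each expanding training window by its t-statistic (mean over standard error, std / sqrt(n)), which is comparable across sample sizes. This is my reading of the idea, not Portfolio123 code; `weekly_excess` is a hypothetical 1-D array of weekly excess returns with 52 weeks per “year”.

```python
# Sketch: expanding-window walk-forward where each step's in-sample result
# is summarized by a t-statistic, so the growing sample size is handled by
# the standard error (std / sqrt(n)).
import numpy as np

WEEKS_PER_YEAR = 52

def walk_forward_tstats(weekly_excess, first_train_years=10):
    x = np.asarray(weekly_excess)
    n_years = x.size // WEEKS_PER_YEAR
    results = []
    for train_years in range(first_train_years, n_years):
        train = x[: train_years * WEEKS_PER_YEAR]
        test = x[train_years * WEEKS_PER_YEAR:(train_years + 1) * WEEKS_PER_YEAR]
        # t-stat of the in-sample mean: comparable across sample sizes
        t = train.mean() / (train.std(ddof=1) / np.sqrt(train.size))
        results.append({"train_years": train_years,
                        "train_t": t,
                        "oos_mean": test.mean()})
    return results
```

A rough read of the output: the in-sample t-stat should stay healthy as the window grows, and the out-of-sample means should not be systematically worse than what the in-sample numbers promise.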

So I think I will do a walk-forward validation of my present system and some of the ones I develop with DataMiner downloads…

Just another way to do it. I like more than one way. They each have advantages and disadvantages. The walk-forward is resource intensive and has other disadvantages.


There are a number of suggestions here: Break Your Strategy: How to Stress Test Your Quantitative Models - Portfolio123 Blog. Also see the very first webinar on this page:
