OK, from what I understand, Lopez de Prado has here rejected the walk-forward backtesting he has advocated in the past and is suggesting Monte Carlo backtesting on synthetic databases. In my opinion, there is no way to assemble a synthetic database that can serve as a functional proxy for a real one unless you’re only using a very limited set of data points (as Lopez de Prado does in his “practical example”). So Lopez de Prado’s approach might be appropriate for technical analysis, since the inputs consist of price and volume and the output consists of price. But I can’t imagine how one could possibly create a synthetic database for testing fundamentals.
Whether or not that leaves walk-forward testing as the best alternative is an open question, in my view. The problem with walk-forward testing is that a year-long out-of-sample period is more or less meaningless, since almost any strategy is likely to underperform over a stretch that short, while a five-year out-of-sample period means that you're not focusing on what has been happening in the last five years.
If we reject Monte Carlo and walk-forward backtesting, that leaves resampling as the best way forward for machine-learning techniques.
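To make "resampling" concrete, here is a minimal sketch of one common variant, a block bootstrap of weekly strategy returns. Everything here is illustrative: the return series is made up, and `block_bootstrap` is a hypothetical helper, not a Portfolio123 feature or anything from Lopez de Prado's book. The idea is simply that resampling contiguous blocks (rather than single observations) preserves some of the short-term autocorrelation in the original series while still generating many alternative paths.

```python
import random

def block_bootstrap(returns, block_size=8, n_samples=200, seed=42):
    """Resample contiguous blocks of returns (with replacement) to build
    synthetic return paths of the same length as the original series."""
    rng = random.Random(seed)
    n = len(returns)
    paths = []
    for _ in range(n_samples):
        path = []
        while len(path) < n:
            start = rng.randrange(0, n - block_size + 1)
            path.extend(returns[start:start + block_size])
        paths.append(path[:n])  # trim the last block to fit exactly
    return paths

# Hypothetical weekly strategy returns, repeated to simulate ~3 years of data.
weekly = [0.004, -0.002, 0.01, 0.003, -0.005, 0.007, 0.001, -0.003,
          0.006, 0.002, -0.001, 0.005, 0.0, 0.004, -0.006, 0.008] * 10

paths = block_bootstrap(weekly, block_size=8, n_samples=200)
# The spread of outcomes across resampled paths gives a crude sense of how
# dependent the backtest result is on the one historical path we observed.
frac_positive = sum(1 for p in paths if sum(p) > 0) / len(paths)
```

If a strategy's edge survives across most resampled paths, that is weak evidence it isn't just an artifact of one particular historical sequence.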
As for us human non-machines, I think Marc is right: the tools that we have at Portfolio123 are extremely powerful. We have four very different ways to backtest a strategy: simulations, screens, rolling screen backtests, and ranking bucket performance. We can vary our universes, our holding periods, the period of our backtests, the number of holdings, our screening/universe/buy rules, and the ranking weights of our factors, so that we can see whether our strategy still works when we vary it, and at what point the variations break it. The kind of robustness testing that's available to us here is mind-boggling. I estimate that one could run over 5,000 different backtests on variations of a single strategy to see how robust it is.
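The 5,000-plus figure follows directly from multiplying out the knobs one can turn. A small sketch, with entirely hypothetical option lists (the specific universe names and rule variants below are placeholders, not actual Portfolio123 settings), shows how quickly the combinations add up:

```python
from itertools import product

# Hypothetical choices for each dimension of a robustness test.
universes       = ["SP500", "SP1500", "Russell3000", "AllFundamentals"]  # 4
holding_counts  = [10, 15, 20, 25, 30]                                   # 5
rebalance       = ["weekly", "4 weeks", "13 weeks", "26 weeks"]          # 4
start_years     = [2004, 2009, 2014, 2019]                               # 4
rank_weightings = ["base", "tilt_value", "tilt_quality",
                   "tilt_momentum", "equal_weight"]                      # 5
rule_variants   = ["strict", "base", "loose", "none"]                    # 4

# Every combination is a distinct backtest variation of the same strategy.
variations = list(product(universes, holding_counts, rebalance,
                          start_years, rank_weightings, rule_variants))
print(len(variations))  # 4 * 5 * 4 * 4 * 5 * 4 = 6400
```

Six modest lists already give 6,400 variations, so "over 5,000 backtests" is, if anything, conservative.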
I have mixed feelings about all-weather strategies and tactical allocation. I agree with Marc that looking at the 1999-2006 period is not terribly useful, but a strategy that would have utterly failed during that period is not one I would trust. I don't think I'll ever be able to predict the economic regime of the next few years with any accuracy, so picking a past regime that resembles the coming one is impossible for me. Currently I'm favoring a strategy that would have worked well not only over the past ten to twelve years but also over the past three, and I plan to change that strategy as time progresses and new data comes in. From what I have found through correlation testing, the best way to assess a strategy is to look at its risk-adjusted performance (its alpha, measured weekly) over the past 3, 10, and 12 years and average those figures, giving double weight to the 10-year number. This gives me a healthy mix of a huge variety of factors, the large majority of which are based solely on financial statements (rather than on price, volume, estimates, etc., all of which I also use, but to a lesser degree).
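The weighting scheme described above is just a simple weighted average; a one-liner makes it explicit. The alpha values in the example are made-up numbers, and `strategy_score` is my own naming for the calculation:

```python
def strategy_score(alpha_3y, alpha_10y, alpha_12y):
    """Average the 3-, 10-, and 12-year weekly alphas,
    giving double weight to the 10-year figure."""
    return (alpha_3y + 2 * alpha_10y + alpha_12y) / 4

# Hypothetical annualized alphas: 8% (3y), 12% (10y), 10% (12y).
score = strategy_score(0.08, 0.12, 0.10)
# (0.08 + 0.24 + 0.10) / 4 = 0.105
```

So a strategy with a strong 10-year record is rewarded most, while the 3-year and 12-year figures keep it honest at both the short and long end.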