Tactical Investment Algorithms

Jim,

I am not sure if you have already seen this paper. It suggests the preferred method of backtesting our investment strategies.

Regards
James

Abstract

There are three fundamental ways of testing the validity of an investment algorithm against historical evidence: a) the walk-forward method; b) the resampling method; and c) the Monte Carlo method. By far the most common approach followed among academics and practitioners is the walk-forward method. Implicit in that choice is the assumption that a given investment algorithm should be deployed throughout all market regimes. We denote such assumption the “all-weather” hypothesis, and the algorithms based on that hypothesis “strategic investment algorithms” (or “investment strategies”).

The all-weather hypothesis is not necessarily true, as demonstrated by the fact that many investment strategies have floundered in a zero-rate environment. This motivates the problem of identifying investment algorithms that are optimal for specific market regimes, denoted “tactical investment algorithms.” This paper argues that backtesting against synthetic datasets should be the preferred approach for developing tactical investment algorithms. A new organizational structure for asset managers is proposed, as a tactical algorithmic factory, consistent with the Monte Carlo backtesting paradigm.


Tactical Investment Algorithms.pdf (380 KB)

James,

Thank you.

I have some experience with walk-forward validation. Nothing like de Prado, however, and I will continue to read and re-read his work on this.

The recent change in the effectiveness of value factors has me wondering how to deal with changes in “market regimes,” which is covered in this paper.

I will definitely study his ideas on this.

Best,

Jim

How does one apply any of this? This is not a flippant question; I genuinely want to know.

RT,

IMHO, if you don’t have access to Monte Carlo simulation, an easy way is to follow the “regimes” idea mentioned in the paper: backtest your investment strategy against historical periods that have similar economic conditions and interest rates, rather than taking an “all-weather,” “all-time-frame” approach.
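
A rough sketch of that idea in Python, assuming you already have monthly strategy returns and an aligned series of interest rates (the 2% cutoff and the variable names are just placeholders, not anything from the paper):

```python
import pandas as pd

def regime_backtest(returns: pd.Series, rates: pd.Series, low_rate_cutoff: float = 2.0) -> pd.DataFrame:
    """Split monthly strategy returns by interest-rate regime and summarize each regime."""
    low_rate = rates < low_rate_cutoff                      # e.g., a zero/low-rate environment
    rows = {}
    for label, mask in [("low_rate", low_rate), ("normal_rate", ~low_rate)]:
        r = returns[mask]
        ann_ret = (1 + r).prod() ** (12 / len(r)) - 1       # annualized from monthly returns
        ann_vol = r.std() * 12 ** 0.5
        rows[label] = {"annual_return": ann_ret, "annual_vol": ann_vol, "months": len(r)}
    return pd.DataFrame(rows).T

# Usage (hypothetical data): regime_backtest(my_monthly_returns, ten_year_yields)
```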

Regards
James

RT,

Walk-forward for a P123 ranking system would work like this: you might optimize a ranking system on 2000-2004 (inclusive) and then see how it performed out-of-sample for the year 2005.

Then you would optimize on 2000-2005 and test on the year 2006, then 2000-2006 and test on 2007, and so on, up to 2000-2018 and test on 2019.

You then combine (append or concatenate if on the computer) all of the out-of-sample (test) years.

This would give you the ability to get 15 years of out-of-sample results on a ranking system, in theory.
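
A minimal sketch of that expanding-window scheme, assuming yearly data keyed by year; the optimize and evaluate callables are placeholders for whatever ranking-system fitting and scoring you actually do:

```python
def walk_forward(data_by_year, optimize, evaluate,
                 first_year=2000, first_train_end=2004, last_test_year=2019):
    """Expanding-window walk-forward: fit on all years before the test year,
    then score the held-out year. `optimize` and `evaluate` are user-supplied."""
    oos_results = []
    for test_year in range(first_train_end + 1, last_test_year + 1):
        train = [data_by_year[y] for y in range(first_year, test_year)]   # e.g., 2000..2004
        params = optimize(train)                                          # in-sample fit
        oos_results.append(evaluate(params, data_by_year[test_year]))     # out-of-sample year
    return oos_results  # append/concatenate these for the full out-of-sample record
```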

In practice, you usually have a validation set and a smaller test set.

In any case, I think people have a misunderstanding of what most of machine learning is about. Most of it is about getting out-of-sample data without having to paper trade for years.

Done right, some of the techniques reduce overfitting and prevent (not cause) the nightmare overfitting that people imagine.

While I called my machine learning of technical factors a “failure,” it was a success in that it allowed me to reject a technical strategy without having to paper trade it for years.

You can see how this would be particularly useful for the ETF strategy (Jeff’s suggestion). You would select the best ETF for 2000, then test the strategy for 2001; select the best ETF for 2001 and test 2002; and so on, up to selecting the best ETF for 2018 and testing 2019.

Assuming there was no “data snooping,” it could give a long period of out-of-sample PIT results (19 years). You would be ready to trade, with years of data behind you, if the results were good.
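
For the ETF version, a sketch along the same lines, assuming a DataFrame of yearly ETF returns (years as the index, tickers as the columns; the data and names are hypothetical):

```python
import pandas as pd

def best_etf_rotation(yearly_returns: pd.DataFrame) -> pd.Series:
    """Each year, pick the ETF that did best in the *prior* year and record
    its return in the current year (the out-of-sample result)."""
    oos = {}
    years = sorted(yearly_returns.index)
    for prior, current in zip(years[:-1], years[1:]):
        winner = yearly_returns.loc[prior].idxmax()           # best ETF last year (in-sample pick)
        oos[current] = yearly_returns.loc[current, winner]    # how it did this year (out-of-sample)
    return pd.Series(oos, name="oos_return")
```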

Let me expand on any of this if you want me to.

Regards,

Jim

Jim

Did you notice that Marcos Lopez de Prado is no longer at AQR Capital Management? He stayed there for less than a year.

Regards
James

Wow! No I missed that.

I was looking for some links about walk-forward and found this Investopedia article: Backtesting and Forward Testing: The Importance of Correlation

I guess I first saw correlations extensively used by Yuval. I do not know if he has modified his method or still uses it but I intend to learn more about it.

From the article:

“…two different systems were tested and optimized on in-sample data, then applied to out-of-sample data.”

The out-of-sample data can be obtained using walk-forward validation. But it is not the only method.

Then the article recommends getting a correlation between your in-sample and out-of-sample results:

“Good correlation between backtesting, out-of-sample and forward performance testing results is vital for determining the viability of a trading system.”

“Vital,” it says. Forward performance testing is paper trading in this article, and it is also a type of out-of-sample data.
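
One way to put that correlation check into practice (a sketch, not necessarily how Yuval or the article does it) is to compute, across a set of strategy or factor variants, the rank correlation between each variant’s in-sample and out-of-sample performance:

```python
import pandas as pd

def is_oos_correlation(in_sample: dict, out_of_sample: dict) -> float:
    """Rank correlation between in-sample and out-of-sample performance across variants.
    A high value suggests in-sample results carry over; a low value hints at overfitting."""
    df = pd.DataFrame({"is_perf": pd.Series(in_sample),
                       "oos_perf": pd.Series(out_of_sample)}).dropna()
    return df["is_perf"].corr(df["oos_perf"], method="spearman")

# Usage (made-up numbers):
# is_oos_correlation({"sysA": 0.12, "sysB": 0.08, "sysC": 0.02},
#                    {"sysA": 0.05, "sysB": 0.07, "sysC": -0.01})
```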

If Yuval is still doing this, one need only look at his Designer Models (which are holding up well in this market) to see that this is something that may be worth looking into.

Personally, I do not believe Investopedia, de Prado and Yuval are all wrong on this.

Best,

Jim

Jim,

I found an interesting article that references this paper during my daily check of the LexisNexis database. I looked it up, and it seems to be exactly what you are doing right now.

A Machine Learning Approach to Risk Factors: A Case Study Using the Fama–French–Carhart Model

Abstract

Factor models are by now ubiquitous in finance and form an integral part of investment practice. The most common models in the investment industry are linear, a development that is no doubt the result of their familiarity and relative simplicity. Linear models, however, often fail to capture important information regarding asset behavior. To address the latter shortcoming, the authors show how to use random forests, a machine learning algorithm, to produce factor frameworks that improve upon more traditional models in terms of their ability to account for nonlinearities and interaction effects among variables, as well as their higher explanatory power. The authors also demonstrate, by means of a simple example, how combining the random forest algorithm with another machine learning framework known as association rule learning can produce viable trading strategies. Machine learning methods thus show themselves to be effective tools for both ex post risk decomposition and ex ante investment decision-making.
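
In the spirit of that abstract, here is a minimal sketch comparing a linear four-factor model with a random forest on the same factors; the DataFrame layout and column names are assumptions, not the authors’ actual setup:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def compare_factor_models(df: pd.DataFrame) -> dict:
    """df holds periodic returns with columns ['MKT', 'SMB', 'HML', 'MOM', 'asset_ret']."""
    X, y = df[["MKT", "SMB", "HML", "MOM"]], df["asset_ret"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=False)
    linear = LinearRegression().fit(X_tr, y_tr)
    forest = RandomForestRegressor(n_estimators=500, min_samples_leaf=20,
                                   random_state=0).fit(X_tr, y_tr)
    # Out-of-sample R^2: the forest can pick up nonlinearities and interactions the
    # linear model misses (whether it actually does depends on the data).
    return {"linear_r2": linear.score(X_te, y_te), "forest_r2": forest.score(X_te, y_te)}
```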

Regards
James


A Machine Learning Approach to Risk Factors A Case Study Using the Fama French Cahart Model.pdf (543 KB)

James,

Thank you. And yes exactly. There can be no doubt that the linearity assumption is a problem.

Or that the use of PIT out-of-sample methods like walk-forward testing can be the difference between what we see with some of the Designer Models and the professional Quant Funds, IMHO.

I do not believe this last point is even foreign to P123; it may have already been shown to be effective, out-of-sample, in some of the Designer Models.

Best,

Jim

Jim

The paper describes a sector rotation strategy built with machine learning on the four Fama-French-Carhart factors (including value, which you are working on). It is very mathematical; you can check it out yourself.

Regards
James

James,

Reading the paper, I find that the first 2-3 paragraphs are an exact description of the standard P123 method.

Good enough? I will read on to see what the authors may suggest as improvements.

Peeking ahead, I have tried Random Forests, and I certainly will not claim that they will make you rich. Neural nets are probably better. Again, good enough? I make no claims in this regard.

I have already shared that Machine Learning did not find an effective strategy for technical factors where data restrictions are not a factor. Not yet anyway. I have not done anything truly out-of-sample with machine learning strategies based on fundamentals.

Best,

Jim

Jim,

I honestly cannot tell if the paper is useful or not (I am only attaching it for your reference).

It is too mathematical for me to understand most of its content.

I would be glad to hear what you think about it.

Regards
James

I downloaded the paper but haven’t read it yet (I will). But for now:

Well . . . duh. Why the #%$@ do y’all think I’ve been so down on seeing folks here puff and preen about those great backtest graphs going back to 1999? (A longer history is good to have, but not to produce those nonsensical self-deluding equity curves; the benefit is to provide a larger inventory of market environments from which you can choose in order to do more focused studies.)

At this point, I backtest nothing going back more than 10 years, because there’s been so much structural change since before then that I don’t want the inevitable powerful results to infect my test and make me . . . or others . . . think I’m smarter than I know I really am. And even 10 years is more for presentation needs. To get me to really turn thumbs up, I have to be convinced based on my analysis of a tighter period, and even that won’t tell me what’s going to happen if/when rates eventually turn up.

And that’s exactly what I’ve advocated when I suggest a variety of tests run over shorter time periods, cherrypicked to examine different sorts of conditions.

Of course y’all are diligently doing your rolling backtests . . . right?

Seriously, p123 is already giving you all you need to succeed (more than enough since I doubt every user uses everything . . . there’s plenty of opportunity for each one to pick and choose). You don’t need super computers, monte carlo (take seriously that Monte Carlo has the same initials as More Crap) and all sorts of other fancy overhyped bullsh**. If you’re not using p123 successfully, then you have zero chance of doing better by going fancier, but if you do go fancy, at least you’ll get to spend more money to falter . . . and in the end, still be downloading every paper you can find suggesting there might be something else even better and fancier out there.

Marc,

I would tend to agree.

I would recommend the correlation/validation strategy that Yuval has recommended in the past (whether he still uses it or not). With or without walk-forward testing to obtain the out-of-sample data. There is plenty of literature to support this no matter who may be using this now at P123.

I look forward to using an easier implementation of this when Marco goes forward with his plans of making Python available to the power user he is working with, along with some of us, perhaps at a higher fee.

De Prado does not present a single, all-or-nothing strategy in all of his writings. We do not have to do everything he does, and I certainly have no plans to use his Monte Carlo strategy. I think it would be unwise to completely ignore everything he does or writes about, however.

We can like one small idea that de Prado uses, can’t we? He actually has a whole book of ideas: “Advances in Financial Machine Learning.” Not a single useful idea there? Really?

Other than my wife and daughter I know of nothing that could not stand some sort of effort at improvement. Uh well, my daughter needs to finish school and find a better boyfriend.

Best,

Jim

Jim,

What do you think about the machine learning paper?

Is it useful to you?

Regards
James

James,

The methods are a review. I like boosting over Random Forests. I probably like neural nets over both of those.

But ultimately factors are more important than the methods. I might look at some of their factors. I do not have an opinion on the factors they use now.

Best,

Jim

OK, from what I understand, Lopez de Prado has here rejected the walk-forward backtesting he has advocated in the past and is suggesting Monte Carlo backtesting on synthetic databases. In my opinion, there is no way to assemble a synthetic database that can serve as a functional proxy for a real one unless you’re only using a very limited set of data points (as Lopez de Prado does in his “practical example”). So Lopez de Prado’s approach might be appropriate for technical analysis, since the inputs consist of price and volume and the output consists of price. But I can’t imagine how one could possibly create a synthetic database for testing fundamentals.

Whether or not that leaves walk-forward testing as the best alternative is an open question, in my view. The problem with walk-forward testing is that a year-long out-of-sample period is more or less meaningless as every strategy is likely to underperform, and a five-year out-of-sample period means that you’re not focusing on what has been happening in the last five years.

If we reject Monte Carlo and walk-forward backtesting, that leaves resampling as the best way forward for machine-learning techniques.
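
For what resampling can look like in practice, here is a minimal sketch of a simple block bootstrap of a strategy’s periodic returns; this is one common resampling technique, not necessarily the specific procedure de Prado has in mind:

```python
import numpy as np

def block_bootstrap_sharpe(returns, block_size=12, n_samples=1000, periods_per_year=12, seed=0):
    """Resample contiguous blocks of returns (with replacement) to build a distribution
    of annualized Sharpe ratios instead of relying on a single backtest number."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    n_blocks = -(-n // block_size)                      # ceiling division
    sharpes = np.empty(n_samples)
    for i in range(n_samples):
        starts = rng.integers(0, n - block_size + 1, size=n_blocks)
        path = np.concatenate([returns[s:s + block_size] for s in starts])[:n]
        sharpes[i] = path.mean() / path.std() * periods_per_year ** 0.5
    return sharpes   # e.g., look at the 5th percentile as a pessimistic scenario
```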

As for us human non-machines, I think Marc is right: the tools that we have at Portfolio123 are extremely powerful. We have four very different ways to backtest a strategy–simulations, screens, rolling screen backtests, and ranking bucket performance. We have the ability to vary our universes, our holding periods, the period of our backtests, the number of holdings, our screening/universe/buy rules, and the ranking weights of our factors, so that we can see if our strategy will still work if we vary it, and at what point the variations will break our strategy. The kind of robustness testing that’s available to us here is mind-boggling. I estimate that one could run over 5,000 different backtests on variations of a strategy to see how robust it is.
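
As a concrete illustration of that kind of robustness testing, here is a sketch of a simple variation sweep (the parameter names and the run_backtest callable are placeholders, not Portfolio123’s API):

```python
from itertools import product

def robustness_sweep(run_backtest,
                     holdings=(10, 20, 30),
                     rebalance_weeks=(1, 4, 13),
                     universes=("small_cap", "all_cap")):
    """Re-run a backtest over a grid of strategy variations and record each result,
    so you can see where the variations start to break the strategy."""
    results = {}
    for n, weeks, uni in product(holdings, rebalance_weeks, universes):
        results[(n, weeks, uni)] = run_backtest(num_holdings=n,
                                                rebalance_weeks=weeks,
                                                universe=uni)
    return results
```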

I have mixed feelings about all-weather strategies and tactical allocation. I agree with Marc that looking at the 1999-2006 period is not terribly useful, but a strategy that would have utterly failed during that period is not one I would trust. I don’t think I personally will ever be able to predict to any degree of accuracy the economic regime of the next few years, so picking a past one that resembles the one that’s coming up is going to be impossible for me. Currently I’m favoring a strategy that would have worked well over the past ten to twelve years but also the past three years, and I plan to change that strategy as time progresses and new data comes in. From what I have found through correlation testing, the best way to assess a strategy is to look at its risk-adjusted performance (its alpha, measured weekly) over the past 3, 10, and 12 years and average those, giving double weight to the 10-year number. For me, this gives me a healthy mix of a huge variety of factors, the large majority of which are based solely on financial statements (rather than on price, volume, estimates, etc., all of which I also use but to a lesser degree).
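
That weighting scheme is simple enough to write down directly; this is a sketch of the averaging only, with the alpha inputs coming from whatever weekly regression or benchmark comparison you already run:

```python
def blended_alpha(alpha_3y: float, alpha_10y: float, alpha_12y: float) -> float:
    """Average the 3-, 10-, and 12-year alphas, double-weighting the 10-year figure."""
    return (alpha_3y + 2.0 * alpha_10y + alpha_12y) / 4.0

# Example with made-up numbers: blended_alpha(0.04, 0.07, 0.06) gives (0.04 + 0.14 + 0.06) / 4 = 0.06
```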

All,

I would hope everyone would agree that, whatever someone thinks about this one single paper, it cannot be used as the final word on all of the useful methods.

I will note that many have posted that they use a resampling method (which is discussed in the paper) and some sort of validation, whether or not they specifically use or like the walk-forward method. There is actually some agreement among many P123 members that some of the methods may be useful, for those who want to focus on this paper.

Whether the reader likes the paper or not, I would say this one paper has already gotten way more discussion than it deserves.

There are a lot of ideas used by members at P123 that involve manipulation of the data outside of the platform. I do not think I use all of the good ideas. Maybe someone else already does.

P123 already has plans to develop some of these good ideas. Others will be developed later. Some just are not high enough in the priority list but good ideas nonetheless.

P123 does already force us to use out-of-sample data for the Designer Models. There is a good reason for that, IMHO.

This is not necessarily the only way to get out-of-sample data. Are even/odd universes the only other good way? Personally, I think that method is neither PIT nor truly out-of-sample, but it is used and recommended at times nonetheless.

Personally, I have no problem if someone wants to use that regardless of what the de Prado article says or I may think. I prefer to use something a little different, however.

Others use methods not mentioned in this thread or in the paper—ones I am not even aware of. I think that is okay.

Best,

Jim

Jim

This is just more proof that we really need to see out-of-sample performance, or perform some kind of validation like the walk-forward method, before putting money into a newly backtested strategy.

Regards
James

THE DARK SIDE OF BACKTESTING

Suhonen, Lennkh, and Perez analyzed the backtested and live excess returns of 215 quantitative strategies issued by fifteen investment banks between 2005 and 2015. The universe includes strategies from equities, fixed income, currencies, commodities, and multi-assets. The research paper shows a significant difference between the in-sample and out-of-sample performance.

Naturally, some strategies are expected to generate worse returns live than during backtesting, as no strategy performs consistently. However, all the strategies generated significantly lower returns once live, which suggests that they are either the result of data mining or are hurt more by transaction and market-impact costs than the investment banks expected.

The difference between backtested and live performance was greatest for equity strategies, which perhaps highlights the intricacy of dealing with thousands of individual stocks versus only a few commodities or currencies. The researchers conclude that all backtested returns require a discount, which should be proportional to the complexity of a strategy.