Mis-specification versus multicollinearity

This quant tactic can lead to the problem of multicollinearity. That means, as Marc describes, that you are using two factors that are highly correlated. But is that a problem?

The potential problem is that it can lead to poor out-of-sample performance. Because the two factors are correlated, it is difficult to know how each factor is affecting your in-sample data, and you do not get the weights right for your out-of-sample data. Weight here just means the weights of your ranks. If your out-of-sample population is different, then the weights are likely to be wrong for the out-of-sample data.

This stuff is not easy, and ultimately Marc’s tactic may be the best. In fact, it probably is a good tactic if you expect your sample to stay the same (i.e., the markets will not change in any way). But I think it is worth discussing.

Like it or not: at P123 we are using linear estimators. That is what a rank system is.

I get that some of you do not like BLUE. You do not want the Best Linear Unbiased Estimators. But do you have to go out of your way to get the worst?

Again I stress that Marc’s method may be the best under certain circumstances. Leaving a factor out can be a mis-specification problem, as he correctly notes: “omitted-variable bias.” Also note that I think Marc’s out-of-sample performance is good at matching his in-sample performance. Maybe it is because his fundamental factors do make sense, as he describes. In any case, it is a bit of a judgement call, and Marc’s judgement is working well for him.

I am actually wondering about this with my own systems. Should I get rid of a factor that is highly correlated with another and only adds a little to the returns? Will it cause overfitting and poor out-of-sample performance? Does this in fact happen with some R2G ports?

I think I have a lot to learn about the actual causes of overfitting but I do think multicollinearity is one of them.

Multicollinearity is not a big problem. Just use economic and common-sense logic. Weight your factors equally within one category/theme; it is the best way to avoid overfitting. Weight categories depending on current and forecasted market conditions.

Having many factors is not a bad thing; on the contrary, it brings stability to OOS performance. That is especially important when dealing with missing data, as we have in the small-cap stock universe.

Don’t look too much at simulations; they are history. Look ahead to the future.

Thanks Yury!

If a predictor factor doesn’t have a simple economic explanation, then don’t use it.
More likely it showed some correlation purely by chance, and the “scientific” explanation has been invented (read: adapted, or fitted).

Yury,

I agree completely. I am mainly thinking that as long as I will be using a weight anyway (even for a category as you describe), I might as well get the weight right rather than depending on random luck.

Your input on whether to keep a factor is very helpful.

Yes, you have to assign weights to categories. They should be based on your general stock forecasting model, i.e., your market timing model.
For example, my system shows zero return for the next year and allocates 50% to the equity market.

It means you have to move to defensive mode. Put more weight on defensive factors such as quality, stability, low volatility, higher market caps, dividend yield, etc. Minimize the allocation to risky ports in your overall book, but leave some for diversification purposes anyway, because one way or another no market timing model is reliable enough. Use bonds (with lower maturity), and if you can’t get rid of big long positions fast, add shorts (an index or short systems if you have them, with stop losses and take-profit orders).

That’s a simple and quite effective way to act in practice for personal investing. If you have big money and the required resources, you can use a more complicated approach that I will discuss in my next posts soon.

Jrinne,

Multicollinearity is a very big problem if you are using OLS to estimate the weights. (And yes, OLS has the BLUE property if the right conditions are satisfied.) If you include two factors that are nearly identical (i.e., very highly correlated), like PEInclXOR and PEExclXOR for example, then OLS will give you weights that tend to be extremely large. So large that they overwhelm all the other weights in your ranking system. And if the out-of-sample data does not behave exactly like the in-sample data, it will fail very badly.
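This is easy to reproduce. Here is a minimal numpy sketch, with simulated data standing in for two near-duplicate factors (the data is invented purely for illustration, not taken from P123):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
signal = rng.standard_normal(n)
# Two nearly identical factors, like PEInclXOR vs PEExclXOR
f1 = signal + 0.01 * rng.standard_normal(n)
f2 = signal + 0.01 * rng.standard_normal(n)
y = signal + rng.standard_normal(n)          # noisy "future returns"

X = np.column_stack([f1, f2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
cond = np.linalg.cond(X)
print(beta)   # individual weights are unstable and can be large/opposite-signed
print(cond)   # huge condition number: tiny data changes flip the weights
```

The sum of the two weights is pinned down by the data, but how it is split between the twins is almost arbitrary, which is exactly the instability described above.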

That said, there are plenty of statistical methods that are designed to handle multicollinearity in estimation. Ridge regression is an example. Ridge regression is not BLUE, but it works much better when there are highly collinear regressors (i.e., factors in a ranking system). The statistical and econometric literature is huge in this area.
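For concreteness, here is a numpy sketch of ridge versus OLS on two simulated near-duplicate factors (the data and the penalty value are arbitrary illustrations, not tuned choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
signal = rng.standard_normal(n)
f1 = signal + 0.01 * rng.standard_normal(n)   # near-duplicate factors
f2 = signal + 0.01 * rng.standard_normal(n)
y = signal + rng.standard_normal(n)

X = np.column_stack([f1, f2])
lam = 1.0                                      # ridge penalty (illustrative)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
# OLS splits the weight between the twins arbitrarily; ridge shrinks the
# difference between them toward zero, giving similar, moderate weights.
print(beta_ols, beta_ridge)
```

The only change from OLS is the `lam * np.eye(2)` term, which penalizes large weights and so suppresses the huge offsetting coefficients that collinearity produces.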

More importantly, a manually designed ranking system does not suffer from this problem. You would never assign a weight of -100000 to PEInclXOR and a weight of +100001 to PEExclXOR in the same ranking system and keep all other weights much smaller. (Yet this is what OLS would do.)

You can also see the individual nodes of a ranking system as individual “forecasts” and the total system as a “forecast combination” of its subnodes. There is a well-known result in forecast combinations that combining different forecasts leads to better overall forecasts. The smaller the correlation between the individual forecasts, the more improvement you get by combining them. If they are highly correlated, that need not be a problem, as long as you compute the weights with a method that doesn’t suffer from the multicollinearity problem the way OLS does. In the “worst” case, when individual forecasts are highly correlated and you use equal weights for example, you hardly gain anything by combining them. But the point is, it also doesn’t hurt.

Forecast combinations have been researched in econometrics for the last 10 years or so (probably longer). One common finding in many papers that investigate combinations for macro-economic or financial data is that it’s very hard to beat equal-weight combinations. I’ve done a project myself where I tried to forecast exchange rates, and I found the same. In the econometric literature this is attributed to the fact that if you estimate the weights (using whatever method) you introduce uncertainty in the estimates. This uncertainty is very large in financial data because there is so much noise and so little signal. This leads to very bad estimates of the weights, and therefore to very bad out-of-sample performance. Equal weighting does not have this particular problem.
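This mechanism can be illustrated with a small Monte Carlo sketch (all the numbers are invented for illustration; the point is only that with lots of noise and little signal, estimated combination weights are noisy, and that noise hurts out of sample):

```python
import numpy as np

rng = np.random.default_rng(42)
reps, n_in, n_out = 500, 20, 60

def draw(n):
    # Two unbiased forecasts of a very noisy target; by symmetry the
    # truly optimal combination weights are 0.5 / 0.5.
    mu = 0.1 * rng.standard_normal(n)        # tiny predictable component
    y = mu + rng.standard_normal(n)          # target: mostly noise
    f1 = mu + rng.standard_normal(n)
    f2 = mu + rng.standard_normal(n)
    return y, f1, f2

mse_eq, mse_est, w_hats = [], [], []
for _ in range(reps):
    y, f1, f2 = draw(n_in)
    # Estimate w in y ~ w*f1 + (1-w)*f2 by least squares in-sample
    d = f1 - f2
    w = np.dot(d, y - f2) / np.dot(d, d)
    w_hats.append(w)

    y, f1, f2 = draw(n_out)                  # fresh out-of-sample data
    mse_eq.append(np.mean((y - 0.5 * f1 - 0.5 * f2) ** 2))
    mse_est.append(np.mean((y - w * f1 - (1 - w) * f2) ** 2))

print(np.mean(mse_eq))    # equal weights
print(np.mean(mse_est))   # estimated weights: typically worse out of sample
print(np.std(w_hats))     # the estimated weight is all over the place
```

Even though 0.5/0.5 happens to be the true optimum here, the estimated weights lose out of sample because the estimation noise alone is enough to hurt.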

Conclusion: as long as you don’t use something like OLS to compute the weights, you don’t need to worry about multicollinearity.

Note: the above does not mean that you can just throw any factor in a ranking system of course. If the individual “forecasts” are bad, the combination will also suffer.

Peter. So cool!!! Thank you very much!

BTW, what software is good for doing OLS? Is Stata good? Is Excel okay for starters, in case Stata is not worth the investment?

Thanks Again!

Regards,

Jim

I think this is the sort of question that’s best tackled through case study.

Can you give us an example of a situation you have in mind? That way, we would be able to weigh the pros and cons for that situation.

Marc,

One thing I was thinking about is using Close(0)/Close(X) with Close(0)/Close(Y). There has to be some correlation, and the closer X is to Y, the less useful (and the more multicollinear) the pair becomes. But this is probably true of other technical indicators as well.

Another question: how closely correlated (and how useful) are changes in price-to-sales and price-to-earnings in the same port? If there is a change in earnings, it will often be due to a change in sales (though not always).

Thanks.

Jim

Jim,

OLS is so simple that you can quite easily do it in Excel. If you google it, you’ll find examples. I would strongly advise against buying Stata just for trying out OLS. It’s like buying a cargo plane to pick up a six-pack at the supermarket. Too expensive and too complicated for the job.

You’re right about Close(0)/Close(X) vs Close(0)/Close(Y): the closer X is to Y, the higher the correlation. No idea about P/S and P/E.
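If log prices behave roughly like a random walk, the correlation between the two ratios is about sqrt(X/Y) for X < Y, which a quick simulation on purely synthetic data confirms:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 20000
log_price = np.cumsum(0.01 * rng.standard_normal(T))  # random-walk log price

def ratio(x):
    # Close(0)/Close(X) in log terms: the trailing x-period return
    return log_price[x:] - log_price[:-x]

def corr(x, y):
    a, b = ratio(x), ratio(y)
    m = min(len(a), len(b))          # align the two series on the same dates
    return np.corrcoef(a[-m:], b[-m:])[0, 1]

c_near = corr(60, 70)    # should be near sqrt(60/70)  ~ 0.93
c_far = corr(60, 250)    # should be near sqrt(60/250) ~ 0.49
print(c_near, c_far)
```

The sqrt(X/Y) rule comes from the shorter window being entirely contained in the longer one: the shared returns supply all of the covariance.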

Regards,
Peter

Peter,

Very helpful. I will try Excel first!

If X and Y are very close to one another, that may not make a good case for using multiple items.

PE and PS is a very different matter. I suspect they may be highly correlated (I haven’t actually checked), but this would be a good case for using both. In theory, they both get at the same thing: the relationship between price and the theoretical stream of potential dividends. But each is subject to different kinds of case-specific aberrations. PS can be thrown out of whack by recent acquisition or divestiture activity. PE can be thrown out of whack much more easily; non-recurring items make PETTM a very dirty data series. PECurrY and PENextY are better in that non-recurring items are typically not estimated. But even among normal business expenses, many do not vary with sales and can make PE much more volatile.

So we have a tradeoff. PE is better in that E is closer to what shareholders ultimately want (dividends). But S is much less choppy and more likely to give stable ratios for more companies at more times. This is a good example of where you can use multiple factors to diversify against oddities, even though under normal conditions they may be highly correlated.

Marc,

Thank you. I definitely get your points and I think you are probably right. Leaving out sales (or earnings) probably would be a mis-specification. Either one is probably relevant to future returns, even if they are correlated. I still remain interested in what causes bad out-of-sample performance.

While we are talking about mis-specification, what do you think happens when we make the other type of mis-specification mistake? By that I mean adding an irrelevant term. I wonder if, by “optimizing” an irrelevant term until it seems to help the sim, we are causing poor out-of-sample returns. I noticed in Hull’s paper they had a cutoff: they did not include a factor unless it had a certain correlation with the future returns of the market. They had a test for whether a factor was relevant. Their “kitchen sink” method, where they included irrelevant terms, did not do very well.

Do you ever test a term for relevance, or do you go more by theory? If you do test a factor before including it, what do you like to see on, say, the rank performance test?

Regards,

Jim