alternatives to optimization?

Thank you. I think I read this some time ago. Great article and interesting findings, but one of the problems you raise at the start of the article is the problem of curve fitting.

What you seem to find in the study is a correlation between increased performance and the number of factors used in the ranking, but have you done something similar and run the test both in and out of sample?

Would an increase in the number of factors also increase the problem of curve fitting?

Thank you. I’m afraid I don’t understand that paper at all. I got totally lost around section 2.2 (methodology), which is the section I would most like to understand.

Yes, there probably is a relationship between curve-fitting and using lots of factors. After all, if you’re going to curve-fit perfectly, you probably need to use lots of factors. But it doesn’t follow that increasing the number of factors necessarily results in curve-fitting. Curve-fitting is a result of what backtests you’re running. If you’re running thousands of tests on a very specific universe and time frame, then you will end up with a closely curve-fit system that works beautifully on that specific universe and time frame and may well fail totally out of sample. If, on the other hand, you run your tests on a lot of different universes and time frames and generalize from those, you may be able to evade curve-fitting to some degree. Both kinds of backtesting involve exploring lots of factors for the best fit, but maybe the second approach, which still uses lots of factors, can avoid curve-fitting to some degree. That’s always been my hope, and it’s an approach that has worked very well for me out of sample. I don’t know whether the curve-fit approach would have worked or not, because I’m rather leery of it and haven’t spent much time on it.
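To make the in-sample/out-of-sample distinction concrete, here is a minimal sketch of a walk-forward check in Python. Everything in it (the data, the model, the variable names) is a hypothetical placeholder, not anyone's actual system:

```python
# A minimal sketch of an in-sample vs. out-of-sample check, assuming a
# feature matrix X (factor exposures) and a target y (forward returns).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))             # 500 periods, 10 factors (simulated)
y = 0.1 * X[:, 0] + rng.normal(size=500)   # only factor 0 carries any signal

# Walk-forward splits: always train on the past, test on the future.
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    in_r2 = model.score(X[train_idx], y[train_idx])
    out_r2 = model.score(X[test_idx], y[test_idx])
    print(f"in-sample R^2: {in_r2:.3f}   out-of-sample R^2: {out_r2:.3f}")
```

A large gap between the two numbers is the tell-tale sign of curve-fitting the posts above are worried about.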

No one is going to learn how to do PCA regression from posts in the forum. Obvious, right?

The problems (or potential problems, if your data happen not to need any modification) of multivariate regression are all well known. And there is more than one potential solution. None of them is perfect. Some are better for certain situations than others. If any of them are helpful at all for your data… There is no guarantee that what you decide to try will work for your factors, or that you will arrive at the best way to use a method when you do try it. Honestly, I do not see how you could get the method right if you are just using the forum to learn it.
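That said, just to make the mechanics concrete (and emphatically not as a recommendation), here is a minimal sketch of PCA regression in scikit-learn, with simulated data standing in for real factors:

```python
# A sketch of PCA regression (principal component regression): standardize,
# keep the first k principal components, then regress on them. X and y are
# simulated placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                    # 20 factors, many redundant
y = X[:, :3].sum(axis=1) + rng.normal(size=300)   # signal sits in a few of them

pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
pcr.fit(X, y)
print("R^2 on training data:", round(pcr.score(X, y), 3))
```

Choosing the number of components is the hard part, and nothing in a few forum posts will settle that for your data.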

The paper also discusses Lasso regression, AND some are asking the question of how many factors to use. Was Lasso regression the most effective method in the paper? (I only skimmed it, so that is a real question.) PCA reduces the number of noise factors, but in an indirect way. Lasso regression addresses the question of how many factors to use directly, in a mathematical way. I think that is all it was designed to do.
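To illustrate what I mean by "directly," here is a hedged sketch with simulated data, where Lasso is free to zero out the factors that carry no signal. Real factor data would behave differently, of course:

```python
# Lasso zeroes out noise factors directly via its L1 penalty. Simulated data;
# only the first two of ten factors carry any signal.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=400)

lasso = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y)
print("coefficients:", np.round(lasso.coef_, 3))
# Expect nonzero weights on the first two factors and zeros on most of the rest:
# the surviving count IS the answer to "how many factors should I use?"
```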

Duckruck also talks about other methods of shrinkage, all designed to reduce overfitting. Overfitting was another question discussed in the posts above. Shrinkage will be a frequent answer to this question in any text or peer-reviewed journal, right next to cross-validation in any reputable source.
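Here is a minimal sketch of the two ideas working together: cross-validation choosing the amount of shrinkage. Everything in it is simulated and hypothetical:

```python
# RidgeCV tries each candidate alpha by cross-validation and keeps the one
# that predicts best, so the shrinkage strength is chosen out of sample.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 15))
y = X @ rng.normal(scale=0.2, size=15) + rng.normal(size=400)

ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print("selected shrinkage (alpha):", ridge.alpha_)
```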

If someone truly wants to look for (and try) “alternatives to optimization,” Duckruck has already pointed members in several good directions. But I don’t think he or anyone else in the forum will be able to give you everything you need in a few posts to learn the technique, or any guarantee that it will work for you and your data.

BTW, I truly hate Lasso regression (I prefer ridge regression, though I don't use that in any of my ports either). I do NOT invest using PCA either. I do not use multivariate regression in any form in a funded port. That one single technique works for everyone would never be my point. If someone has found something that works for them that includes none of this, then “GREAT” is all I have to say. Full stop.

Duckruck has presented several great ideas for those looking for “alternatives to optimization” in a serious way.

Jim

Here is, perhaps, a simpler question that either you (JRinne) or you (duckruck) can answer to help me on my way toward understanding how multivariate regression is practiced in academic papers.

Please see this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2894068. In section III, Fundamental Momentum, the authors define seven fundamental variables, most of them of the income-divided-by-something variety. They then crunch some numbers relating to these seven fundamental variables to come up with something called FIR, which is “estimated by using all [the seven] fundamental variables and their trends.” I tried and tried to come up with this FIR using multivariate regression, but I have no idea what I should be regressing to, or what I should have been doing to make this work. Are the authors using the raw numbers for the seven variables or the long-short returns? If the latter, are they using the top third minus the bottom third or the top tenth minus the bottom tenth? What are they regressing to, and what are they coming up with to get FIR? This is probably crystal clear to one of you, so I’d really appreciate your help. Thanks!

Yuval,

I will look at this paper some more, but I don’t think I have much. I certainly cannot turn this into something I (or we at P123) could use, either on the platform or with a spreadsheet at home.

Did I learn anything at all from the paper related to this thread? They do express a concern about multicollinearity, and they address it in a way I am not familiar with: “Since the fundamentals of a firm are likely correlated, and trends of different time horizons are not independent, some of our predictors can have high correlations with each other. Econometrically, this can raise the degree of multicollinearity in multivariate regression (4), causing over-fitting. To resolve the issue, we consider an alternative forecast combination approach. This approach is strikingly simple. Let {x_m} (m = 1, …, M) be all the predictors…”

I have no idea what they are doing here to address multicollinearity. So all I got is that the authors of this article believe multicollinearity can be a problem with multivariate regression.
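That said, if I had to guess, "forecast combination" in the forecasting literature often means fitting one univariate regression per predictor and averaging the resulting forecasts, which sidesteps multicollinearity entirely. Here is a sketch under that assumption; it may well not be what these authors actually do:

```python
# A guess at a simple forecast combination: one univariate regression per
# predictor, forecasts averaged with equal weights. This is an assumption
# about the method, not a reproduction of the paper. Data are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))              # 7 fundamental predictors (hypothetical)
y = 0.2 * X[:, 0] + rng.normal(size=300)   # next-period returns (hypothetical)

forecasts = []
for m in range(X.shape[1]):
    # Regress returns on the m-th predictor alone: no multicollinearity possible.
    uni = LinearRegression().fit(X[:, [m]], y)
    forecasts.append(uni.predict(X[:, [m]]))

combined = np.mean(forecasts, axis=0)      # equal-weight combination of the 7 forecasts
print("combined forecast, first 5 periods:", np.round(combined[:5], 3))
```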

Whether multicollinearity is actually a problem for prediction (as opposed to coefficient inference) is debated. I will leave the final word with the authors here in the interests of brevity.

FWIW, I would really like to use these variables with XGBoost when P123 makes that available!!! I might wait until then before revisiting this article. XGBoost is effective for problems with multicollinearity and, with proper cross-validation, has some other potential advantages, especially if P123 makes it easy to use.

Interactions of the variables are also maintained with XGBoost. Please understand that I am not getting paid to promote multivariate regression, and any concerns you have about the method are probably echoed by me, at least at times. How well those problems can be mitigated is an open question that probably depends on the data more than anything.
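For what it's worth, this is the kind of thing I would like to try: XGBoost with walk-forward cross-validation on data with an interaction effect. The data and parameters here are placeholders, not a recommendation, and it assumes the xgboost package is installed:

```python
# A sketch of XGBoost with walk-forward cross-validation. The simulated target
# depends on an interaction between two factors, which trees can pick up
# without the interaction being specified by hand.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))
y = 0.3 * X[:, 0] * X[:, 1] + rng.normal(size=500)  # pure interaction effect

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    model = xgb.XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
    model.fit(X[train_idx], y[train_idx])
    print("out-of-sample R^2:", round(model.score(X[test_idx], y[test_idx]), 3))
```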

Sorry I could not help more.

Best,

Jim

Duckruck,

From the article, they use: “… LASSO, ridge regressions, and elastic nets to obtain indices of forecasted returns.”

Elastic net is a combination of Lasso and ridge regression. Lasso regression will remove a noise variable, while ridge regression will shrink it. So elastic net will do both (remove some noise variables and shrink the variables that are not removed).
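A small simulated comparison of the three, just to make that remove/shrink distinction visible (a sketch only; real factors would behave differently):

```python
# Compare coefficients: Lasso removes, ridge shrinks, elastic net does some
# of both. Two of eight simulated factors carry signal; the rest are noise.
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(400, 8)))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(size=400)

for name, model in [("lasso", Lasso(alpha=0.1)),
                    ("ridge", Ridge(alpha=10.0)),
                    ("enet ", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    model.fit(X, y)
    print(name, "coefficients:", np.round(model.coef_, 3))
# Lasso and elastic net should show exact zeros on the noise factors;
# ridge should show small but nonzero weights everywhere.
```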

You have added Principal Component Analysis as a useful technique for linear models.

Please correct me if I have not summarized what you have said in the best way.

Jim