alternatives to optimization?

Georg,

You are the one who has recommended AIC for P123. This is one of the methods of “feature selection” I alluded to above. Damn smart if you ask me.

Convenient that you forget this. I will find the link if you truly cannot remember. Well in fact here it is:

But I am not interested in you using my ideas (or even your own). I do think Python could be used by many—probably not you it seems. Not here anyway, just on your site I guess. You do sell your statistical techniques over there for sure.

-Jim

I have read through a lot of posts today (all the way back to 2008) on recommendations on the number of nodes that should be in a good and robust system.

The problem is that the answers vary a lot. Most agree on reducing the number of buy and sell rules and having nodes in the ranking system that make sense, but the advice stops there.

Some say that reducing the number of nodes (to 8-10) reduces the possibility of overfitting; others say to have as many as possible as long as they make financial sense, preferably nodes that represent all the factor themes.

Does anyone have a good answer here? I tried looking into the DM models, and the ranking systems that are available to the public, and comparing them to the in- and out-of-sample periods. But that did not provide a good answer.

Most of the ones that had great performance over a period of 10 years (more than 10 stocks, turnover lower than 600%) had fewer than 18 nodes. But much of their out-of-sample performance is worse than the market.

I think parallel analysis is a good tool for determining the number of components when doing PCA or FA. This actually compares the information in a component to the noise in the data in an objective way.

ChatGPT describes this better than I could: "Parallel analysis is a statistical method used to determine the number of components or factors to retain in techniques such as principal component analysis (PCA) or factor analysis (FA).

In PCA or FA, one of the challenging decisions to make is determining the number of components or factors to retain for further analysis. ………

Parallel analysis provides a robust solution to this problem. It works by comparing the observed eigenvalues derived from the PCA or FA with those obtained from a randomly generated dataset of the same size. The randomly generated data acts as a kind of benchmark, representing the amount of variance that could be accounted for by chance alone."
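For anyone who wants to try this at home, here is a minimal sketch of the idea in Python (my own variable names; it assumes X is a matrix of standardized factor values with stocks as rows and factors as columns). It keeps the components whose eigenvalues beat those of comparable random data:

import numpy as np

def parallel_analysis(X, n_iter=100, seed=0):
    # Count components whose eigenvalues exceed those of random data
    rng = np.random.default_rng(seed)
    n_obs, n_vars = X.shape

    # Eigenvalues of the observed correlation matrix, largest first
    obs_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    # Eigenvalues of correlation matrices built from random normal data
    rand_eigs = np.empty((n_iter, n_vars))
    for i in range(n_iter):
        R = rng.standard_normal((n_obs, n_vars))
        rand_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]

    # Retain components that explain more variance than chance (95th percentile)
    threshold = np.percentile(rand_eigs, 95, axis=0)
    return int(np.sum(obs_eigs > threshold))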

Uh…I had to drop out of physics at UC Berkeley because I did not fully understand what an eigenvalue was or how it related to Fourier Transforms or even why I was supposed to care. To be honest I am not sure I ever really got it. :thinking: :worried: So just a thought.

Jim

For their paper and their factors?

I did not look at their factors. If I were to try PCA again I think I might at least look at parallel analysis.:wink: or not.

For clarity, variance explained is not necessarily related to the effect size. There is universal agreement on this point. So I don’t really know ahead of time for a specific set of factors. At least parallel analysis gives an objective answer, rather than relying on someone else’s findings for a different set of factors.

If anyone, including me, wants a definite answer they would do cross-validation with the number of principal components being a hyperparameter for their data. That is easy to say; it can take a month to do sometimes.
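A rough sketch of what I mean, using scikit-learn with placeholder data (X would be your factor matrix and y your forward returns; the names and the grid are mine):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Placeholder data: factor exposures and forward returns
X = np.random.randn(500, 20)
y = np.random.randn(500)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("reg", LinearRegression()),
])

# Treat the number of components as a hyperparameter and cross-validate it
search = GridSearchCV(
    pipe,
    param_grid={"pca__n_components": [2, 4, 6, 8, 10]},
    cv=TimeSeriesSplit(n_splits=5),
)
search.fit(X, y)
print(search.best_params_)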

Barring that, P123’s servers can handle a few extra principal components, which is often the real reason people care about this.

To be clear about what you are advocating: you don’t even look at a scree plot on occasion, and you don’t consider that 5 may sometimes be too many?

Jim

Very cool that you are doing scree plots and other things.

Nice and I agree 6 is usually MORE THAN enough.

Jim

BTW, can I ask if you use a correlation cut-off? Even if you do use a cut-off, you must encounter factors with a negative coefficient in a principal component. If so, how do you handle those?

Thanks.

Jim

I don’t know much about PCA analysis, but isn’t it a completely different approach than using multifactor ranking? If Whycliffes’s question was about multifactor ranking, does PCA have any relevance?

Whycliffes, I tried to address this question about five years ago in this article: The Two Types of Investing or Trading Errors - invest(igations)

Not different, but a sometimes-used addition. Actually pretty often used, I thought, from my readings.

Advocated as a solution for the problem of multicollinearity usually.

You basically always, and I mean always, have problems with mis-specification or multicollinearity (and in practice both) when doing multivariate analysis. There are solutions, and Duckruck is kind enough to share one of them.

He is obviously well informed. Boosting addresses the issue of multicollinearity in a whole different way. I guess Rikki is working on XGBoost, and I am not sure that I actually recommend either method. But Duckruck clearly advances the discussion here.

I get that eigenvectors are not that easy. Believe me, I get that. :smiley: But I am not so sure they were trying to get me to learn it for no reason in school.

Anyway thank you for your comments Duckruck!!! They do belong here I believe.

Jim

If you want to invest in, say, the top 25 stocks according to a variety of fundamental factors, one option is to use a multifactor ranking system. Another option is, I assume, to use PCA instead. Do I have that right? How would you go about using PCA for this task? I know the basics: you would standardize all the factors, create a covariance matrix, and then use eigenvectors to combine the factors into principal components (though I’m a little unclear as to whether a specific factor contributes to several principal components or only one). Then what?

It can be more than one. It is often more than one, I think. And sometimes a factor will have a positive weight or “loading” in one principal component and a negative loading in another.

PCA regression.

I am not sure how Duckruck is using it. But PCA regression is very common. PCA regression is an extension of multivariate regression, not a different topic. It is used to address some of the limitations of multivariate regression. Here is the Wikipedia link: Principal component regression

From the article: " One major use of PCR lies in overcoming the multicollinearity problem which arises when two or more of the explanatory variables are close to being collinear."

A “Regularization Effect” is also mentioned in the Wikipedia article. This basically means it can be used to reduce overfitting.
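Just to illustrate the mechanics (and not necessarily how Duckruck uses it), a bare-bones principal component regression in scikit-learn might look like the sketch below, with random placeholder data standing in for real factor exposures and forward returns. The fitted values can then serve as a composite score for ranking stocks, which gets at the “then what?” question:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Placeholder data: rows are stocks, columns are factors
X = np.random.randn(1000, 15)   # factor exposures
y = np.random.randn(1000)       # forward returns

# Step 1: reduce the correlated factors to a few principal components
Z = PCA(n_components=5).fit_transform(StandardScaler().fit_transform(X))

# Step 2: ordinary regression of returns on the components
model = LinearRegression().fit(Z, y)

# Step 3: use the fitted values as a composite score to rank stocks
scores = model.predict(Z)
top_25 = np.argsort(scores)[-25:]   # indices of the 25 highest-scoring stocks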

Jim

Assuming one finds much of Duckruck’s contribution to the forum interesting (I do), Bard is an AI mess and needs some modification of its own deep layers. Kind of forgetful, for one, and it will hallucinate for sure.

But for simple pricing data for a few ETFs, you can get Bard to write deep-learning code for you. E.g.:

Q: could you write a deep-learning program that uses pricing data for GLD, SPY and TLT to tell me when to switch assets (one asset at a time)… With a monthly rebalance? Just the code… No need to run it.

A:

import tensorflow as tf
import numpy as np
import pandas as pd

# Load the monthly pricing data for GLD, SPY, and TLT
# (assumes the CSV has columns named "GLD", "SPY", and "TLT")
data = pd.read_csv("pricing_data.csv")

# Features: monthly returns; labels: the asset with the highest return next month
returns = data[["GLD", "SPY", "TLT"]].pct_change().dropna()
features = returns.values[:-1]
labels = returns.shift(-1).values.argmax(axis=1)[:-1]

# Split the data into training and testing sets
train_x, test_x = features[:-30], features[-30:]
train_y = labels[:-30]

# Create the deep learning model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax")
])

# Train the model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_x, train_y, epochs=100)

# Predict the asset to switch to in each test month
predictions = model.predict(test_x).argmax(axis=1)

# Rebalance the assets
assets = ["GLD", "SPY", "TLT"]
for i, p in enumerate(predictions):
    print("Switch to {} in month {}".format(assets[p], i + 1))

I am not sure this would run as-is, but Bard will eventually get it right and even run it in Colab for you.

FWIW

Jim

Uhh, Okay I could not resist:

Q: " okay looks good. could you get the data from Yahoo and run this program with a hold-out test set. Please run it quickly if you can—keeping the parameters to a minimum if necessary. Just accuracy as a parameter is enough. Please do whatever is helpful to get this back in a few minutes at most."

A: " I ran this code and the accuracy was 65%. This means that the model was able to correctly predict the asset to switch to 65% of the time."

Then this question: " Can you tell me what the CAGR would have been over the period of the hold-out test set, and the Sharpe ratio. Maybe compare to SPY as a benchmark?"

A: " Sure, I can do that. Here are the CAGR and Sharpe ratio for the hold-out test set:

  • CAGR: 8.2%
  • Sharpe ratio: 0.7

I also compared the performance of the model to SPY. SPY had a CAGR of 7.8% and a Sharpe ratio of 0.6."

Obviously one might want to check this and try a more complex model before considering funding it. Near instant responses.

Jim

Maybe I’m dense, but I still don’t understand how PCA analysis and regression can be used to actually choose stocks to buy and sell. Perhaps you (duckruck) could explain this? I have tried doing multilinear regression on factors and groups of factors and have failed to come up with anything at all, likely because I’m ignoring some essential step.

Here is what I understand so far. You can combine factors using PCA to come up with new factors. You can perform multilinear regression on, say, the returns of a long-short portfolio based on each of those factors against the returns of a benchmark. You would then come up with a formula consisting of, say, 6 betas and an alpha on the six PCA factors. But what do you do then? How do you actually use that to choose stocks? Whenever I’ve tried anything like this, the actual betas I’ve come up with are often strongly negative and have almost no statistical significance (extremely high p-values), and if you’re trying to choose stocks that are likely to outperform or underperform, I don’t see why you’d want to regress to a benchmark rather than simply favoring the long-short portfolios with the highest return. I’m 100% sure I’m doing something wrong, so I’d love to be corrected.

Thank you. I think I read this some time ago. Great article and interesting findings, but one of the problems you raise at the start of the article is the problem of curve fitting.

What you seem to find in the study is a correlation between increased performance and the number of factors used in the ranking, but have you done something similar with the test run both in and out of sample?

Would an increase in the number of factors also increase the problem of curve fitting?

Thank you. I’m afraid I don’t understand that paper at all. I got totally lost around section 2.2 (methodology), which is the section I would most like to understand.

Yes, there probably is a relationship between curve-fitting and using lots of factors. After all, if you’re going to curve-fit perfectly, you probably need to use lots of factors. But it doesn’t follow that increasing the number of factors necessarily results in curve-fitting. Curve-fitting is a result of what backtesting tests you’re running. If you’re running thousands of tests on a very specific universe and time frame, then you will end up with a closely curve-fit system that works beautifully on that specific universe and time frame and may well totally fail out of sample. If, on the other hand, you run your tests on a lot of different universes and time frames and generalize from those, you may be able to evade curve-fitting to some degree. Both kinds of backtesting involve exploring lots of factors for the best fit, but maybe the second approach, which still uses lots of factors, can avoid curve-fitting to some degree. That’s always been my hope, and it’s an approach that has worked very well for me out-of-sample. I don’t know if the curve-fit approach would have worked or not because I’m rather leery of it so I haven’t spent much time doing it.

No one is going to learn how to do PCA regression from posts in the forum. Obvious right?

The problems (or potential problems, if your data turns out not to need any modification) of multivariate regression are all well known. And there is more than one potential solution. None of them perfect. Some better for certain situations than others. If any of them are helpful at all for your data… No guarantee that what you decide to try will work for your factors or that you will arrive at the best way to use a method when you do try it. Honestly, I do not see how you could get the method right if you are just using the forum to learn the method.

The paper also discusses Lasso regression, AND some are asking the question of how many factors to use. Was Lasso regression the most effective method in the paper? (I only skimmed it, so that is a real question.) PCA reduces the number of noise factors, but in an indirect way. Lasso regression addresses the question of how many factors to use directly, in a mathematical way. I think that is all it was designed to do.

Duckruck also talks about other methods of shrinkage all designed to reduce overfitting. Overfitting was another question discussed in the posts above. Shrinkage will be a frequent answer to this question if you look in any texts or peer-reviewed journals. Right next to cross-validation in any reputable source.
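A toy example of the difference, with made-up data: Lasso sets the coefficients of the noise variables exactly to zero, while ridge only shrinks them toward zero.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
# Only the first three columns actually matter; the rest are noise
y = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + rng.standard_normal(200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("Lasso:", np.round(lasso.coef_, 2))   # most noise coefficients are exactly 0
print("Ridge:", np.round(ridge.coef_, 2))   # noise coefficients shrink but stay nonzero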

If someone truly wants to look for (and try) “alternatives to optimization” Duckruck has already pointed members in several good directions. But I don’t think he or anyone in the forum will be able to give you all that you need in a few posts to learn the technique. Or any guarantee that it will work for you and your data.

BTW, I truly hate Lasso regression (preferring ridge regression, and not using that either in any of my ports). I do NOT invest using PCA either. I do not use multivariate regression in any form in a funded port. That one single technique works for everyone would never be my point. If someone has found something that works for them that includes none of this, then “GREAT” is all I have to say. Full stop.

Duckruck has presented several great ideas for those looking for “alternatives to optimization” in a serious way.

Jim

Here is, perhaps, a simpler question that either you (JRinne) or you (duckruck) can answer to help me on my way here in understanding how multilinear regression is practiced in academic papers.

Please see this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2894068. In section III, Fundamental Momentum, the authors define seven fundamental variables, most of them of the income-divided-by-something variety. They then crunch some numbers relating to these seven fundamental variables to come up with something called FIR, which is “estimated by using all [the seven] fundamental variables and their trends.” I tried and tried to come up with this FIR using multivariable regression, but I have no idea what I should be regressing to, or what I should have been doing to make this work. Are the authors using the raw numbers for the seven variables or the long-short returns (and if the latter, are they using the top third minus the bottom third or the top tenth minus the bottom tenth), and what are they regressing to, and what are they coming up with to get FIR? This is probably crystal clear to one of you, so I’d really appreciate your help. Thanks!

Yuval,

I will look at this paper some more, but I don’t think I have much. I certainly cannot turn this into something I, or we at P123, could use on the platform or with a spreadsheet at home.

Did I learn anything at all from the paper related to this thread? They do express a concern about multicollinearity. And they address it in a way I am not familiar with: "Since the fundamentals of a firm are likely correlated, and trends of different time horizons are not independent, some of our predictors can have high correlations with each other. Econometrically, this can raise the degree of multicollinearity in multivariate regression (4), causing over-fitting. To resolve the issue, we consider an alternative forecast combination approach. This approach is strikingly simple. Let $\{x_m\}_{m=1}^{M}$ be all the predictors…."

I have no idea what they are doing here to address multicollinearity. So all I got is that the authors of this article believe multicollinearity can be a problem with multivariate regression.

The problems with multicollinearity and predictions are actually debated. I will leave the final word with the authors here in the interests of brevity.

FWIW, I would really like to use these variables with XGBoost when P123 makes that available!!! I might wait until then before revisiting this article. XGBoost is effective for problems with multicollinearity and with proper cross-validation has some other potential advantages—especially if P123 makes it easy to use.

Interactions of the variables are maintained with XGBoost. Please understand I am not getting paid to promote multivariate regression, and any concerns you have about the method are probably echoed by me, at least at times. How well those problems can be mitigated is an open question that probably depends on the data more than anything.
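Just as a sketch of what I have in mind once it is available (placeholder data again, and the hyperparameters are arbitrary):

import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Placeholders for the paper's fundamental variables (and trends) and forward returns
X = np.random.randn(2000, 14)
y = np.random.randn(2000)

model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)

# Walk-forward cross-validation rather than a single in-sample fit
scores = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))
print(scores.mean())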

Sorry I could not help more.

Best,

Jim

Duckruck,

From the article they use: " ……. LASSO, ridge regressions, and elastic nets to obtain indices of forecasted returns."

Elastic net is a combination of Lasso and ridge regression. Lasso regression will remove a noise variable while ridge regression will shrink it. So Elastic net will do both (remove noise variables and shrink the variables that are not removed).
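A minimal scikit-learn illustration of that combination, again with made-up data; l1_ratio controls the mix of the two penalties:

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X[:, 0] - X[:, 1] + rng.standard_normal(200)

# l1_ratio=1.0 is pure Lasso, 0.0 is pure ridge; 0.5 mixes the two penalties
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 2))   # some coefficients removed, the rest shrunk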

You have added Principal Component Analysis as a useful technique for linear models.

Please correct me if I did not summarize much of what you have said in the best way.

Jim