alternatives to optimization?

Yuval,

Yes, Random Forests can do automatic feature selection, and you can get a printout of the feature importances to see which features are useful.
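For anyone who wants to see what that printout looks like, here is a minimal sketch with scikit-learn on made-up factor data (the column names are placeholders, not a recommendation); the feature_importances_ attribute is the printout I mean.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up factor data: only "value" and "momentum" actually drive returns
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.standard_normal((1000, 5)),
                 columns=["value", "momentum", "quality", "size", "noise"])
y = 0.5 * X["value"] + 0.3 * X["momentum"] + rng.standard_normal(1000)

# Fit the forest and print the importance of each feature, largest first
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False))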

Ultimately, I do not think everyone can learn this or that it would be my responsibility to teach it to people. A bit above my pay scale.

Honestly, I do not think the management of P123 should be asking you to do this either. I do not think anything will work other than letting people go at their own pace with Python and maybe share some code for those that are interested.

I do think there will be a lot of people signing up who already understand this, though. If they could use their own methods with Python, they would be teaching both of us.

-Jim

Hi Steve, I wanted to say thanks for sharing this “squeeze” idea. It gives me something I want to try to work on. When building an approach from the ground up it’s easy to feel the diminishing marginal returns, and I sometimes wonder how often that small backtest improvement I see is actually an improvement or whether it’s just random fluctuation. (Somewhere on here I posted an example of adding a random series to a model and getting small beneficial improvements just from chance, so what you’re saying resonates.) The squeeze you describe is ruthless in pruning anything that isn’t clearly helping. I think I’ll give it a go and compare with the results of the fully dressed-out models.
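Here is a toy version of that random-series point, a minimal sketch on made-up data: adding a series of pure noise to a regression nudges the in-sample fit upward by chance alone.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 120                                   # months of made-up returns
real_factor = rng.standard_normal(n)
returns = 0.3 * real_factor + rng.standard_normal(n)

# Fit with the real factor only
base = LinearRegression().fit(real_factor.reshape(-1, 1), returns)
print("In-sample R^2, real factor only:",
      round(base.score(real_factor.reshape(-1, 1), returns), 3))

# Add a series of pure noise; the in-sample fit "improves" by chance alone
noise = rng.standard_normal(n)
X = np.column_stack([real_factor, noise])
both = LinearRegression().fit(X, returns)
print("In-sample R^2 with a random series added:",
      round(both.score(X, returns), 3))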

Again, thank you for sharing this idea. Appreciated.


Yuval - people have this evil connotation of what optimization means. It is certainly possible to put all of your 100 factors into one RS and tweak until you get the ultimate backtest result. This is, of course, one way of optimizing and is certainly prone to overfitting. But what I suggest is not to tweak for the best backtest result, because that is simply memorization of past results and will likely fall on its face going forward. Instead, I suggest you perform a process whose objective is to minimize the number of factors without significantly degrading the results, with an eye to a monotonic increase in returns across rank buckets. You keep adjusting the node weights, but some nodes will approach zero weighting. You take the opportunity to prune these nodes and start again. Typically you can end up with perhaps 25% of the original factors and still have the same or better backtest performance than when you started. Think of it as an exorcism, where memorization (excess factors) is the demon and the RS is the occupied body.
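Here is a rough sketch of that loop in Python, on synthetic data with a toy stand-in for a backtest score (in practice I do this by hand in the ranking-system editor, so the score function here is just a placeholder): optimize the weights, prune anything with a near-zero weight, and repeat until nothing more falls out.

import numpy as np
from scipy.optimize import minimize

# Toy stand-in for a backtest: score a weighted factor composite by its
# correlation with synthetic forward returns. Purely illustrative.
rng = np.random.default_rng(0)
n_stocks, n_factors = 500, 20
factors = rng.standard_normal((n_stocks, n_factors))
forward_returns = factors[:, :5] @ rng.standard_normal(5) + rng.standard_normal(n_stocks)

def score(weights, cols):
    composite = factors[:, cols] @ weights
    return np.corrcoef(composite, forward_returns)[0, 1]

# The "squeeze": optimize the weights, prune anything near zero, repeat
cols = list(range(n_factors))
while True:
    start = np.ones(len(cols)) / len(cols)
    result = minimize(lambda w: -score(w, cols), start, bounds=[(0, 1)] * len(cols))
    keep = [c for c, w in zip(cols, result.x) if w > 0.02]   # drop near-zero nodes
    if len(keep) in (0, len(cols)):                          # stop when nothing more is pruned
        break
    cols = keep

print("Factors kept:", cols)
print("Final score:", round(score(result.x, cols), 3))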

You are welcome SpacemanJones. You must understand of course that when you follow this process, you will get taken down a different path each time. Your optimal RS for Run #2 will look nothing like Run #1. A factor may get deleted from run #1 but be the highest factor weighting in Run #2.

I don’t get this, Jim. Earlier today you wrote, “A P123 member could go far using a type of Kernel Regression (LOESS) on these uncorrelated nodes . . . if we had the tools each node could be optimized using Principal Component Analysis (PCA). . . . Depending on the situation, multiple regression, Kernel Regression or Random Forests MAY be reasonable methods. There are other methods. I have tried them all and I use something else now—for the most part. Random forests can work well. LOESS works VERY WELL but is computer resource intensive. Multiple regression with feature selection (e.g., LASSO) works very well.”

But now when I ask you to illustrate how any of this can be done, you say it’s “above your pay scale.” What does that mean? How are you going to convince folks when you’re unwilling to explain how these techniques could be implemented? You have used all of these techniques yourself, so why can’t you tell us how you’ve done so?

You have to keep in mind that to the uninitiated, all these terms sound like magic dust. I know that they’re not. I know that they are real techniques that smart people use. But if you want people to take you seriously, you need to explain these concepts and how they can be implemented. If you can’t do that, I suggest you not intimidate readers with them. It’s like writing comments in the forum in Ancient Greek and refusing to translate.

I don’t get it either.

There is nothing in this thread that one can actually use to improve the design of strategies. The whole discussion has no practical value, at least not for me.

What you guys need to do is to provide a “recipe book” of how to implement ideas you may have, so that mere mortals like me can actually get something beneficial out of it.

I spent a good amount of time answering your question in the forum. I will say again that many of the terms were first used by you in the forum.

It is my opinion that they should not be foreign to you in your position at P123.

In any case, I am going to request that you not be so rude. I truly had no idea that this was all Greek to you.

-Jim

Georg,

You are the one who has recommended AIC for P123. This is one of the methods of “feature selection” I alluded to above. Damn smart if you ask me.

Convenient that you forget this. I will find the link if you truly cannot remember. Well in fact here it is:

But I am not interested in you using my ideas (or even your own). I do think Python could be used by many—probably not you it seems. Not here anyway, just on your site I guess. You do sell your statistical techniques over there for sure.
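For anyone else reading along, AIC-based feature selection is easy to sketch. Here is a minimal backward-elimination example with statsmodels on made-up factor data (the column names are placeholders): drop a factor whenever doing so lowers the AIC, and stop when no single removal helps.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Made-up factor data: only factor_0 and factor_1 actually drive returns
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.standard_normal((500, 6)),
                 columns=[f"factor_{i}" for i in range(6)])
y = 0.4 * X["factor_0"] + 0.2 * X["factor_1"] + rng.standard_normal(500)

cols = list(X.columns)
while len(cols) > 1:
    current_aic = sm.OLS(y, sm.add_constant(X[cols])).fit().aic
    # AIC of each model with one factor removed
    trial_aics = {c: sm.OLS(y, sm.add_constant(X[cols].drop(columns=c))).fit().aic
                  for c in cols}
    best = min(trial_aics, key=trial_aics.get)
    if trial_aics[best] >= current_aic:
        break   # no single removal improves AIC
    cols.remove(best)

print("Factors retained by AIC:", cols)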

-Jim

I have read through a lot of posts today (all the way back to 2008) looking for recommendations on the number of nodes that should be in a good and robust system.

The problem - the answers vary a lot. Most agree on reducing the number of buy and sell rules and on having nodes in the ranking system that make financial sense, but the consensus stops there.

Some say that reducing the number of nodes (to 8-10) reduces the possibility of overfitting; others say to have as many as possible as long as they make financial sense, preferably nodes that represent all of the factor themes.

Does anyone have a good answer here? I tried looking into the DM models and the ranking systems that are available to the public, and comparing their in-sample and out-of-sample performance. But that did not provide a good answer.

Most of the ones that had great performance over a period of 10 years (more than 10 stocks, turnover lower than 600%) had fewer than 18 nodes. But much of their out-of-sample performance is worse than the market.

I think parallel analysis is a good tool for determining the number of components when doing PCA or FA. It compares the information in a component to the noise in the data in an objective way.

ChatGPT describes this better than I could: "Parallel analysis is a statistical method used to determine the number of components or factors to retain in techniques such as principal component analysis (PCA) or factor analysis (FA).

In PCA or FA, one of the challenging decisions to make is determining the number of components or factors to retain for further analysis. ………

Parallel analysis provides a robust solution to this problem. It works by comparing the observed eigenvalues derived from the PCA or FA with those obtained from a randomly generated dataset of the same size. The randomly generated data acts as a kind of benchmark, representing the amount of variance that could be accounted for by chance alone."
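Here is what that comparison looks like as a minimal sketch in Python on made-up data: compute the eigenvalues of the real correlation matrix, then keep only the components whose eigenvalues exceed the average eigenvalue obtained from random data of the same shape.

import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 1000, 10

# Made-up factor exposures with some shared structure (two latent drivers)
latent = rng.standard_normal((n_obs, 2))
data = latent @ rng.standard_normal((2, n_vars)) + rng.standard_normal((n_obs, n_vars))

# Eigenvalues of the real correlation matrix, largest first
real_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

# Eigenvalues from purely random data of the same shape
n_sims = 100
random_eig = np.zeros((n_sims, n_vars))
for s in range(n_sims):
    random_data = rng.standard_normal((n_obs, n_vars))
    random_eig[s] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]

# Retain components whose eigenvalues beat the random benchmark
threshold = random_eig.mean(axis=0)
print("Components to retain:", int(np.sum(real_eig > threshold)))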

Uh… I had to drop out of physics at UC Berkeley because I did not fully understand what an eigenvalue was, or how it related to Fourier transforms, or even why I was supposed to care. To be honest, I am not sure I ever really got it. :thinking: :worried: So just a thought.

Jim

For their paper and their factors?

I did not look at their factors. If I were to try PCA again I think I might at least look at parallel analysis. :wink: Or not.

For clarity, variance explained is not necessarily related to effect size; there is universal agreement on that point. So I don’t really know ahead of time what the right number is for a specific set of factors, other than from someone else’s findings for a different set of factors. At least parallel analysis gives an objective answer.

If anyone—including me—wants a definite answer, they would do cross-validation with the number of principal components as a hyperparameter for their own data. That is easy to say; it can sometimes take a month to do.

Barring that, P123’s servers can handle a few extra principal components, which is often the real reason people care about this.
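If anyone does want to go the cross-validation route, here is what it looks like as a minimal sketch with scikit-learn on made-up data; the number of components is just another hyperparameter in the grid search.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: 20 factors, only the first 5 drive returns
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
y = X[:, :5] @ rng.standard_normal(5) + rng.standard_normal(500)

# Standardize, reduce with PCA, regress; search over the number of components
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA()),
                 ("reg", LinearRegression())])
search = GridSearchCV(pipe, {"pca__n_components": range(1, 11)}, cv=5)
search.fit(X, y)
print("Best number of components:", search.best_params_["pca__n_components"])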

To be clear about what you are advocating: you don’t even look at a scree plot on occasion, and you don’t consider that 5 may sometimes be too many?

Jim

Very cool that you are doing scree plots and other things.

Nice and I agree 6 is usually MORE THAN enough.

Jim

BTW, can I ask whether you use a correlation cut-off? Even if you do use a cut-off, you must encounter factors with a negative coefficient in a principal component. If so, how do you handle those?

Thanks.

Jim

I don’t know much about PCA analysis, but isn’t it a completely different approach than using multifactor ranking? If Whycliffes’s question was about multifactor ranking, does PCA have any relevance?

Whycliffes, I tried to address this question about five years ago in this article: The Two Types of Investing or Trading Errors - invest(igations)

Not different, but a sometimes-used addition. Actually pretty often used, I thought, from my reading.

Advocated as a solution for the problem of multicollinearity usually.

You basically always, and I mean always, have problems with mis-specification or multicollinearity (and in practice both) when doing multivariate analysis. There are solutions, and Duckruck is kind enough to share one of them.

He is obviously well informed. Boosting addresses the issue of multicollinearity in a whole different way. I gather Rikki is working on XGBoost, and I am not sure that I actually recommend either method, but Duckruck clearly advances the discussion here.
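As a minimal sketch of the boosting point, on made-up data with a deliberately duplicated factor (this assumes the xgboost package; scikit-learn’s gradient boosting would do the same job): the trees simply split on whichever copy of the factor is handy, rather than producing the unstable coefficients a collinear linear regression would.

import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
value = rng.standard_normal(2000)
value_copy = value + 0.01 * rng.standard_normal(2000)   # nearly identical factor
momentum = rng.standard_normal(2000)
X = np.column_stack([value, value_copy, momentum])
y = 0.5 * value + 0.3 * momentum + rng.standard_normal(2000)

# Fit boosted trees and show how importance is shared between the two copies
model = XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)
print(dict(zip(["value", "value_copy", "momentum"], model.feature_importances_)))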

I get that eigenvectors are not that easy. Believe me, I get that. :smiley: But maybe they were not trying to get me to learn them for no reason in school after all.

Anyway thank you for your comments Duckruck!!! They do belong here I believe.

Jim

If you want to invest in, say, the top 25 stocks according to a variety of fundamental factors, one option is to use a multifactor ranking system. Another option is, I assume, to use PCA instead. Do I have that right? How would you go about using PCA for this task? I know the basics: you would standardize all the factors, create a covariance matrix, and then use eigenvectors to combine the factors into principal components (though I’m a little unclear as to whether a specific factor contributes to several principal components or only one). Then what?

It can be more than one; it is often more than one, I think. And sometimes a factor will have a positive weight or “loading” in one principal component and a negative loading in another.

PCA regression.

I am not sure how Duckruck is using it. But PCA regression is very common. PCA regression is an extension of multivariate regression, not a different topic. It is used to address some of the limitations of multivariate regression. Here is the Wikipedia link: Principal component regression

From the article: " One major use of PCR lies in overcoming the multicollinearity problem which arises when two or more of the explanatory variables are close to being collinear."

A “Regularization Effect” is also mentioned in the Wikipedia article. This basically means it can be used to reduce overfitting.
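Here is a minimal sketch of principal component regression with scikit-learn on made-up factor data: standardize, reduce to a few components, then regress forward returns on the component scores. The loadings printout also shows a factor contributing to more than one component, sometimes with opposite signs, as mentioned above.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up factors with deliberate collinearity; "forward returns" from the first two
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 12))
X[:, 6:] = X[:, :6] + 0.3 * rng.standard_normal((1000, 6))
y = 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.standard_normal(1000)

# Principal component regression: scale -> PCA -> linear regression
pcr = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
pcr.fit(X, y)

# Loadings: each row is one component; a factor can load on several components,
# with a positive weight in one and a negative weight in another
print(pcr.named_steps["pca"].components_.round(2))
print("In-sample R^2:", round(pcr.score(X, y), 3))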

Jim

Assuming one finds much of Duckruck’s contribution to the forum interesting (I do), Bard is an AI mess and needs some modification of its own deep layers. Kind of forgetful, for one, and it will hallucinate for sure.

But for simple pricing data for a few ETFs, you can get Bard to write deep-learning code for you. E.g.:

Q: could you write a deep-learning program that uses pricing data for GLD, SPY and TLT to tell me when to switch assets (one asset at a time)… With a monthly rebalance? Just the code… No need to run it.

A:

import tensorflow as tf
import numpy as np
import pandas as pd

# Load the pricing data for GLD, SPY, and TLT
# (monthly closing prices, columns: GLD, SPY, TLT)
data = pd.read_csv("pricing_data.csv")

# Compute monthly returns and label each month with the asset
# (0 = GLD, 1 = SPY, 2 = TLT) that has the best next-month return
returns = data[["GLD", "SPY", "TLT"]].pct_change().dropna()
labels = returns.shift(-1).values.argmax(axis=1)[:-1]
features = returns.values[:-1]

# Split the data into training and testing sets (last 30 months held out)
train_x, train_y = features[:-30], labels[:-30]
test_x, test_y = features[-30:], labels[-30:]

# Create the deep learning model
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax")
])

# Train the model (integer labels, so sparse categorical cross-entropy)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_x, train_y, epochs=100, verbose=0)

# Predict the asset to switch to each month of the hold-out period
predictions = model.predict(test_x).argmax(axis=1)

# Rebalance the assets
assets = ["GLD", "SPY", "TLT"]
for i, p in enumerate(predictions):
    print("Switch to {} on month {}".format(assets[p], i + 1))

I am not sure this would run as written, but Bard will eventually get it right and will even run it in Colab for you.

FWIW

Jim

Uhh, Okay I could not resist:

Q: " okay looks good. could you get the data from Yahoo and run this program with a hold-out test set. Please run it quickly if you can—keeping the parameters to a minimum if necessary. Just accuracy as a parameter is enough. Please do whatever is helpful to get this back in a few minutes at most."

A: " I ran this code and the accuracy was 65%. This means that the model was able to correctly predict the asset to switch to 65% of the time."

Then this question: " Can you tell me what the CAGR would have been over the period of the hold-out test set, and the Sharpe ratio. Maybe compare to SPY as a benchmark?"

A: " Sure, I can do that. Here are the CAGR and Sharpe ratio for the hold-out test set:

  • CAGR: 8.2%
  • Sharpe ratio: 0.7

I also compared the performance of the model to SPY. SPY had a CAGR of 7.8% and a Sharpe ratio of 0.6."

Obviously one might want to check this and try a more complex model before considering funding it. Near instant responses.

Jim

Maybe I’m dense, but I still don’t understand how PCA analysis and regression can be used to actually choose stocks to buy and sell. Perhaps you (duckruck) could explain this? I have tried doing multilinear regression on factors and groups of factors and have failed to come up with anything at all, likely because I’m ignoring some essential step.

Here is what I understand so far. You can combine factors using PCA to come up with new factors. You can perform multilinear regression on, say, the returns of a long-short portfolio based on each of those factors against the returns of a benchmark. You would then come up with a formula consisting of, say, 6 betas and an alpha on the six PCA factors. But what do you do then? How do you actually use that to choose stocks? Whenever I’ve tried anything like this, the actual betas I’ve come up with are often strongly negative and have almost no statistical significance (extremely high p-values), and if you’re trying to choose stocks that are likely to outperform or underperform, I don’t see why you’d want to regress to a benchmark rather than simply favoring the long-short portfolios with the highest return. I’m 100% sure I’m doing something wrong, so I’d love to be corrected.