Are those “35 factors” ranking systems better because they use fewer factors? Or because they have better factors or more weight on better factors in the OOS?
Regardless, it would be very helpful in evaluating designer models if p123 displayed the number of nodes in the ranking system and the number of degrees of freedom used in the buy/sell rules, e.g. the number of factors or functions used in the buy/sell rules.
Personally, I’m less concerned with overfitting in the ranking system than I am with the buy/sell rules. Those boolean filters make it very easy to curve-fit, and I tend to prefer systems with only a single rank-based sell rule and zero-to-minimal buy rules (maybe just some use of {Sec,Ind}{Count,Weight} to avoid overconcentration).
Currently I have to rely on the designers’ text descriptions to get a sense of their rule constructions, but it would be incredibly helpful to expose these degrees of freedom to potential purchasers, giving them a look behind the curtain without fully exposing the implementation.
Better yet, I’d still prefer to be able to subscribe to a ranking system (and not a strategy), so that I could combine that ranking system with my own, or build my own trading rules around the ranking system based on my trading preferences.
If you want to know the true degrees of freedom you might consider an orthogonal factor analysis.
This may seem like it is in the weeds, because it is. Degrees of freedom is a difficult concept. I have a math degree and used the concept frequently, and I can honestly say I never fully understood it. But I could use it in a “cookbook-like” way, mimicking my professor or the textbooks when necessary. I got good grades.
So, bypassing a bunch of notation and mathematical proof that I admit I do not really understand anyway: you will have trouble finding more than 8 to 10 significant “latent variables” in an orthogonal factor analysis. I think this is a measure of the true degrees of freedom most of us have based on the number of factors alone.
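A minimal sketch of the idea, using PCA on the correlation matrix as a stand-in for an orthogonal factor analysis. The data here are synthetic: 40 observed “factors” that are all noisy mixtures of 6 latent drivers, so the eigenvalue count recovers roughly 6, not 40.

```python
# Hypothetical sketch: estimating the "true" degrees of freedom of a factor
# set via an orthogonal decomposition (PCA on the correlation matrix,
# standing in for factor analysis). All data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_factors, n_latent = 500, 40, 6

# 40 observed factors, all noisy mixtures of 6 latent drivers.
latent = rng.normal(size=(n_stocks, n_latent))
mixing = rng.normal(size=(n_latent, n_factors))
factors = latent @ mixing + 0.3 * rng.normal(size=(n_stocks, n_factors))

# Eigenvalues of the correlation matrix, sorted descending.
corr = np.corrcoef(factors, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser criterion: components with eigenvalue > 1 count as "significant".
n_significant = int(np.sum(eigvals > 1.0))
print(n_significant)  # close to the 6 latent drivers, not the 40 raw factors
```

The point is only illustrative: highly correlated factors collapse into a handful of significant components, which is why counting raw factors overstates the effective degrees of freedom.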
Granted, adding factors one at a time and optimizing (and overfitting) the individual weights DOES increase the degrees of freedom, as do Boolean buy rules, as Feldy suggests. How you weight the factors also adds to the degrees of freedom. As I said, this is difficult stuff, and that is my only point about degrees of freedom if you want to use it.
It is not my point that we do not manage to increase the degrees of freedom and overfit at P123. We do.
But ultimately, the degrees of freedom is not closely related to the number of factors by itself. And for sure, adding a factor that is highly correlated to other factors in a ranking system does not count as a whole degree of freedom.
Personally I use all good factors (with the devil being in the details of how you define a good factor). Something like elastic net regression or cross-validation can address the issues of collinearity and overfitting once you have selected what you think are good factors.
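A small sketch of the second half of that point, assuming scikit-learn is available. The factor data and return series are invented; the example just shows cross-validated elastic net spreading weight across a group of collinear factors instead of betting on one of them.

```python
# Hypothetical sketch: cross-validated elastic net as a way to weight a set
# of pre-selected "good" factors while handling collinearity. All data here
# are simulated for illustration.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n_stocks, n_factors = 400, 30

X = rng.normal(size=(n_stocks, n_factors))
# Make factors 1-3 nearly identical copies of factor 0 (collinearity).
for j in (1, 2, 3):
    X[:, j] = X[:, 0] + 0.05 * rng.normal(size=n_stocks)

# Forward returns driven by a couple of factors plus noise.
y = 0.5 * X[:, 0] + 0.3 * X[:, 10] + rng.normal(size=n_stocks)

# l1_ratio mixes ridge (spreads weight over correlated factors) and lasso
# (drops useless ones); the penalty strength is chosen by cross-validation.
model = ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0).fit(X, y)
weights = model.coef_
print("nonzero weights:", int(np.sum(weights != 0)))
```

The regularization is what keeps the correlated copies from producing wild offsetting weights, which is the overfitting failure mode of plain regression on collinear factors.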
TL;DR: I am not sure degrees of freedom helps this discussion much. Anyone willing to take the time to understand it (most of my adult life so far) could probably understand cross-validation instead. And perhaps it’s just me, but I believe they will find cross-validation more useful. Probably a hell of a lot easier, too.
@feldy As Yuval has mentioned in the forum, any factor with a weight below 2% is noise, and that has also been my experience. If you follow that rule, then with 50 equally weighted factors no factor would be above 2%. Some factors are clearly better than others; for example, MarketCap may warrant between 5% and 10%. Given that you will have several of these “better factors” that warrant a >2% allocation, you cannot have 50 factors. I don’t know what the optimal number is yet, but I suspect it’s between 25 and 35. Of course, the quality of your factors is the most important part. It’s important to draw a distinction between factors and ranking system nodes. I’m really talking about nodes here, each of which may contain multiple factors.
I consider that 4 factors in 1 node. If you have factors that may be missing, you can use a relevant default value if it makes sense for your particular factor.
Yes, since percentages are set in RS nodes. I don’t know of a way to weight individual factors within a single node, although that could be interesting.
I fully agree that 35 factors can work. No disagreement there.
But I wonder a little bit about what should count as a factor or possibly what Yuval’s present position is on factors and noise. Specifically these are Yuval’s words describing Crazy Returns Microcap Model: “The ranking system relies on over 150 different factors,…”
Anyway, whatever Yuval might say about what he is doing with his model, I do agree with you. I also think the key is using good factors, not counting how many. Yuval spends a lot of time finding good factors. I am not questioning his methods either (no matter how he arrived at the 150 number).
A fascinating discussion. Searching the forum shows there have been similar discussions before, but without any clear answer.
How many nodes do you use in your system abwillingham?
And maybe others are willing to share how many nodes they are using?
I am one of the people who are overdoing it. I have 112 nodes and even more factors, though some overlap with each other. I have been using the last 10 years to optimise and the 10 years before that as an out-of-sample period. And for some reason, my out-of-sample period is not bad. With a 25-stock portfolio, a turnover of 300–350%, variable slippage, and 0.2 commission, I get 65% in sample and 61% out of sample.
I have a lot of nodes, about 60–70, and some of these have really low weights. My thinking has been that it’s probably harmless to fine-tune your system, as long as it’s done in the ranking system, the testing/optimizing procedure is “robust”, and the nodes make financial sense.
That being said, a while back I made a ranking system optimized solely for the utility sector. Given the small universe, I decided to stick to node weights in multiples of 2.5%. The final number of nodes was much smaller than what I’m used to, but the clarity and the low effort required to optimize the weights really made me reconsider my many-nodes strategy. It’s tempting to return to my main system and try to make something simpler. Maybe next year!
I optimize ranking systems using multiples of 2% or 2.5% per node, and then I average the weights of the ranking systems that perform best on different subuniverses, so I end up with lots of factors and pretty low weights for the ones that don’t appear very frequently. Also, think about this: to measure earnings yield you can look at the current year’s estimate, next year’s estimate, the current quarter’s estimate, next quarter’s estimate, trailing twelve months EPS, last year’s EPS, last quarter’s EPS, actuals, and so on. And you can compare that to companies in the same industry, sector, and universe. Giving you about 50 or more different ways to measure earnings yield. Assigning some of those a small portion of weight rather than assigning all the weight to one of those will give you a more rounded picture of the company’s earnings yield.
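The averaging step described above can be sketched in a few lines. The node names and weight vectors below are invented for illustration; the point is just that averaging per-subuniverse weight vectors leaves rarely selected nodes with small but nonzero weights.

```python
# Hypothetical sketch of the "model averaging" idea described above:
# optimize node weights (in multiples of 2.5%) separately on several
# subuniverses, then average the resulting weight vectors.
import numpy as np

nodes = ["EarningsYield", "Momentum", "Accruals", "ShareTurnover"]

# Best weights (percent) found on three invented subuniverses, each
# constrained to multiples of 2.5 and summing to 100.
w_small = np.array([40.0, 30.0, 30.0,  0.0])
w_mid   = np.array([35.0, 25.0, 27.5, 12.5])
w_large = np.array([50.0, 20.0, 30.0,  0.0])

avg = np.mean([w_small, w_mid, w_large], axis=0)
for name, w in zip(nodes, avg):
    print(f"{name:14s} {w:6.2f}%")
# Nodes that win on only some subuniverses (ShareTurnover here) end up
# with small averaged weights rather than being dropped entirely.
```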
Ok, so now that I know how to play, here’s my input.
One of my longest-running models uses a 16-node ranking system. The live, out-of-sample annualized return over the past 5 years + 3 months is 35%. It holds 20 stocks.
Maybe I should look into adding more nodes. Or maybe not. It strikes me as being unusually robust.
EDIT: I’m starting to wonder about the size of the smallest effective ranking system.
I have been using one ranking system for all of my ports since October of last year. There is no survivorship bias (over that period at least).
Here are the stats on my median port as far as returns go. It has 29 nodes. This is for the Weekly Excess Returns from the Statistics download from my port, uploaded without modification into JASP. It is a one-sided Wilcoxon signed-rank test, because stock returns are not normally distributed:
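For anyone without JASP, the same kind of test can be run with SciPy. The weekly excess return series below is simulated, not the poster’s actual data; it only shows the shape of the test.

```python
# Hypothetical sketch of the test described above, run in Python instead of
# JASP: a one-sided Wilcoxon signed-rank test on weekly excess returns
# (portfolio return minus benchmark return). The series is simulated.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
# Roughly one year of weekly excess returns with a small positive median.
excess = rng.normal(loc=0.004, scale=0.02, size=52)

# H1: the median weekly excess return is greater than zero.
stat, p_value = wilcoxon(excess, alternative="greater")
print(f"W = {stat:.1f}, one-sided p = {p_value:.4f}")
```

The nonparametric test is a reasonable choice here precisely because, as noted above, stock returns are not normally distributed.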
Conclusion: I have some statistical evidence that 29 nodes DEVELOPED USING STATISTICS AND MACHINE LEARNING ALONE can work. But I am actually looking to add more (good) nodes over the coming year. I am not advocating a smaller-is-better approach here. This is just sharing the data I have access to (without a comparison model other than the benchmark, of course). I have no data comparing 29 nodes to any other number of nodes here. My data makes no comment on that, and I don’t claim it does.
Analyzed another 5y+3m (started 09/17/2018) Live model. This time, with 25 nodes, the out-of-sample annualized return is 42%.
This exercise was very helpful. The two ranking systems I mentioned are very similar. The larger one expands on analyst expectations and value metrics (the Pr2something type).
Do I understand you correctly that reducing the number of nodes seems to give more robust systems, resulting in less volatile and better out-of-sample results?
Is there any way of testing this, or is there some statistical explanation for why increasing the number of nodes also increases the risk of overfitting and worsening the out-of-sample results?
I understand and respect that you are not a big fan of statistics and machine learning. But you ask about a “statistical explanation.”
I think Georg is 100% correct in this. I have only attempted to add that degrees of freedom is a complex thing, and you cannot simply count the number of factors to get the degrees of freedom. And I would add, with caution: you can keep the impact of adding a factor to a minimum.
But also, adding a factor and optimizing will add to the degrees of freedom and to overfitting. That is a fact, full stop.
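A quick demonstration of that fact with made-up data: fit “returns” that contain no signal at all on a growing number of pure-noise factors. The in-sample fit improves with every factor added, while the out-of-sample fit does not.

```python
# Hypothetical demonstration: optimizing over more and more pure-noise
# factors always improves the in-sample fit, but not the out-of-sample
# fit. All data here are random with no real signal.
import numpy as np

rng = np.random.default_rng(7)
n_train, n_test = 120, 120
y_train = rng.normal(size=n_train)   # "returns" with no signal
y_test = rng.normal(size=n_test)

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

for k in (5, 20, 60):
    X_train = rng.normal(size=(n_train, k))  # k noise factors
    X_test = rng.normal(size=(n_test, k))
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    print(k, round(r2(y_train, X_train @ beta), 3),
             round(r2(y_test, X_test @ beta), 3))
```

The in-sample R² climbs toward k/n as factors are added, which is exactly the degrees-of-freedom cost of optimizing each new weight.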
BTW, Yuval’s method of what he has called Model Averaging above is one excellent way of mitigating this problem. So I am actually in the camp that more factors (done right) is better; I just think looking at the number of factors alone is not going to work for this complex issue.
But I think Georg answered your question in the very first post.
In my case, increasing the number of nodes increased the annualized return.
But there is one big difference between the two systems: the one with the higher AR has 10 holdings (vs. 20). Sorry I didn’t notice this earlier. That kind of negates the comparison.
I’m not sure if this will help the discussion, but I took random ranking systems with more than 10 and fewer than 30 nodes from 2008–2009, and in the same way I downloaded ranking systems with more than 50 nodes from the same period. I ran 80 simulations: 40 with a low node count and 40 with a high node count. Every ranking system is set to neutral in the ranking method.
The simulations ran with the same settings: 25 stocks, out of sample over 2013–2023, on a US and Canada universe with a minimum liquidity setting.
The ranking systems with the fewest nodes won out of sample. But the overall results from most of them are not that good.