# Ranking versus Regression and AI questions

I’m sure there’s a way to word this in fewer words, but my brain’s having trouble…

Here’s my thought:
As I understand it, when we rank stocks, the rank each stock gets depends only on where it sits relative to the other stocks, not on the underlying metric’s actual value. My portfolio may average a P/B of 1 this year but average something higher another year, depending on market conditions but also on how all investors are positioned. Studies show people tend to crowd the most effective strategies over time as education increases and more studies are released. This makes the distribution of a metric, say price-to-book, drift over time. Your cheapest 10% of companies may have a P/B closer to the cheapest 20%, or closer to the average (and that next 10% of companies likely looks much better on other metrics). So we’re backtesting and choosing percentiles whose underlying factor values are always shifting around.

I’m wondering if it would be better to create a regression that predicts the expected return from the actual value of the factor. I know we can run regressions on ranks, but again, a given rank may represent something quite different 10 years from now if many people crowd the strategy. I suspect this is also why many factors show a bell curve when we plot their returns: high-EPS-growth stocks get too much attention, and the top 10% of them end up with valuations stretched by overcrowding.

Is it possible to create a predictive model that uses the factors and their underlying values as the X variables and the annual return as the Y, without using ranks?
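For what it’s worth, here is a minimal sketch (in Python, on entirely synthetic data; this is not P123’s actual method) of what regressing forward returns on raw factor values rather than ranks could look like:

```python
# Hypothetical sketch: regress forward annual returns on raw factor
# values (not ranks). All data below is synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 500
pb = rng.lognormal(mean=0.5, sigma=0.6, size=n)   # raw price-to-book values
eps_growth = rng.normal(0.08, 0.15, size=n)       # raw EPS growth rates

# Assumed linear relationship plus noise, just to have a target to fit
fwd_return = 0.10 - 0.02 * pb + 0.15 * eps_growth + rng.normal(0, 0.10, n)

# Ordinary least squares on the raw values: X @ beta ~ fwd_return
X = np.column_stack([np.ones(n), pb, eps_growth])
beta, *_ = np.linalg.lstsq(X, fwd_return, rcond=None)
print(beta)  # [intercept, P/B coefficient, EPS-growth coefficient]
```

Because the regression is on the values themselves, a P/B of 1.0 contributes the same predicted return in any year, regardless of how the cross-sectional distribution has drifted; the obvious trade-offs are sensitivity to outliers and to nonlinearity, which ranks absorb for free.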

Is that the purpose of the AI? Is that the only way to create a predictive formula? Does P123 use ranks to predict, or does it run regressions on the factor values themselves?

There are moments in history when a growth stock has almost the same valuation ratio as a value stock, i.e. when the distribution is very narrow. In those moments it’s probably better to buy the growth stock, I imagine: see, for example, figure 4 around 2009/2010 in Growth vs Value - Yardeni Research.

Ideally I’d want my model to give a factor less weight when its distribution narrows and more weight when it expands. If a growth stock has a similar P/E to a value stock, I’d like growth to be weighted higher automatically. I think a foolproof model would account for factor timing, at least with regard to valuation drift and overcrowding.
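To make the idea concrete, here is a toy sketch of one way dispersion-based weighting could work (all numbers and the helper function are hypothetical, not a P123 feature): scale a factor’s weight by its current cross-sectional spread relative to a typical historical spread.

```python
# Hypothetical dispersion-based weighting: narrow cross-sectional spread
# -> lower factor weight; wide spread -> higher factor weight.
import numpy as np

def dispersion_weight(current_values, historical_spreads):
    """Weight multiplier = current interquartile spread / median historical spread."""
    q75, q25 = np.percentile(current_values, [75, 25])
    current_spread = q75 - q25
    return current_spread / np.median(historical_spreads)

# Toy example: today's P/E cross-section vs. assumed past IQR spreads
pe_today = np.array([8, 10, 11, 12, 13, 15, 18, 22, 30, 45], dtype=float)
historical_iqr = [12.0, 10.0, 14.0, 11.0, 13.0]

w = dispersion_weight(pe_today, historical_iqr)
print(w)  # < 1 when the current spread is narrower than usual
```

A multiplier below 1 would shrink the valuation factor’s weight exactly in the narrow-distribution regimes described above.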

Building on this: is it possible to add a node to a P123 model that checks the average value of a factor for the top 10% of ranks, then adjusts the factor’s weight accordingly?

You seem to be talking about feature dispersion.

Based on the papers I've seen, this is not a good criterion for factor timing. But you can try it.

If you are talking about the value spread, it's actually not very useful either.

The only relatively reliable factor-timing criterion is factor momentum, and even that doesn't fit the vast majority of factors.

Simply put, basically only the pan-value factors as a whole are worth timing (GBAB, GHML, GQMJ). This can help you minimize the pain of the value-factor collapse from '18 to '20 and switch back to embracing the value factor at the end of '21 in time. But you'll be hard-pressed to get more than that.

I think you may be looking for more than one thing, and I could be missing some of what you are interested in. But I think you can get some of what you are looking for by scaling with z-scores or min/max over the entire period.
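A minimal sketch of that suggestion, assuming the scaling statistics are computed over the full backtest period so that a given raw value always maps to the same score (the data here is made up):

```python
# Scaling raw factor values with full-period statistics, so a P/B of 1.0
# means the same thing regardless of how the distribution drifts.
import numpy as np

def zscore(values):
    """Standardize values using their own mean and standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def minmax(values):
    """Rescale values linearly into [0, 1]."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

# Toy full-period P/B sample
pb = [0.5, 0.8, 1.0, 1.5, 2.0, 4.0]
print(zscore(pb))
print(minmax(pb))
```

Unlike percentile ranks, both transforms preserve the distances between raw values, so a crowded, compressed distribution produces compressed scores instead of an evenly spread 0-100 ranking.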


You may also just want to compare the current ratio to its historical average. I would look into the loopsum function, and also aggregate series, which you can then apply a moving average to.
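The core of that comparison could be sketched like this (synthetic series; on P123 this would come from an aggregate series of, say, the universe's median P/B, not the hypothetical list below):

```python
# Hedged sketch: compare a current ratio to its own trailing moving
# average, the idea behind pairing an aggregate series with a moving average.
import numpy as np

def relative_to_moving_average(series, window):
    """Return the latest value divided by the trailing moving average."""
    series = np.asarray(series, dtype=float)
    return series[-1] / series[-window:].mean()

# Synthetic history of a universe-level median P/B
median_pb = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 2.0, 2.2]
ratio = relative_to_moving_average(median_pb, window=5)
print(ratio)  # > 1 means the ratio is richer than its recent average
```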


Thanks everyone! Great thoughts!

@ZGWZ Hmm, okay, I do see what you mean about feature dispersion and the value spread not being very useful. I finished an analysis of 327 factors and found that the ones that underperformed over the previous 10 years performed no better and no worse over the next 10 years. In other words, a factor's average alpha always stayed around its long-term historical average, regardless of what the previous 10 years did. (Technically the underperformers then outperformed by 0.30% above their usual historical average, but I think that's basically random chance at that level, given the average alpha per factor was around 3%, so about 1/10th of the usual alpha.)

^This all assumes previous returns are an accurate measure of a factor's dispersion/distribution.

It's surprising how little predictive meaning previous factor returns have.

I checked the entire market as a whole using data from a study, but haven't checked microcaps (couldn't find data for that); the findings are probably similar.

Thanks Hemmerling and Jrinne, I'll try these ideas as well and see whether they affect performance.

Yes, the only thing that looks valuable is the average performance within the sample. That's why I said I couldn't find a better feature-selection method than the simplest one you could come up with.