Do AI Models Use Raw Absolute Values or Relative Ranks?

This is probably two questions:

  1. What variable is the AI running regressions and such on?
  2. Is there a way to use absolute values in simulations (not as buy or sell rules, but as weights)?

Explanation:
Does machine learning at p123 use raw/absolute metrics (e.g., a P/B of 1.0 versus 1.5), or are they using ranked/relative metrics (e.g., 99th percentile for P/B)? If the models are only using rank (i.e., “this stock is in the top decile” rather than “this stock’s P/B is 0.8”), might they be missing the magnitude differences between stocks?

In general in P123, is there a way to incorporate absolute values or magnitude in factor-based investing and simulations?
If a model has two adjacently ranked companies, and one of them has a P/B, P/E, and price-to-cash-flow that are all, say, three times as good as the next ranked company's, but it ranks only 50% on a different metric, the model would probably rank it as worse because that one metric drags it down significantly. In other words, wouldn't it be beneficial to also include the magnitude of factors somehow? Not just look at whether a company's factor ranks high, but also, if it is significantly better than the previous company in the ranks, award it extra points/weight?
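To make the rank-versus-magnitude distinction concrete, here's a small NumPy sketch with hypothetical P/B values (this is just an illustration, not how p123 computes ranks internally):

```python
import numpy as np

# Hypothetical P/B values for five stocks (already sorted for clarity)
pb = np.array([0.3, 0.9, 1.0, 1.1, 5.0])

# Percentile ranks: adjacent stocks are always 0.25 apart here,
# no matter how big the gap in the underlying values is.
ranks = pb.argsort().argsort() / (len(pb) - 1)

# Z-scores keep the magnitude: 0.3 sits much farther from 0.9
# than 0.9 does from 1.0, so the "three times cheaper" stock
# stands out instead of being just one rank step away.
z = (pb - pb.mean()) / pb.std()
```

The rank gap between the first two stocks is identical to every other gap, while the z-score gap between them is several times larger, which is exactly the magnitude information ranks throw away.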

I would think ranking systems are more susceptible to overcrowding, as the average value of a decile's factor could drift as more people buy into that decile, whereas weighting by absolute values shouldn't drift, or at least it will give less weight to certain values as they become less favorable over time.

For example, from 2000 to 2007, quantitative value investing became extremely popular and common value metrics became overcrowded. The distribution of P/E ratios in the market narrowed significantly, to the point where many value stocks bought on P/E cost almost the same as growth stocks with similar P/Es. A rank system using P/E would likely still rank stocks the same in 2007 as in 2000. Subsequently, growth stocks tended to outperform, since you could buy a much healthier, higher-profit-margin company for almost the same P/E anyway.

What if instead we assigned weights to different magnitudes of a factor? I'm not sure what that's called. Would that basically be running a regression and then coming up with coefficients to plug into a formula that estimates returns from each factor? Does anyone do this?
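What's described above is close to a cross-sectional factor-return regression (Fama-MacBeth style): regress one period's stock returns on raw factor exposures, and the fitted coefficients are the estimated reward per unit of each factor. A toy single-period sketch with simulated data (all numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical cross-section of 200 stocks

# Two raw factor exposures per stock (e.g. z-scored value and quality)
value = rng.normal(size=n)
quality = rng.normal(size=n)

# Simulated forward returns with "true" factor premia of 2% and 1%
returns = 0.02 * value + 0.01 * quality + rng.normal(scale=0.05, size=n)

# Cross-sectional regression: coefficients estimate the factor returns,
# i.e. the extra reward for each unit of factor magnitude.
X = np.column_stack([np.ones(n), value, quality])
coefs, *_ = np.linalg.lstsq(X, returns, rcond=None)
intercept, beta_value, beta_quality = coefs
```

Repeating this every period and averaging the coefficients is the usual Fama-MacBeth procedure; the point is that the regression uses the factor's magnitude, not just its rank.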

Maybe this is synonymous with factor timing? I'm not sure.

Our AI Factors use normalized features for training and inference, either as percentile ranks or z-scores. If you want to incorporate magnitude, use z-scores. We trim outliers to 2.5σ, but you can go as high as 10σ. Also, with the latest AI Factor upgrades, you can control each feature's normalization (you can also skip normalization altogether).

Our "classic" ranking systems are based on percentile ranks, with a choice of how to handle NAs. We're planning to revisit the classic ranking systems and formally add z-score as a ranking option.

Regardless of where they are used, there are pluses and minuses to each scaling method. We're working on tools to analyze each method, but for now the choice is mostly made empirically.

Interesting idea. This is not currently possible with ranking systems. With AI Factor, you can retrain the predictor, which changes the weight of each feature. By the way, you can empirically find the ideal retraining period by running cross-validations with different numbers of folds.

With AI Factor you can normalize using the entire dataset. This should be used carefully since it can introduce look-ahead bias, but for stationary factors like P/E that are range-bound (they are, aren't they?), normalizing against the whole dataset might give you the desired effect.

We're working on a tool to calculate alpha for individual factors en masse, and give you, for example, the top 20 factors that are least correlated.
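The "top factors that are least correlated" idea can be approximated with a greedy selection like the following (purely illustrative; the actual p123 tool may work quite differently):

```python
import numpy as np

def least_correlated(factor_matrix, k):
    """Greedy sketch: pick k factors with low pairwise |correlation|.

    factor_matrix: (n_obs, n_factors) array of factor values.
    Starts from the first factor and repeatedly adds the factor whose
    worst-case |correlation| to the already-chosen set is smallest.
    """
    corr = np.abs(np.corrcoef(factor_matrix, rowvar=False))
    chosen = [0]  # seed with the first factor (e.g. the highest-alpha one)
    while len(chosen) < k:
        remaining = [i for i in range(corr.shape[0]) if i not in chosen]
        best = min(remaining, key=lambda i: corr[i, chosen].max())
        chosen.append(best)
    return chosen
```

In practice you'd seed with the highest-alpha factor rather than index 0, but the greedy structure is the same.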

Perfect! Thanks so much for the thorough answer Marco - this was exactly what I was looking for.

I wasn't sure what the z-scores were for. It sounds like that would be the solution I am looking for. I look forward to testing that out in the AI when I get a chance.

Hmm, good to know - it will be interesting to test out the new tool for finding low-correlation factors.

I'm curious to see how z-score will impact ranking systems once that's set up as well.

[quote="marco, post:2, topic:69881"]
We're working on a tool to calculate alpha for individual factors en masse, and give you, for example, the top 20 factors that are least correlated.
[/quote]

Super, thanks, just what the doctor ordered! My version has been a disappointment so far.
