Update on AI factor normalization upgrades

Hello, about a month ago I posted about an upcoming release here: Preview of our v1.1 of AI Factor: intelligent features, grid search, categorical & macro - #15 by marco.

Sorry it's taking so long, but time flies and we're trying to make the upgrades as intuitive as possible. They are in the final testing phases. The most recent changes have to do with the way the different normalization options are presented.

See below for a preview. It shows some of the different normalizations for a feature "Earnings Yield" (EY). Currently only one of the highlighted options is possible: either normalize EY cross-sectionally (meaning vs. other stocks) using the entire dataset, or within a particular date.

In the new version you will be able to easily do different normalizations of the same feature thereby communicating something different (and uncorrelated) to the AI algorithm.

For example "1Y Relative (N=26) + Date & Sector" is a two step normalization that first calculates a z-score (or rank) for EY vs. it's 26 previous values sampled every two weeks (so for a period of one year). It then normalizes that score cross-sectionally against peers in the sector.

What the above communicates to the AI algorithm is this: how cheap a stock is relative to its own history, and how that historical cheapness compares to the historical cheapness of stocks in the same sector. That is quite different from simply telling the AI algorithm whether a stock is the cheapest in its sector.
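For those who prefer to see it in code, here is a rough pandas sketch of that two-step normalization (the column names, the biweekly sampling, and the use of z-scores rather than ranks are illustrative assumptions, not the exact implementation):

```python
import pandas as pd

# df: one row per (date, ticker), sampled every two weeks, with columns
# "date", "ticker", "ey" (Earnings Yield) and "sector". Names are illustrative.

def two_step_normalization(df: pd.DataFrame, n_periods: int = 26) -> pd.Series:
    """Sketch of "1Y Relative (N=26) + Date & Sector" using z-scores."""
    df = df.sort_values("date")

    # Step 1: z-score of EY vs. its 26 previous biweekly values (about one year).
    by_ticker = df.groupby("ticker")["ey"]
    hist_mean = by_ticker.transform(lambda s: s.shift(1).rolling(n_periods).mean())
    hist_std = by_ticker.transform(lambda s: s.shift(1).rolling(n_periods).std())
    ts_score = (df["ey"] - hist_mean) / hist_std

    # Step 2: normalize that score cross-sectionally vs. sector peers on each date.
    tmp = df[["date", "sector"]].assign(ts_score=ts_score)
    peers = tmp.groupby(["date", "sector"])["ts_score"]
    return (tmp["ts_score"] - peers.transform("mean")) / peers.transform("std")
```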

Sounds too complicated? We realize that greater flexibility comes with greater complexity, but we think it will be worthwhile. AI algorithms benefit from uncorrelated features, and it's better to present a powerful feature like EY in different normalized ways than to use different valuation metrics with the same normalization.

Well, that's the theory at least.

Let us know your thoughts.

PS: also note the additional options to visualize the feature. You can now quickly visualize the cross-sectional distribution of EY as well as the time series for a single stock.

1 Like

Most AI models benefit from scale: the more data, the better. I’ve found LightGBM to be particularly effective for ranking systems, including here on P123.

I think this approach can be very powerful, and complexity shouldn’t be avoided. While I understand the concern about scaring users away from adopting new technologies, most users here, along with Yuval, are very helpful. Sharing positive examples of how these tools improve investments is a great way to encourage adoption.

I’d love to test the beta. My first goal would be to create data for regimes, such as VIX levels, the yield curve, and other leading indicators. Ideally, the system would identify factor importance based on the prevailing macroeconomic climate.

I hope readers appreciate the pain I went through with value stocks over the past decade :slight_smile:

1 Like

Sure. But with the new release you will be able to create 23 differently normalized versions of the same metric. With around 200 metrics, each with 23 versions, you end up with 4,000+ features. That's too much for an AI Factor.

Here's a sample of the 23 different options for a valuation metric like EarningsYield:

  • Normalized vs. dataset
  • Normalized vs. all stocks on the same date
  • Normalized vs. stocks in same sector on the same date
  • Normalized relative to past 1Y, then vs. dataset
  • Normalized relative to past 2Y, then vs. dataset
  • Normalized relative to past 3Y, then vs. dataset
  • Normalized relative to past 1Y, then vs. all stocks on the same date
  • etc. (23 combinations total)

But the sheer number of possibilities is not the actual problem. After all, some of the inspiration comes from the "Real Time ML" paper mentioned in this thread, where they test 18,000 different combinations.

The main complexity is that there's no easy, scientific way to choose a subset of those combinations for an AI Factor.

We're brainstorming a web-based, scientific tool to quickly generate statistics like alpha and t-stat for a metric. It will be a tool outside of an AI Factor, so it can also be used for selecting metrics for the classic ranking systems.
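As a rough illustration of the kind of statistic the tool could report, here is a sketch for one feature (the quantile-spread construction, the column names, and the plain t-test are just illustrative choices, not the final design):

```python
import pandas as pd
from scipy import stats

def feature_spread_stats(df: pd.DataFrame, feature: str, n_quantiles: int = 5):
    """Mean top-minus-bottom quantile spread (a crude "alpha") and its t-stat.

    Expects one row per (date, ticker) with the normalized feature column and
    a next-period return column "fwd_ret" (column names are assumptions).
    """
    def spread(g: pd.DataFrame) -> float:
        q = pd.qcut(g[feature].rank(method="first"), n_quantiles, labels=False)
        return g.loc[q == n_quantiles - 1, "fwd_ret"].mean() - g.loc[q == 0, "fwd_ret"].mean()

    spreads = df.groupby("date").apply(spread).dropna()
    t_stat, _ = stats.ttest_1samp(spreads, 0.0)  # a real tool might use Newey-West errors
    return spreads.mean(), t_stat
```

Generating a table of these statistics across all 23 normalizations of a metric is roughly what the tool would automate.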

The Benjamini-Hochberg (BH) procedure might be applicable here. In genetics, researchers routinely face similar feature selection challenges when analyzing genome-wide association studies (GWAS) with tens of thousands of genes. BH helps them identify significant genes while controlling the false discovery rate - that is, the proportion of false positives among all rejected null hypotheses.

However, there are two key considerations for financial data:

  1. BH works best with independent tests, while financial metrics (like your 23 normalizations) often have complex dependencies
  2. For dependent features, the Benjamini-Yekutieli variant was specifically developed to maintain FDR control

This could be particularly relevant when dealing with your multiple normalizations of the same base metric, which would likely have strong dependencies.
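For what it's worth, both procedures are a single call in statsmodels. A minimal sketch with placeholder p-values (in practice there would be one p-value per metric/normalization combination):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Placeholder p-values standing in for per-feature significance tests.
rng = np.random.default_rng(0)
p_values = rng.uniform(0, 1, size=200)

# Benjamini-Hochberg: controls the FDR under independence (or positive dependence).
keep_bh, _, _, _ = multipletests(p_values, alpha=0.10, method="fdr_bh")

# Benjamini-Yekutieli: more conservative, but valid under arbitrary dependence.
keep_by, _, _, _ = multipletests(p_values, alpha=0.10, method="fdr_by")

print(keep_bh.sum(), "features survive BH;", keep_by.sum(), "survive BY")
```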

Marco, this would be ideal and, I believe, could be automated. I think the paper's results would have been improved by removing the obvious noise introduced by adding that many random factors and by reducing the dimensionality. 18,000 is a lot of dimensions if one has any belief in the curse of dimensionality at all. See a discussion of the curse of dimensionality in ML here.

I get that the paper's authors think their results are not as good because others were doing it wrong and they have now corrected all of that. But maybe the authors are making some mistakes of their own, or have some poor assumptions, that account for some of their inferior performance (e.g., a lot of noise and some pretty large dimensionality in their method).

In other words, I am not sure we should go out of our way to reproduce the same inferior performance that the authors claim is a feature, unless we fully understand what is going on, and we should avoid simply accepting this single paper as the last word on the subject.

TL;DR: I think you could reduce computer time as well as time spent with spreadsheets while improving results with this well-established method.

This previous post provides a link to the BH procedure, which is simple enough that it can be done in a spreadsheet, or, as the post suggests, you can have ChatGPT do it for you: Everyone can do machine learning using ChatGPT - #18 by Jrinne

The BH procedure is also automated in a scikit-learn module: SelectFdr
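A minimal sketch of how that module could be used here, with toy data standing in for a (samples x features) matrix of normalized factors and a vector of forward returns:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFdr, f_regression

# Toy stand-in: 50 candidate features, only 5 of which carry signal.
X, y = make_regression(n_samples=1000, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# SelectFdr applies the Benjamini-Hochberg procedure to the per-feature
# p-values from f_regression and keeps the features that pass the FDR threshold.
selector = SelectFdr(score_func=f_regression, alpha=0.05).fit(X, y)
print("features kept:", selector.get_support().sum())
```

Note that f_regression tests each feature marginally, so this screens features one at a time rather than jointly.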

The BH procedure is discussed in many of the finance papers members have linked to in the forum. For example, many have linked to this paper, which discusses the BH procedure (among others): Is There a Replication Crisis in Finance?

Don’t many AI models inherently ignore features that don’t contribute useful information? For example, while traditional regression models can struggle with increased dimensionality, techniques like Lasso or regularization effectively eliminate features that don’t add sufficient value.
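For instance (a toy sketch with synthetic data, not P123 factors), an L1 penalty drives the coefficients of uninformative features to exactly zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# 50 candidate features, only 5 of which actually carry signal.
X, y = make_regression(n_samples=1000, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV picks the L1 penalty by cross-validation; irrelevant features
# end up with coefficients of exactly zero.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])
```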

Additionally, models like decision trees and LightGBM implicitly handle feature selection. Decision trees evaluate features based on their ability to reduce impurity (e.g., Gini Index or Entropy) and only use features that meaningfully improve splits. Similarly, LightGBM builds ensembles of trees and chooses features based on their contribution to minimizing the loss function. Irrelevant features are rarely, if ever, used in these splits.
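And a boosted model's split-based importances show which features it actually used. Another toy sketch, assuming the lightgbm package is installed:

```python
import numpy as np
from sklearn.datasets import make_regression
from lightgbm import LGBMRegressor

# Synthetic stand-in: 50 candidate features, 5 informative.
X, y = make_regression(n_samples=2000, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# feature_importances_ counts how often each feature is used in a split;
# purely noisy features tend to be used rarely or not at all.
model = LGBMRegressor(n_estimators=200, random_state=0).fit(X, y)
print("features never used in a split:", np.sum(model.feature_importances_ == 0))
```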

One advantage of AI factors or traditional ranking systems is their ability to combine individual features that may not be significant on their own but become meaningful when used together in a model. This is a common concept many of us have encountered or at least read about.

Maybe I'm misunderstanding something?

To some extent. But 18,000 features will be a problem, in my limited experience. With that many features there will be noise features the model fits to, and probably more noise than signal. People can find their own answers on which of their factors to keep, or a general idea of how many factors to use, with P123's excellent cross-validation tools.

But if someone wants to reduce the number of factors a little, even just to reduce resource usage, this would be one possible method that has very wide acceptance.

Until Marco gets those analog in-memory compute cards, I am not sure I have time for 18,000 factors. I'll want something to reduce the number of features if I start using 23 normalizations for each feature. Marco has expressed an interest in finding a "scientific method" above, and this is a widely accepted method that anyone can consider and use if it fits their needs.

If one takes the time to read past the abstract in papers about feature selection, BH is commonly discussed. And as I mentioned above, it can be done with a spreadsheet.

1 Like