Update on AI factor normalization upgrades

marco · December 4, 2024, 11:31pm

Sure. But with the new release you will be able to create 23 different normalization versions of the same metric. With around 200 metrics, each with 23 versions, you end up with 4,000+ features (200 × 23 = 4,600). That's too many for an AI Factor.

Here's a sample of the 23 different options for a valuation metric like EarningsYield:

Normalized vs. dataset
Normalized vs. all stocks on the same date
Normalized vs. stocks in same sector on the same date
Normalized relative to past 1Y, then vs. dataset
Normalized relative to past 2Y, then vs. dataset
Normalized relative to past 3Y, then vs. dataset
Normalized relative to past 1Y, then vs. all stocks on the same date
etc. (23 combinations total)
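
To make the first three options concrete, here is a minimal sketch in plain Python. It assumes "normalized" means a z-score, which is only an illustration; the actual release may use ranks or another scaling. The sample rows, the `zscore_by` helper, and the variable names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rows: (date, ticker, sector, earnings_yield)
rows = [
    ("2024-01-31", "AAA", "Tech", 0.04),
    ("2024-01-31", "BBB", "Tech", 0.06),
    ("2024-01-31", "CCC", "Energy", 0.10),
    ("2024-01-31", "DDD", "Energy", 0.12),
    ("2024-02-29", "AAA", "Tech", 0.05),
    ("2024-02-29", "BBB", "Tech", 0.07),
    ("2024-02-29", "CCC", "Energy", 0.09),
    ("2024-02-29", "DDD", "Energy", 0.13),
]

def zscore_by(rows, key):
    """Z-score the metric (last field) within the groups defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[3])
    # Population mean and standard deviation per group
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}
    out = []
    for row in rows:
        mu, sigma = stats[key(row)]
        # A group with no dispersion gets a neutral score of 0
        out.append((row[3] - mu) / sigma if sigma > 0 else 0.0)
    return out

# Same metric, three normalization scopes (assumed z-score scaling)
vs_dataset = zscore_by(rows, lambda r: "all")               # whole dataset
vs_date = zscore_by(rows, lambda r: r[0])                   # all stocks, same date
vs_sector_date = zscore_by(rows, lambda r: (r[0], r[2]))    # same sector, same date
```

Each call produces a different feature from the same raw metric, which is how the feature count multiplies so quickly.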

But the sheer number of possibilities is not the actual problem. After all, some of the inspiration comes from the "Real Time ML" paper mentioned in this thread, where they test 18,000 different combinations.

The main complexity is that there's no easy, scientific way to choose a subset of those combinations for an AI Factor.

We're brainstorming a web-based, scientific tool to quickly generate statistics like alpha and t-stat for a metric. It will be a tool outside of an AI Factor, so it can also be used for selecting metrics for the classic ranking systems.
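
For readers wondering what "alpha and t-stat for a metric" could look like in practice, below is a rough sketch of one common approach, not a description of the planned tool. It assumes you already have a return series for a portfolio built from the metric (for example, top bucket minus bottom bucket) and regresses it on benchmark returns with ordinary least squares. The function name, inputs, and sample numbers are all made up for illustration.

```python
from math import sqrt

def alpha_tstat(factor_returns, benchmark_returns):
    """OLS of factor returns on benchmark returns.

    Returns (alpha, beta, t-stat of alpha). Alpha is per period, not annualized.
    """
    n = len(factor_returns)
    if n != len(benchmark_returns) or n < 3:
        raise ValueError("need two equal-length series with at least 3 observations")
    mx = sum(benchmark_returns) / n
    my = sum(factor_returns) / n
    sxx = sum((x - mx) ** 2 for x in benchmark_returns)
    if sxx == 0:
        raise ValueError("benchmark returns have no variation")
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(benchmark_returns, factor_returns))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual variance with n - 2 degrees of freedom
    rss = sum((y - alpha - beta * x) ** 2
              for x, y in zip(benchmark_returns, factor_returns))
    s2 = rss / (n - 2)
    # Standard error of the intercept in simple linear regression
    se_alpha = sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    t_alpha = alpha / se_alpha if se_alpha > 0 else float("inf")
    return alpha, beta, t_alpha

# Made-up period returns, for illustration only
benchmark = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
noise = [0.001, -0.001, 0.0005, -0.0005, 0.001, -0.001]
factor = [0.002 + 0.5 * b + e for b, e in zip(benchmark, noise)]

alpha, beta, t_alpha = alpha_tstat(factor, benchmark)
print(f"alpha={alpha:.5f} beta={beta:.3f} t={t_alpha:.2f}")
```

Running something like this once per normalization variant would give a comparable pair of numbers for each, though with thousands of variants a few will look significant by chance.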