How to include macro variables?

How are we to include things like the level of the VIX in the AI Factor tool? There seem to be two ways to preprocess the data, and after reviewing the downloaded dataset it's still unclear how to include it.

Like, I think we'd want cross-sectional z-scores (by date) for the stock values on each date, but we'd also want the VIX z-score normalized over the historical or lookback period.

This way, we can see if the algorithms can figure out patterns of factors depending on the macro state.
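Here is a minimal sketch of the "by date" preprocessing described in the question. It assumes the stock values are held in a plain `{date: {ticker: value}}` dictionary, which is a hypothetical layout and not the format of the downloaded dataset. Each value is z-scored against all stocks on the same date.

```python
from statistics import mean, pstdev


def cross_sectional_zscores(values_by_date):
    """Z-score each stock's value against all stocks on the same date.

    values_by_date: {date: {ticker: value}} (hypothetical layout).
    Returns the same shape with each value replaced by its z-score.
    Dates where every stock has the same value get 0.0 for all stocks.
    """
    out = {}
    for date, by_ticker in values_by_date.items():
        vals = list(by_ticker.values())
        mu = mean(vals)
        sd = pstdev(vals)  # population standard deviation across stocks
        out[date] = {
            ticker: (v - mu) / sd if sd > 0 else 0.0
            for ticker, v in by_ticker.items()
        }
    return out


data = {
    "2024-01-31": {"AAA": 10.0, "BBB": 20.0, "CCC": 30.0},
    "2024-02-29": {"AAA": 5.0, "BBB": 5.0, "CCC": 5.0},
}
z = cross_sectional_zscores(data)
```

On the second date every stock has the same value, so every z-score is 0. The VIX behaves the same way: it has a single value per date shared by every stock, so a cross-sectional z-score of it would be 0 on every date. That is why it needs a normalization over time instead.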

Any guidance from @p123?

Like categorical variables (e.g., sector or industry), this is not available yet. It will be added soon. Thanks

ty @marco

To use macro data with cross-sectional stock data, would you be creating individual betas to macro factors, or something else?

It depends. For example, when using the VIX and tree modeling, you want to see if the algorithm can identify factors that significantly gain or lose depending on the VIX. Neural networks might be able to do something similar.
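The idea that a tree can find factors whose payoff depends on the VIX can be seen at the level of a single split. Below is a minimal sketch of a depth-one regression tree (a "stump") on the VIX, with a factor's per-period payoff (for example a long-short return) as the target. The inputs and the `min_leaf` parameter are hypothetical; this is not how the AI Factor tool builds its trees, just the mechanism in miniature.

```python
from statistics import mean


def best_vix_split(vix, payoff, min_leaf=2):
    """Find the VIX threshold that best separates a factor's payoff into two regimes.

    This is what one node of a regression tree does: try each candidate
    threshold and keep the one that minimizes the total squared error when
    each side is predicted by its own mean.
    vix, payoff: equal-length lists, one entry per period (hypothetical inputs).
    Returns (threshold, mean_payoff_low_vix, mean_payoff_high_vix).
    """
    if min_leaf < 1:
        raise ValueError("min_leaf must be at least 1")
    pairs = sorted(zip(vix, payoff))
    best = None
    # k = number of observations on the low-VIX side of the split
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between two equal VIX values
        left = [p for _, p in pairs[:k]]
        right = [p for _, p in pairs[k:]]
        ml, mr = mean(left), mean(right)
        sse = sum((p - ml) ** 2 for p in left) + sum((p - mr) ** 2 for p in right)
        if best is None or sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (sse, threshold, ml, mr)
    if best is None:
        raise ValueError("no valid split")
    return best[1:]


vix = [12, 14, 13, 15, 30, 35, 28, 32]
payoff = [0.010, 0.012, 0.011, 0.009, -0.020, -0.025, -0.018, -0.022]
threshold, low, high = best_vix_split(vix, payoff)
```

Here the split lands at a VIX of 21.5, with a positive average payoff below it and a negative one above it. A real tree repeats this search over every feature at every node, which is how it can pick up factor behavior that flips with the VIX.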

Regression modeling can approximate this kind of VIX-dependent behavior, but it has the drawback of requiring a shorter training window. Did I explain that clearly?
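One common way for a regression to approximate that behavior is an interaction term between the factor and the VIX, which lets the factor's slope change with the VIX level. The sketch below fits `ret = b0 + b1*factor + b2*vix + b3*factor*vix` by ordinary least squares in plain Python. The variable names, the synthetic data, and the linear form of the interaction are assumptions for illustration; the thread does not say which regression setup is meant.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows (each a list of regressors, intercept included).
    y: list of targets. Solved by Gaussian elimination with partial
    pivoting, which is adequate for a handful of columns.
    """
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta


def fit_interaction(factor, vix, ret):
    """Fit ret = b0 + b1*factor + b2*vix + b3*factor*vix.

    The factor's slope at a given VIX level is b1 + b3*vix.
    """
    X = [[1.0, f, v, f * v] for f, v in zip(factor, vix)]
    return ols(X, ret)


factor = [-1, 0, 1] * 3
vix = [10] * 3 + [20] * 3 + [30] * 3
ret = [0.01 + 0.5 * f - 0.001 * v + 0.02 * f * v for f, v in zip(factor, vix)]
b = fit_interaction(factor, vix, ret)
```

With this noise-free synthetic data the fit recovers the four coefficients used to generate it, so the factor's slope is 0.5 + 0.02 × 10 = 0.7 at a VIX of 10 and 1.1 at a VIX of 30. A single interaction term forces the factor's effect to change linearly with the VIX, whereas a tree split can capture a threshold.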

Our initial approach to macro features is simply to normalize the value over a historical lookback period. For example, where is gold trading relative to the past 5 or 10 years? Macro normalization will use the same preprocessor used for stock features, whether that's z-score or rank. This way the algorithm is fed values on a similar scale.
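A minimal sketch of this kind of lookback normalization in plain Python. The window length, the list-of-floats input, and the percentile-style rank are assumptions for illustration; P123's actual preprocessor settings are not specified here. Each point is compared only with itself and the values before it, so no future data leaks in.

```python
from statistics import mean, pstdev


def trailing_normalize(series, lookback, method="zscore"):
    """Normalize each point of a macro series against its own trailing window.

    series: list of floats in date order, oldest first.
    lookback: observations in the window, including the current one
        (hypothetical parameter; e.g. 60 for 5 years of monthly data).
    method: "zscore", or "rank" (share of the other window values below
        the current one, from 0.0 at the window low to 1.0 at the high).
    Returns None for points that do not yet have a full window.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    out = []
    for i, x in enumerate(series):
        if i + 1 < lookback:
            out.append(None)  # not enough history yet
            continue
        window = series[i + 1 - lookback : i + 1]  # past values plus today
        if method == "zscore":
            sd = pstdev(window)
            out.append((x - mean(window)) / sd if sd > 0 else 0.0)
        elif method == "rank":
            below = sum(1 for w in window if w < x)
            out.append(below / (len(window) - 1) if len(window) > 1 else 0.5)
        else:
            raise ValueError(f"unknown method: {method}")
    return out


gold = [1200.0, 1250.0, 1300.0, 1800.0, 1900.0, 2000.0]
z = trailing_normalize(gold, lookback=4)
r = trailing_normalize(gold, lookback=4, method="rank")
```

With monthly data, a 5- or 10-year lookback would correspond to `lookback=60` or `lookback=120`. Unlike the cross-sectional version, this produces one value per date, which would be the same for every stock on that date.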

The macro normalization described above is not possible right now.

However, "Individual Betas" is possible now using BetaFunc, provided the time series of the stock aligns perfectly with the macro time series. Lots of macro series have different frequencies and BetaFunc is not smart enough at the moment to align them. It should at least give an error if the time series are incompatible, but instead it quietly calculates garbage betas.

In other words, if you plan to create your own individual betas, it's best to ask us before using BetaFunc so we can check the series.
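To illustrate the alignment problem, here is a minimal sketch of an individual beta computed only after both series are reduced to their common dates. This is not BetaFunc and makes no claim about how BetaFunc works internally; the `{date: value}` input layout and the use of simple percentage changes for both series are assumptions.

```python
from statistics import mean


def aligned_beta(stock_prices, macro_levels):
    """Beta of stock returns on macro changes, using only dates both series share.

    stock_prices, macro_levels: {date_string: value} (hypothetical layout,
    ISO dates so that string order equals date order). A daily stock series
    and a monthly macro series are first reduced to their common dates, so
    each stock return covers exactly the same interval as its macro change.
    Assumption: a macro date missing from the stock series (e.g. a holiday)
    is simply dropped; a real implementation would need a rule for that.
    """
    common = sorted(set(stock_prices) & set(macro_levels))
    if len(common) < 3:
        raise ValueError("need at least 3 common dates to estimate a beta")
    s = [stock_prices[d] for d in common]
    m = [macro_levels[d] for d in common]
    rs = [b / a - 1 for a, b in zip(s, s[1:])]  # stock returns between common dates
    rm = [b / a - 1 for a, b in zip(m, m[1:])]  # macro changes over the same intervals
    ms, mm = mean(rs), mean(rm)
    cov = sum((x - ms) * (y - mm) for x, y in zip(rs, rm))
    var = sum((y - mm) ** 2 for y in rm)
    if var == 0:
        raise ValueError("macro series does not vary over the common dates")
    return cov / var


macro = {"2024-01-31": 100.0, "2024-02-29": 110.0, "2024-03-29": 99.0}
stock = {
    "2024-01-30": 49.0, "2024-01-31": 50.0, "2024-02-15": 55.0,
    "2024-02-29": 60.0, "2024-03-15": 52.0, "2024-03-29": 48.0,
}
beta = aligned_beta(stock, macro)
```

The closes on 2024-01-30, 2024-02-15 and 2024-03-15 are ignored, so each stock return spans the same month as the macro change it is paired with, and the beta comes out at 2. Pairing the two series by position instead would match daily stock moves with monthly macro moves, which is the kind of silently wrong beta described above.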

TY @marco. Understanding the limitations is very helpful for understanding what everything is doing. Just to be sure: the normalization "over the entire period" is done looking backwards (i.e., as of the date the model is sampling the data), and the "by date" normalization uses only the data from the day of the sample?