NEW 'Factor List' tool for downloading data for AI/ML

marco · December 24, 2023, 3:07pm

For those that want to use Z-Scores calculated with means and stddevs that span multiple dates:

Yes, it will be expensive in terms of API credits because you have to download the entire period using the same ML Training End Date so that the latest data uses the same distribution stats.

Yes, we’re not giving you the mean & stddev bc it would make it trivial to get the raw data and thereby violating our vendor terms (technical data is ok)

We might offer a way to remember the statistics for each feature so that you only need to download the latest data. But it’s not a solution for all cases, and probably a waste of time. What if you are doing k-fold cross validation for example?

Therefore just use Min/Max with a very small Trim% with lots of digits in the precision and do the Z-Score yourself. You will have total control.

Lastly the FactSet license is only $12K/year and you can download raw data.