NEW 'Factor List' tool for downloading data for AI/ML

I was thinking about this, and instead of downloading the entire dataset again, all you need is one week of overlap. I think this is true because the zscores are a linear transformation of the original data, and therefore the two sets of zscores are just a linear transformation of each other.

Using ChatGPT, I have a method for two datasets where there is one week of overlap with all the same stocks. I feel like there should be a way that only requires two or so identical stock IDs, but I have not figured it out yet. I did use ChatGPT to check the method, so take it with a grain of salt!
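On the "two or so identical stock ids" idea: if the two sets of zscores really are related by z_set1 = a * z_set2 + b, there are only two unknowns per factor, so two shared stock/date rows with different values should in principle pin them down. Here is a minimal sketch of that; the function name and the sample numbers are made up for illustration, and it assumes the zscores were not clipped or trimmed. With rounded or noisy data, more overlap points would be safer.

```python
def fit_from_two_points(z1_a, z1_b, z2_a, z2_b):
    """Solve z1 = a * z2 + b from two shared observations (A and B).

    z1_a, z1_b: zscores of stocks A and B in set1
    z2_a, z2_b: zscores of the same stocks (same date, same factor) in set2
    """
    if z2_a == z2_b:
        raise ValueError("the two stocks need different zscores to solve for a")
    a = (z1_a - z1_b) / (z2_a - z2_b)  # slope of the line through the two points
    b = z1_a - a * z2_a                # intercept from either point
    return a, b


# Hypothetical values: set1 = 2.0 * set2 + 0.25
a, b = fit_from_two_points(z1_a=1.25, z1_b=-1.75, z2_a=0.5, z2_b=-1.0)
```

This has to be done separately for each factor, since each factor has its own scale and shift.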

For two sets, set1 and set2, with one week of complete overlap:

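  1. Determine the Scale and Shift Factors:
  • Scale (a): The ratio of the standard deviation of set1 to the standard deviation of set2 for the overlap week.
  • Shift (b): The mean of set1 minus a times the mean of set2 for the overlap week. (The plain difference in means is only right when a = 1.)
  2. Calculate the new zscores using this equation:
    z_set2, transformed = a * z_set2 + b

All together:
z_set2, transformed = set1_wstd/set2_wstd * z_set2 + (set1_wmean - set1_wstd/set2_wstd * set2_wmean)

Here set1_wstd and set1_wmean are the standard deviation and mean of the set1 zscores over the overlap week, and likewise for set2.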
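To sanity-check the idea, here is a small sketch using only the Python standard library. It builds fake raw factor values for the overlap week, zscores them with two different (made-up) means and standard deviations to stand in for set1 and set2, and then recovers the scale and shift from the overlap alone. It uses the shift that matches both the mean and the standard deviation, b = set1_wmean - a * set2_wmean. The function names and numbers are assumptions for illustration, and it assumes each download uses one mean and one standard deviation per factor with no clipping.

```python
import random
from statistics import mean, pstdev


def fit_scale_shift(z1_overlap, z2_overlap):
    """Return (a, b) so that a * z2 + b matches z1 on the overlap week."""
    a = pstdev(z1_overlap) / pstdev(z2_overlap)      # ratio of standard deviations
    b = mean(z1_overlap) - a * mean(z2_overlap)      # shift after rescaling
    return a, b


def transform(z2_values, a, b):
    """Put set2 zscores on the set1 scale."""
    return [a * z + b for z in z2_values]


# Fake raw factor values for the stocks in the overlap week.
random.seed(0)
raw = [random.gauss(10.0, 3.0) for _ in range(50)]

# Each download normalized the same raw data with its own mean/std (made up here).
z1 = [(x - 9.0) / 2.5 for x in raw]    # set1 zscores
z2 = [(x - 11.0) / 4.0 for x in raw]   # set2 zscores

a, b = fit_scale_shift(z1, z2)
z2_on_set1_scale = transform(z2, a, b)

# Largest difference between the transformed set2 zscores and the set1 zscores.
max_error = max(abs(p - q) for p, q in zip(z2_on_set1_scale, z1))
```

Once a and b are fitted on the overlap week, the same transform can be applied to every other week in set2 for that factor.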

“By applying this transformation, the z-score from set2 is adjusted to be on the same scale as set1, making it comparable across the two datasets for the overlapping period. This is particularly useful when you want to compare or combine z-scores from different sources that were standardized differently.”