AI Factor Dataset Download Parity

VSai23 · March 29, 2026, 4:17pm

Hey. I have downloaded my datasets for AI factors a few times from different factors. All have the same date range, normalization, filters, target, etc., but each time the download differs. Is there a reason for this?

Is there a guide on how to replicate the results I get on the P123 website in a local environment using the dataset download? Right now I am spending a good bit of time on this type of research, but there seems to be little to no parity.

@marco @judgetrade @AlgoMan

AlgoMan · March 29, 2026, 5:13pm

Have you done the downloads the same day or over several days?

VSai23 · March 29, 2026, 5:14pm

Over multiple months (due to the large number of factors and the large universe). I have each of them as separate parquet files and have to join them.
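For the join step, a minimal sketch of how the separate files could be combined without silently losing rows. It assumes each download has `date` and `ticker` key columns (hypothetical names, adjust to the real schema) and uses small toy frames in place of the actual `pd.read_parquet(...)` results. An outer join keeps every (date, ticker) pair, so rows that exist in only one download show up as NaN instead of disappearing.

```python
from functools import reduce

import pandas as pd

# Assumed key columns; rename to match the actual download schema.
KEYS = ["date", "ticker"]


def join_downloads(frames):
    """Outer-join per-factor frames on KEYS so unmatched rows are kept, not dropped."""
    def _merge(left, right):
        # validate raises MergeError if a (date, ticker) pair is duplicated on either side
        return left.merge(right, on=KEYS, how="outer", validate="one_to_one")

    return reduce(_merge, frames).sort_values(KEYS).reset_index(drop=True)


def missing_per_feature(joined):
    """Number of rows where each feature column is NaN after the join."""
    return joined.drop(columns=KEYS).isna().sum()


# Toy frames standing in for the per-factor parquet downloads.
a = pd.DataFrame({"date": ["2024-01-06"] * 3,
                  "ticker": ["AAA", "BBB", "CCC"],
                  "f1": [1.0, 2.0, 3.0]})
b = pd.DataFrame({"date": ["2024-01-06"] * 3,
                  "ticker": ["AAA", "BBB", "DDD"],
                  "f2": [10.0, 20.0, 40.0]})

joined = join_downloads([a, b])
print(joined)
print(missing_per_feature(joined))
```

Note that the NaN counts also include values that were already missing in a download, so they are an upper bound on the rows lost to universe mismatch.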

WalterW · March 29, 2026, 5:18pm

Can you describe what's different between the downloads? For example, are you seeing different tickers, different values, or different row counts? And roughly how large are the differences? Are we talking about minor rounding variation or something more significant?

VSai23 · March 29, 2026, 5:21pm

Primary issue is missing tickers without any of the filters having changed.

Create an AI factor, download its dataset.
Create a second AI factor, different features but same universe and filters.
Train the second AI factor and download the predictions.
Intersect their tickers (the symbol names in the predictions vs the symbol names in the downloaded dataset) → they do not match and there is no subset or superset relationship → I backtrack this to mean that the universes are not the same.

I would download the second AI factor's dataset as well, but I’m out of credits lol, as I have been trying to figure this out for a while.
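The ticker check in the steps above can be sketched roughly like this. It is plain Python on two collections of symbols; the function name and the toy symbols are made up for illustration, and in practice the inputs would be the ticker columns of the downloaded dataset and of the predictions file.

```python
def compare_tickers(dataset_tickers, prediction_tickers):
    """Summarize how two collections of ticker symbols relate to each other."""
    a, b = set(dataset_tickers), set(prediction_tickers)
    return {
        "only_in_dataset": sorted(a - b),
        "only_in_predictions": sorted(b - a),
        "shared": len(a & b),
        "dataset_is_subset": a <= b,
        "dataset_is_superset": a >= b,
    }


# Toy symbols, not real data.
report = compare_tickers(["AAA", "BBB", "CCC"], ["BBB", "CCC", "DDD"])
print(report)
```

When both `dataset_is_subset` and `dataset_is_superset` are False, each side has symbols the other lacks, which is the situation described here.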

WalterW · March 29, 2026, 5:25pm

When was the last AI Factor pair created?

AlgoMan · March 29, 2026, 5:32pm

I have seen this as well when the downloads aren't done on the same day.
I did a deep dive the other day and found that only one stock had entered the universe and another had left; the market cap of one of the companies had been altered. That's all, but it gave this massive butterfly effect over the years.
It adds a lot of extra data handling work when it happens, annoying and time-consuming.
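A rough sketch of the kind of check that can locate such a change between two downloads of the same universe. It assumes both frames have `date` and `ticker` columns (hypothetical names) and uses toy rows; the `indicator=True` option of `DataFrame.merge` marks which side each key came from.

```python
import pandas as pd


def membership_diff(old, new, keys=("date", "ticker")):
    """List (date, ticker) pairs present in only one of two downloads."""
    keys = list(keys)
    merged = old[keys].drop_duplicates().merge(
        new[keys].drop_duplicates(), on=keys, how="outer", indicator=True
    )
    changed = merged[merged["_merge"] != "both"].copy()
    # left_only = present in the old download only, right_only = new download only
    changed["status"] = changed["_merge"].astype(str).map(
        {"left_only": "dropped", "right_only": "added"}
    )
    return changed.drop(columns="_merge").sort_values(keys).reset_index(drop=True)


# Toy downloads: BBB leaves the universe and CCC enters on the same date.
old = pd.DataFrame({"date": ["2024-01-06"] * 2, "ticker": ["AAA", "BBB"]})
new = pd.DataFrame({"date": ["2024-01-06"] * 2, "ticker": ["AAA", "CCC"]})

diff = membership_diff(old, new)
print(diff)
```

Sorting by date means the first rows of `diff` point at the earliest date where the two downloads diverge.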

VSai23 · March 29, 2026, 5:40pm

Last 4 days all side by side.

VSai23 · March 29, 2026, 5:42pm

Why would it matter if it's cross-sectional normalization by week? Any suggestions for better like-for-like research on local machines given this type of behavior?
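One way to see why membership could matter under per-week normalization: the statistics are computed over whoever is in the universe that week, so a single extra stock shifts the normalized value of every other stock on that date. A toy sketch, assuming a plain z-score within each date (this is an assumption for illustration, not necessarily the exact normalization P123 applies) and hypothetical column names:

```python
import pandas as pd


def zscore_by_date(df, value_col="value", date_col="date"):
    """Z-score a column within each date using that date's mean and sample std."""
    grouped = df.groupby(date_col)[value_col]
    return (df[value_col] - grouped.transform("mean")) / grouped.transform("std")


# One week, three stocks.
base = pd.DataFrame({"date": ["2024-01-06"] * 3,
                     "ticker": ["AAA", "BBB", "CCC"],
                     "value": [1.0, 2.0, 3.0]})

# Same week, but one more stock has entered the universe.
extra = pd.DataFrame({"date": ["2024-01-06"], "ticker": ["DDD"], "value": [10.0]})
with_extra = pd.concat([base, extra], ignore_index=True)

z_before = zscore_by_date(base)
z_after = zscore_by_date(with_extra)

# AAA's raw value is unchanged, but its normalized value moves.
print(z_before.iloc[0], z_after.iloc[0])
```

The same effect applies to rank-based normalization, since one extra member changes every percentile rank on that date.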

craigmediaservices · April 14, 2026, 5:16pm

I’ve encountered this as well, but for me it’s been stocks on the edge of my filters entering/exiting the Universe. I assume there are some updates to the historical data.