How is restated data handled in P123 to prevent look-ahead bias?

Hello, nice thread.

Any idea or thought on how this could affect discrepancies in simulations before 2019, for example? Plus or minus what percent? Only slightly, I'd guess.

This thread is very insightful and helps a lot. Thanks for the transparency.


A lot depends on rebalancing frequency and whether you're simulating US mid-to-large caps or other stocks.

In a simulation, ranking is done only on Sunday night/Monday morning. So if a company announces on a Tuesday, there's already a good lag before the ranking, and you're probably OK. If it announces on a Thursday or Friday, though, there may be a bit of look-ahead bias: if it's not a US mid- or large-cap, FactSet sometimes takes a whole week to process an announcement, particularly during the height of earnings season.
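To make the timing concrete, here's a small illustrative sketch (not P123 code, just an assumption-laden toy) that computes how many days an announcement has before the next Monday ranking. It shows why a Tuesday announcement is probably safe while a Friday one is risky:

```python
from datetime import date

def days_until_next_ranking(announce: date) -> int:
    """Days from an announcement until the next Monday-morning ranking.

    Simulations rank only on Sunday night / Monday morning, so the
    processing window is the gap between the announcement and the
    next Monday.
    """
    # Monday is weekday 0; find the next Monday strictly after the announcement.
    days_ahead = (0 - announce.weekday()) % 7
    if days_ahead == 0:
        days_ahead = 7
    return days_ahead

# A Tuesday announcement (2023-05-02) leaves 6 days before ranking;
# a Friday announcement (2023-05-05) leaves only 3 — inside the window
# where FactSet may not yet have processed it.
print(days_until_next_ranking(date(2023, 5, 2)))  # 6
print(days_until_next_ranking(date(2023, 5, 5)))  # 3
```

If FactSet can take up to a week to process a small-cap announcement, only the Tuesday case comfortably clears that window.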

There are various ways to mitigate this. What I do is set my simulations to rebalance every two weeks instead of every week (in real life I rebalance two or three times a week). This helps ensure a bit of a lag for most stocks.


This is Yuval’s idea, which I use as a universe rule to lag the data (i.e., exclude stocks whose data FactSet might be delayed in updating): Min (LatestActualDays, DaysSince (Max (LatestFilingDate, LatestNewsDate))) > 3

It's effective for the situation in Yuval’s post, I think. And you can change the number of days depending on how long you think it takes FactSet to update its data.

Wait, what am I missing? How can there be look-ahead bias post-2019 if you guys are using snapshots of the database?

If you train on data pre-2019 and then validate/test on 2020 and later, doesn't that require no lagging? You're letting the model train on possible look-ahead bias but then totally denying it the ability to look ahead out-of-sample. So if your model trains pre-2019 and works well post-2020, you're in good shape? Or am I missing something?

I was talking about pre-2019 simulations, not post-2019.
