Live vs Simulated - FactSet vs Compustat

I would like to re-open this discussion on live vs sim performance because I worry that we don't fully understand it. First and foremost, is there a systematic bias between sim and live performance (in either direction), or should we expect identical live and sim strategies to converge in long-term expected value? If there is a bias, what are the root causes?

I think this is even more important now that ML methods are increasingly used to build ranking systems. If some factors carry historical biases (like the sentiment ones discussed above), ML methods will find and overweight those factors even though they can't be fully replicated in live trading, potentially exacerbating any performance gap.

I started looking at this yesterday because I noticed that one of my strategies launched towards the end of 2023 has a live sim that is 11 percentage points worse than its simulated counterpart over 2024 so far (v4 in the plots below). It also happens to be the most sentiment-heavy of my models.

I then re-examined my earlier models over the period since their live strategies launched. For 3 out of 4 of them, there is a 90+% probability that the average weekly return of the live strategy is genuinely lower than that of its sim counterpart (to the tune of 4-9 percentage points annualized).
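For anyone who wants to run a similar comparison on their own strategies, here is a minimal sketch of one way to estimate that kind of probability. It is not necessarily the method behind the numbers above: it assumes a paired bootstrap on the weekly return differences (live minus sim), treats weeks as independent, and annualizes by simply multiplying by 52. The function name and its parameters are made up for illustration.

```python
import numpy as np

def prob_live_below_sim(live_weekly, sim_weekly, n_boot=10_000, seed=0):
    """Paired bootstrap on weekly return differences (live minus sim).

    Returns (prob, annualized_gap):
      prob           - share of bootstrap resamples whose mean difference is < 0
      annualized_gap - mean weekly difference * 52 (simple, not compounded)

    Assumption: both inputs are weekly returns as decimals (0.01 = 1%),
    aligned on the same weeks, and weeks are treated as independent draws.
    """
    live = np.asarray(live_weekly, dtype=float)
    sim = np.asarray(sim_weekly, dtype=float)
    if live.ndim != 1 or live.shape != sim.shape:
        raise ValueError("live and sim must be 1-D series of equal length")

    # Pairing by week removes the market move common to both series.
    diff = live - sim

    rng = np.random.default_rng(seed)
    # Each row is one resample (with replacement) of the observed weeks.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)

    prob = float((boot_means < 0).mean())
    annualized_gap = float(diff.mean() * 52)
    return prob, annualized_gap
```

Calling prob_live_below_sim(live, sim) with two aligned arrays of weekly returns gives the bootstrap probability and the annualized gap as a decimal (multiply by 100 for percentage points). If weekly differences are autocorrelated, a block bootstrap would be the more careful choice, and a plain resample like this one will tend to overstate confidence.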




I'm not sure yet what my next investigative steps should be. I may run my analysis on the p123 live sims since they have more history, but I'm still brainstorming.