Live strategy different from simulated strategy of the same date

What does the simulation represent? What kind of data gets added intraday? Or is it just slow updating of fundamental data?

Simulations only have access to the weekend fundamental data, which includes the early Monday AM data.

Live rebalances get the latest intraday data which depends on what update has run.

During the week, the first update at 10:30 PM gets the prices for the day and whatever fundamentals and estimates the vendor gives us. The second update, around 3:30 AM, usually just gets whatever fundamentals and estimates the vendor updates. But they could also fix anything at all, including prices.

The weekend updates are similar to the second update: usually fundamentals and estimates, plus anything at all that they fix.
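To make the timing concrete, here is a minimal sketch of the schedule just described. The fixed times and the two-runs-per-night assumption are simplifications taken from this post, not the actual pipeline:

```python
from datetime import datetime, time

# Sketch of the weekday update schedule described above (times are Central).
# Assumes the previous evening's 10:30 PM update has always run by rebalance
# time, and that there are exactly two overnight runs -- a simplification.
SECOND_UPDATE = time(3, 30)

def updates_seen_before(rebalance: datetime) -> list[str]:
    """List the overnight updates a same-morning rebalance has already seen."""
    seen = ["first update (prev. 10:30 PM): prices + fundamentals/estimates"]
    if rebalance.time() >= SECOND_UPDATE:
        seen.append("second update (3:30 AM): fundamentals/estimates + fixes")
    return seen

# A 7 AM rebalance has seen both updates; a 2 AM one only the first.
print(updates_seen_before(datetime(2023, 12, 5, 7, 0)))
print(updates_seen_before(datetime(2023, 12, 5, 2, 0)))
```

This is why the rebalance time matters so much in the discussion below: the same strategy sees different data depending on which updates have already run.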

It might be helpful to think of simulations as proofs of concept rather than a precise recreation of the past using a time machine. In other words, a simulation is one possible result of your strategy among many.

Regardless, I will proceed with the analysis I mentioned before. Store the precise values for your European ranking system and compare them next week.

Except they are not a proof of concept; they have a look-ahead bias. They will always include the Monday intraday update, but live may not. The error is not random: the sim will always be better than live. That's a big problem, unless your analysis shows the problem is not the intraday update, or unless I had started live at 8 AM or 9 AM, in which case it would have captured the intraday update before markets opened.

Phil, the sample size is too small to draw conclusions. In any case the sim is “correct” for a Monday rebalance @ 7AM Central time, or 2PM in Europe. So yeah, not ideal for Europe.

I think an update around 11 PM on Sunday should do the trick and capture most of the updates the vendor does over the weekend for Europe, between Saturday and Monday 3 AM. It's a planned upgrade. We'll try to get it out for this weekend.

Sorry about that. Turns out we do a Sunday update at 10 PM CST. So the only difference between a simulation and a live Europe rebalance before 4 AM CST is data updated by the vendor during those 6 hours.

Could still be significant. We’ll verify just how much changes between those times.

Out of 48 holdings in the live strategy, only 38 overlap with the sim… I don’t think the rankings can change that much in an intraday update?

The holdings are usually not the result of just one rebalancing, but of a few. Also, depending on your sell rules, such a large difference between the holdings does not seem implausible to me.

There have not been any rebalances yet; the difference from 48 to 38 is just the initial purchases.

Also, this simulation is fairly low turnover, maybe turning over 1 or 2 stocks per week. Even if there had been a rebalance, it still wouldn't account for the differences.

And even after a rebalance, the holdings should be the same, because the rebalance should be the same.

@philjoe Sorry for the delay. I have some extra info that affects Europe and also North America. I downloaded the ranks and raw values for your ranking system last week on Monday at 12 AM and then again at 8 AM. (Let me know if you want the full spreadsheets.)

This is what I found out:

There are many differences in overall rank. From what I can tell, it's mostly due to analyst data, which your ranking system has lots of. This makes sense, since analyst data is constantly being updated: analysts revise their numbers and FactSet processes them in real time (our dataset is not updated live; I think it refreshes every 4 hours or so).

If you compare the Final column and the #NAs, there are no differences. But I did find some differences in some fundamental factors. I'm not sure how to explain this, except that we've seen FactSet update items in strange ways. Perhaps they are revisions. In other words, it takes a while for a period's data to settle.
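A diff like the one described here is easy to script. A minimal sketch, with invented tickers, factor names, and values standing in for the actual downloaded spreadsheets:

```python
# Compare two rank downloads of the same ranking system taken at different
# times (e.g. Monday 12 AM vs. 8 AM). All tickers, factor names, and values
# below are hypothetical placeholders, not real P123 output.
ranks_12am = {
    "AAPL": {"Final": 98.5, "Sentiment": 90.0, "Value": 55.0},
    "MSFT": {"Final": 97.0, "Sentiment": 88.0, "Value": 60.0},
}
ranks_8am = {
    "AAPL": {"Final": 98.5, "Sentiment": 85.0, "Value": 55.0},
    "MSFT": {"Final": 97.0, "Sentiment": 88.0, "Value": 61.0},
}

def diff_ranks(a, b, tol=0.0):
    """Yield (ticker, factor, before, after) for every factor that moved."""
    for ticker in a.keys() & b.keys():
        for factor in a[ticker]:
            if abs(a[ticker][factor] - b[ticker][factor]) > tol:
                yield ticker, factor, a[ticker][factor], b[ticker][factor]

for row in sorted(diff_ranks(ranks_12am, ranks_8am)):
    print(row)
```

Grouping the output by factor rather than by ticker would show at a glance whether the churn is concentrated in analyst data, as suggested above.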

So the main question is whether a simulation is using information that was not available before the market opened. It's certainly possible from this, especially with analyst data. To be safe, I think the best course of action is simply to cut off analyst data for the weekend on Saturday for backtests.

For financial data, we're going to introduce a way to lag it very soon. For example, lag it 1 week, 1 month, etc. BTW, academic studies typically use annual data lagged 6 months from the period end date.
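Assuming "lag" here means a data point only becomes usable once a fixed interval has passed since its as-of date (an assumption about the planned feature, not its actual implementation), the rule could be sketched as:

```python
from datetime import date, timedelta

# Hedged sketch of a data lag: a data point dated `data_date` is only
# visible to a backtest once `lag_days` have elapsed.
def visible(data_date: date, rebalance_date: date, lag_days: int) -> bool:
    """True if the data point may be used on rebalance_date."""
    return data_date + timedelta(days=lag_days) <= rebalance_date

# Annual data lagged ~6 months from period end, as in the academic
# convention mentioned above:
assert visible(date(2023, 12, 31), date(2024, 7, 1), 182)       # visible in July
assert not visible(date(2023, 12, 31), date(2024, 3, 1), 182)   # hidden in March
```

The Saturday cutoff proposed for analyst data is the same idea with a much shorter, calendar-anchored lag.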

Thanks

Ranks Nov 7.xlsx (139.0 KB)

Marco,

I am not overly concerned about this. My ports on auto and my sims seem to look pretty similar when I check now. I use P123 every day and I think it is a good value.

But this question about analyst data has come up so many times over so many years that it is unbelievable that P123 does not say, “This problem with analyst data has been noted and addressed in this way,” and then proceed to explain how it has already been addressed.

And it would not hurt if you were able to say that P123 continues to monitor this in some way.

I thought we were lagging the analyst data for FactSet because that was a known problem that we had finally addressed (after many years). But is there any reason to think this is not happening with CapitalIQ, which for sure has no lag? That's assuming there might still be a problem with the FactSet data (which I am not really sure about now).

So I guess if I have a question, it would be: are we actually still lagging the analyst data, and if so, do we still need to cut it off on Saturday for backtests? Isn't the lag accomplishing this already?

And is CapitalIQ, with the way they handle time-stamps, different? If not, does it need a lag and/or need to have that data cut off on Saturday? All just questions; maybe the lag is all that we need. Maybe CapitalIQ has a different time-stamp system. And like I said, I have not noticed a big problem.

If you want to draw a large Kaggle crowd with the AI, I think that will be the first question some of them ask. The quality of the data and concern about look-ahead bias or “information leakage” is always on their mind, to the point that they usually require their cross-validation test sets to come strictly from later in time than the training data. The word OCD does not cover it. And, when they are not sure about the data, they are likely to attribute a good backtest to information leakage and not sign up until that has pretty much been ruled out for them. Remember, there is that theory about efficient markets out there, and some will be skeptical (as they should be). They have not been running ports on auto for a while, so they don't know the answer to this.

FWIW.

Jim

Hmmm, now that I think about it, I thought Yuval looked into this before and had already introduced a 1-day lag? I'm going to start another comparison on Monday. Is there a way of doing the Saturday cutoff before then so I can test it?

The differences between sim and live are just too big; we have to do something…

Here is another example of a problem that I can't figure out. Yesterday my live strategy recommended buying AGS with a ranking above 95. Today it says sell, as the ranking has dropped below 90. There hasn't been any news, and according to the database, the new quarterly report was reflected over the weekend. But was it? Or was it a buy yesterday because the new quarterly report hadn't been reflected yet, and now that the database has updated overnight to reflect the new report, it's no longer a buy?

Update: the plot thickens. AGS is still rank 99 in the “rankings” tab, but you can see the live strategy shows 85. Same ranking system, same universe.


[screenshot: ags2]

Ok here is even clearer proof that something extremely odd is happening between sims and live strategies.

I created a live strategy called “experiment - live micro US” yesterday around 1pm.

I created a sim strategy called “experiment - sim micro US - run Dec 5 at 1pm” yesterday around 1pm, made by copying the live model, so it was exactly the same in every way.

The holdings for these two are identical; I downloaded and saved the Excel files of the holdings for each.

Today at around 3pm, I re-ran the sim strategy with the exact same everything and called it “experiment - sim micro US - run Dec 6 at 3pm”, and the holdings are totally different. Only 28 out of the 50 are the same. The only way this is possible is if the data in the database that is timestamped as Dec 4th was different yesterday than it is today, which means the database is being updated retroactively and hence is not point-in-time.
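The overlap arithmetic itself can be scripted so that the two saved exports are compared automatically. The tickers below are placeholders; in practice they would be read from the downloaded Excel files:

```python
# Compare two holdings exports of the "same" simulation run on different
# days. Tickers here are invented stand-ins for the saved spreadsheets.
def overlap_report(holdings_a: set[str], holdings_b: set[str]) -> dict:
    """Summarize how two holdings snapshots differ."""
    return {
        "common": len(holdings_a & holdings_b),
        "only_in_a": sorted(holdings_a - holdings_b),
        "only_in_b": sorted(holdings_b - holdings_a),
    }

run_dec5 = {"AAA", "BBB", "CCC", "DDD"}
run_dec6 = {"AAA", "BBB", "EEE", "FFF"}
print(overlap_report(run_dec5, run_dec6))
```

For a truly point-in-time database, re-running the identical simulation should make `only_in_a` and `only_in_b` come back empty.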

Phil -

I think we need to drill down to figure out which factors in your ranking system are causing such a difference. I’ll look at the ranks of every stock and every factor using as-of dates of 12/5, 11/28, and 11/21. I’ll see how those ranks change between today and tomorrow, using those same as-of dates. That should give us a clue as to which factors are causing the changes. It will also give us a clue as to whether this is a Monday-Tuesday problem or whether the problem is also Tuesday-Wednesday, and whether the problem equally affects recent data, week-old data, and two-week-old data.

  • Yuval

That will work for debugging the problem, but the bigger picture is that you guys claim to have a “point in time” database that supposedly takes snapshots on weekends to be truly point-in-time (which is a great idea, because it even accounts for the time it takes FactSet to update the database). But regardless of which factors cause it, the system allows the database to be updated after the fact. That ability to update after the fact (1) should not exist, because it isn't necessary, and (2) seems to be introducing serious look-ahead bias.

Tuesday vs. Wednesday was 100% the same, so it must be a Monday - Tuesday problem.

At the end of the day, staying with P123 can be a matter of faith. I called to cancel my membership once because of this problem. But there are no refunds if you paid for the year already. I changed my mind by the time my membership was up for renewal, but still paper-traded for a while. I'm investing some money now.

Is this really a problem that will turn a great sim into a dog port? Who knows. Not me. Best guess: not necessarily.

Still, Philip seems to be finding a pretty significant problem (concerning). The only word we have from P123 so far is Marco's post, where he seems surprised that sentiment data has a problem, even though it has been noticed multiple times before in the forum. And the problem persists even after the lag was introduced for analysts' data, if I read Marco's post correctly.

Anyway, if my port does not perform well out-of-sample, it will not be the usual excuse (overfitting) this time. Cross-validation and my methods in general will make it clear that data is the problem (if there is a problem). The sim has worked fine out-of-sample since 2015. What the port will do is a different question.

I am not saying I know the answer. And again, my best guess is that it is not always a problem. I'm just saying it will take a while for me to know about my present port, and that for some people the problem seems to be real (supported by this thread, including P123's comments).

I am sure we will get a better understanding this time. Doubt about the data is not good for attracting the Kaggle crowd, would be my guess. But there have been a lot of people talking about a lot of new methods at P123, so there is real potential for P123 to grow, especially if this is found to be a non-problem or any problems are addressed.

My recommendation: keep the new members coming and address any uncertainty about the data that might slow the growth.

I'm not interested in the blame game or in saying P123 is a bad service (it's not). I am only interested in drawing attention to the issue in order to fix it or, failing that, allow for it in simulations.

I have noticed that in every single live portfolio, performance has lagged hypothetical performance even after adjusting for slippage… so I am investigating every possible cause, as it's clearly more than just bad luck.

Philip -

What follows is quite tentative, and I might be slightly off on some points. But I didn’t want to leave you hanging.

I believe there are several entirely unrelated issues going on here.

First, we have been getting some “phantom” NAs in cash-flow-statement data immediately after filings, and our fallback mechanisms failed because of the way the data was being reported. We wrote to FactSet to see if they could fix this, and they couldn’t, so we are going to be implementing a better fallback for these instances. These NAs would show up and affect live strategies but would not be in simulations since the NA data is generally overwritten. This explains why simulations would perform a little better than live strategies. In general, there’s not much we can do about the fact that FactSet silently corrects NAs and mistakes after the fact.

Second, I think the reason the simulations were so completely different on Monday and Tuesday has to do with the starting date of the simulation. I may be mistaken, but it seems that the simulations were relying on different closing prices. This would completely explain the differences, since you're using a lot of factors that rely on closing prices. I was able to find that running the Ranks tab of your ranking system with a 12/5 as-of date on 12/6 and on 12/7 gives very different results, because different closing prices are used for the technical factors, and I suspect the same sort of thing is happening with the simulation differences. At any rate, I think the problem is chiefly related to using different prices for ranking rather than to FactSet changing the data. Thanks for calling this to our attention; we're doing some more investigating and will fix the problem soon.

Third, there may be a bug with using Close(0) in an Industry Formula node. This would affect your “Sentiment: Industry Momentum” node. I tried running that node alone today and got all N/As, while running it on other dates was fine. If a live strategy got all N/As for that node but a simulation got real data, that would contribute to a simulation outperforming a live strategy. (I’m not saying that’s what’s happening, but it’s a possibility.) So that’s something else we need to investigate/fix.

Perfect, thanks for looking into it.

First problem: worst case, we can fix this with some Eval() functions?

Second problem: I'm not quite following. Is it because I sometimes use “Price” and other times “Close(0)” in formulas (I assumed they were interchangeable)? Or are you saying that the simulation run on Tuesday uses the Monday close as Close(0) instead of last Friday's, like it should? I can confirm that in the screener, if you choose to rebalance every day, this definitely happens: it pulls the last daily close instead of the last weekly close. But then the Tuesday vs. Wednesday difference should be just as big, no? Also, depending on whether I'm understanding this correctly, it wouldn't necessarily contribute to underperformance of a live strategy vs. a simulated strategy, so long as the sim and live are rebalanced on Mondays only?
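To illustrate the ambiguity being asked about here, a toy example of the two possible readings of the "latest close" (made-up dates and prices; this is not P123's implementation of Close(0)):

```python
from datetime import date

# A mid-week run can mean two different things by "the latest close":
# the most recent daily close, or the close of the last completed week
# (Friday). Dates and prices below are invented.
closes = {
    date(2023, 12, 1): 10.0,  # Friday
    date(2023, 12, 4): 11.0,  # Monday
    date(2023, 12, 5): 12.0,  # Tuesday
}

def last_daily_close(asof: date) -> float:
    """Most recent close on or before asof."""
    return closes[max(d for d in closes if d <= asof)]

def last_weekly_close(asof: date) -> float:
    """Close of the most recent Friday on or before asof."""
    return closes[max(d for d in closes if d <= asof and d.weekday() == 4)]

asof = date(2023, 12, 5)  # a Tuesday
print(last_daily_close(asof), last_weekly_close(asof))  # 12.0 vs. 10.0
```

If a Tuesday simulation run uses the daily reading while a Monday live rebalance effectively uses the weekly one, every price-based technical factor shifts, which matches the scale of the differences reported above.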

Third problem: while this would contribute to underperformance of live if this factor is always N/A, it's not a big enough weighting to explain it all.