I Get different simulation results with exact same system / timeframe

Jrinne · October 2, 2020, 10:52pm

Brett’s complaint was: “Of course we all want correct data to work with but if it is not PIT it is of no use to me.”

For sure Brett’s main issue was things not being PIT. Obviously, the problem with this is that the sim can look pretty good due to the look-ahead bias but the port can be worthless.

He expresses that here: “I would love to place bets on Sunday football games the day after, it is just not realistic.”

Personally I do not understand what else he could mean. But I did say I was 99% sure because it was implied. Maybe you are right.

Chaim DID say it directly: “However since the switch to FS I am seeing huge discrepancies between live portfolios and backtested sims.”

We agree on what Nisser said.

But why would anyone be interested in a bunch of re-runs of sims unconnected to the performance of the port?

You are good at statistics. A broken watch will give you the same answer each time. As you know–statistically speaking–a broken watch has high precision. There are other things to consider besides precision (or backtests giving the same answer each time).

To spell it out, you ultimately want the sim to lead (NOT MISLEAD) you to the port that will make the most money. Precision is a small part of that. Not being PIT is a big part of misleading when it is allowed to exist (or even embraced in a quest for precision).

Chaim, Nisser and I think Brett understand this.

Best,

Jim

nisser · October 3, 2020, 3:19am

Here’s the problem in a nutshell: FactSet’s data is not PIT. They make revisions. And we don’t know WHEN they make revisions (i.e. corrections; restatements are a different kettle of fish).

So let’s say we followed the suggestions from several users on this thread and took snapshots of data daily and stored them and then let users decide which date’s data to use. (Obviously, this would be a huge task and would mean our data storage capacity would have to be multiplied by thousands.) Even then there would be plenty of non-PIT information in the backtests because of a) revisions that FactSet makes to final data and b) a lack of effective dates (we have no idea when FactSet actually delivered the data that it’s reporting to its clients).

If we decided to no longer accept FactSet data revisions, we would not be any closer to being PIT than we would if we accepted them because so many data revisions have already been applied. And we would run the risk of not being able to fix errors. And because the data is so interrelated, revisions of one data point will involve revisions to others. For example, let’s say we point out to FactSet that they got their share count wrong for a certain stock at a certain point in the past. They would then revise that share count, which would affect every single value ratio for that stock. The ripple/butterfly effect would be huge. Should we decline that revision? Apparently not. But what if another FactSet client points out that the share count is wrong and they respond in the same way? Should we decline that revision because it wasn’t generated by us? And how are we to differentiate between a revision that’s correcting a data error and a revision that reflects new information?

At the same time, we have been taking a number of steps all along to limit the number of changes that users experience and to make our data as PIT-ish as possible. This includes noting effective dates of announcements and statements post-March 2020, assigning past effective dates as smartly as possible, creating PIT time series, and so on. I’d be happy to consider other suggestions regarding making our data more PIT-ish. But please do take into account that the data we’re starting with isn’t PIT to begin with.

I do not want to dismiss or minimize your concerns. I just want to tell you how difficult it is to satisfy them, and how hard we’re trying to deal with this.

I think it matters how revisions are done. To elaborate on your example, if FactSet is “fixing” the share count on X date, my immediate question is whether this information was widely wrong for everyone at the time. If everyone is playing with the same information at the same time, then there are no issues.

If Apple has a P/E ratio of 100 on July 1st that is only a P123 error (and not widely believed to be true), and it is later corrected to a P/E of 1 on August 1st, then it’s very possible that my P123 system would have ignored Apple in the live version but would have bought it in a backtested simulation, thus juicing the backtested performance and giving me the wrong impression. I believe the latter is happening, which is completely unacceptable.
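
To make the Apple example concrete, here is a minimal Python sketch of the difference between an "as of" lookup and a "latest value" lookup. Everything in it is an assumption for illustration: the `known_at` field, the year 2020, and the P/E cutoff of 20 are hypothetical, and this is not how P123 or FactSet actually store data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    ticker: str
    period: date     # the date the value describes
    known_at: date   # hypothetical: when this version became available
    pe: float


# Hypothetical history mirroring the example: a wrong value, corrected a month later.
HISTORY = [
    Observation("AAPL", date(2020, 7, 1), date(2020, 7, 1), 100.0),
    Observation("AAPL", date(2020, 7, 1), date(2020, 8, 1), 1.0),
]


def pe_as_of(history: List[Observation], ticker: str,
             period: date, as_of: date) -> Optional[float]:
    """Return the newest version of the value that was already known on `as_of`."""
    versions = [o for o in history
                if o.ticker == ticker and o.period == period and o.known_at <= as_of]
    if not versions:
        return None
    return max(versions, key=lambda o: o.known_at).pe


def pe_latest(history: List[Observation], ticker: str, period: date) -> Optional[float]:
    """Return the newest version regardless of when it became known (look-ahead)."""
    return pe_as_of(history, ticker, period, date.max)


def passes_screen(pe: Optional[float], max_pe: float = 20.0) -> bool:
    """Hypothetical buy rule: P/E must exist and be at or below the cutoff."""
    return pe is not None and pe <= max_pe


decision_day = date(2020, 7, 1)
live_pe = pe_as_of(HISTORY, "AAPL", date(2020, 7, 1), decision_day)  # what the live port saw
sim_pe = pe_latest(HISTORY, "AAPL", date(2020, 7, 1))                # what a later backtest sees

print("live buys:", passes_screen(live_pe), "| sim buys:", passes_screen(sim_pe))
```

With these made-up numbers the live port skips the stock and the later backtest buys it, which is the discrepancy described above. The sketch only works if a trustworthy `known_at` date exists for every version of a data point.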

So what’s being “updated and revised” and how?

Edit: Can others share their simulated live strategies to see if they’re getting the same thing as me? Surely we all have live strategies going. Just simulate it, copying it exactly.
There are a lot of moving parts. Is it the data revisions that are the problem? Is it the universe being adjusted over time, etc.? I don’t know. But the huge discrepancy is concerning.

Jrinne · October 3, 2020, 9:01am

Nisser,

I agree with how important this is. Here is what I have now (the best I can do now).

But the port has been modified in a fairly significant way. It does not follow the same rules throughout as the sim does.

I will start a new port Monday, follow all of the recommendations exactly and make no modifications until some conclusions can be drawn.

I hope to have one of many examples (including other member’s contributions) that people can trust and share as we go.

Thank you for sharing your results. Thank you for your comments.

By following Nisser’s example, people can find the impact of this on their own without being swayed by speculation, anecdotal stories or hypothetical examples in the forum.

Maybe P123 will be interested in some reliable data from a diverse set of models as P123 considers ways of addressing this.

Images. Port first then the sim.

Best,

Jim

Jrinne · October 3, 2020, 12:15pm

Yuval,

Nice scatter-gun approach!!!

My suggestion: Diagnose the problem (if there is a general problem).

And see how bad the problem is (stage the cancer if it is cancer).

P123 has data on a LOT OF PORTS!!! They are called Designer Models.

Can P123 run those as sims without looking at any of the internals?

Figure out the median deviation of the ports versus sim, range of deviation. Outliers etc.
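
For anyone who wants to try this on their own ports, here is a rough Python sketch of that summary. It assumes you already have matched return figures for each port and its sim over the same period (in percentage points). The function name and the 1.5 x IQR outlier rule are my own choices for illustration, not anything P123 provides.

```python
import statistics
from typing import Dict, List


def deviation_summary(port_returns: List[float], sim_returns: List[float]) -> Dict[str, object]:
    """Summarize sim-minus-port return differences for matched port/sim pairs."""
    if len(port_returns) != len(sim_returns):
        raise ValueError("each port needs exactly one matching sim")
    if len(port_returns) < 2:
        raise ValueError("need at least two port/sim pairs")

    # Positive difference means the sim looked better than the live port.
    diffs = [s - p for p, s in zip(port_returns, sim_returns)]

    # Quartiles for a simple Tukey-style outlier fence (an arbitrary but common choice).
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "median": statistics.median(diffs),
        "min": min(diffs),
        "max": max(diffs),
        "outliers": [d for d in diffs if d < low or d > high],
    }
```

A median near zero with a few outliers would point to model-specific causes. A median well above zero would suggest something systematic.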

Maybe Nisser’s suspicions are justified and we will find out why the average Designer Model underperforms. It would not hurt the members (especially the subs) to find the reasons for that either.

But those statistics should be posted on the home-page as part of the Informed Consent. If the ports are as unconnected to the sims as some are suggesting is possible you have an obligation to let people know.

Test different possible diagnoses if there is a problem.

Is it the earnings estimates?

Maybe data revisions are switching factors from NA to a rank adding a huge amount of random noise in some of the ports. This would be useful knowledge to some members if it is causing problems.

This is not an exhaustive list. But an example where the treatment would be entirely different in each case.

P123 is rightly concerned that the treatment could be worse than the problem. No one wants radiation therapy for a freckle.

You need a diagnosis to get the treatment right.

Members can help by sharing their data. But P123 has access to data that we do not and P123 needs to step up.

Diagnose the problem (if there is one). Take the appropriate steps (effective but not overly aggressive).

I think there is enough evidence with Nisser’s, Brett’s and Chaim’s posts to suggest you need to understand what might be going on.

Make a diagnosis is my suggestion. Maybe share the diagnosis with us. I get that finance is different, but everything above would be ethically and legally required in medicine. And financial security is still somewhat important in our somewhat capitalistic countries.

Best,

Jim

yuvaltaylor · October 3, 2020, 5:02pm

Jim -

I like this. It’s a good suggestion. But there’s one big problem: we switched data providers in June and then fixed a huge number of bugs. So live models and simulations based on those models will show a much more pronounced difference between now and, say, six months ago than they will between, say, six months from now and now. In other words, if we performed this experiment now, the results would be alarmingly different; if we perform it six months from now, the results will likely not be so different.

Maybe remind me in six months to do this?

Yuval

nisser · October 3, 2020, 5:29pm

I just looked at my US port. The live strategy was readjusted June 12, 2018, so I’m only comparing from that date onward.
First picture is the live port. Second picture is the simulated strategy copy.
Live results:
Strategy - 26%
Russell -6%

Jrinne · October 3, 2020, 5:38pm

Yuval,

Cool. Thank you for working on this.

I agree with this.

This would just lead to a false sense of precision. All the sims agreeing but “would not be any closer to being PIT.” Exactly as you say.

Can I share what I think this is? It would be good for everyone—especially P123—if I am right. And it would be easy to diagnose.

It could be earnings revisions but I think it is problems with ports that have a large number of NAs where the NAs get filled in later with the data revisions. So the ports are pretty random but the sims look pretty good.

You may be able to exclude that possibility immediately with your understanding of the revisions. Just an idea and I obviously do not know yet. But my uninformed guess would be that an NA is literally 100 times more likely to get filled in later than the likelihood that a piece of data would get changed.

And an NA can cause a lot of randomness no matter how it is addressed (even with eval fall-backs which members are probably not using too often).
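
Here is a toy Python sketch of the mechanism I have in mind. The tickers, factor scores, and the neutral `na_score` of 0.5 are all made up, and this is not P123’s ranking code. It only shows how filling in NAs after the fact can change which stocks get picked.

```python
from typing import Dict, List, Optional


def top_n(factor: Dict[str, Optional[float]], n: int, na_score: float) -> List[str]:
    """Pick the n highest-scoring tickers; a missing value (None) is scored as `na_score`."""
    scored = {t: (v if v is not None else na_score) for t, v in factor.items()}
    # Sort by score descending, then ticker, so ties are broken deterministically.
    return sorted(scored, key=lambda t: (-scored[t], t))[:n]


# Hypothetical factor values as the live port first saw them (two NAs)...
as_first_seen = {"AAA": 0.9, "BBB": None, "CCC": 0.6, "DDD": None}
# ...and the same date after later revisions filled the NAs in.
after_backfill = {"AAA": 0.9, "BBB": 0.95, "CCC": 0.6, "DDD": 0.1}

live_picks = top_n(as_first_seen, 2, na_score=0.5)
sim_picks = top_n(after_backfill, 2, na_score=0.5)

print("live:", live_picks, "| sim:", sim_picks)
```

In this toy case the sim buys BBB, which the live port could not have ranked highly because its value was missing at the time.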

One reason I think this is that my port (and sim above) are pretty close despite revisions I have made in the port. Perhaps because I went out of my way to exclude factors that have even infrequent NAs.

I could see the NAs were causing unacceptable randomness when I developed the port. And here the port is holding its own with the sim. Just a coincidence?

Anyway, excellent detective work. And I think the ultimate solution will reflect well on P123.

I suspect (if I am right) the data is pretty good. It would be (if I am right) our choice (as members) as to whether to buy a stock with a lot of NAs for the factors we choose.

I can see you are working on this and thank you for your comments.

Best,

Jim

ivanftp · October 3, 2020, 6:41pm

Could it be that the data was updated between your simulations? Sometimes data providers print the wrong price. They notice the mistake later on and will revise the value causing differences in your simulation results. Another reason could be the change in data providers. When I was building https://pyinvesting.com/ the simulation results changed after I switched data providers.

brett · October 3, 2020, 9:10pm

Yes Nisser, thank you and well said. What is the data I could have traded on at that time? That is what I mean by PIT.

Yes Jim, you are reading me correctly.

Jrinne · October 4, 2020, 11:30am

Nisser,

Thank you again for sharing. I think everyone, Marco and Yuval included, thinks some difference between sims and the corresponding ports happens.

Keep in mind that P123 was doing a lot of changes with their data as they switched to FactSet data and these changes are a type of revision of the data. A type of revision that has probably gotten much less frequent recently.

But it is just a question of magnitude. FactSet data being what we have to worry about now. Did we start using FactSet data in June?

No ports were using FactSet data before June or whenever the switchover occurred.

Looking at your equity curves how would you rate the difference between the sim and the port since we started using FactSet data?

By rate, I mean Bad Problem? Not too bad? Or not much data and over a period when the way P123 was handling data was changing?

Everyone should make their own judgements. It seems to me like everyone has different tolerances of how much change is acceptable. So I definitely cannot judge for anyone else.

Just deciding on what I will do with my own investing: I will not be switching to CompuStat data based on what I have seen so far. But I will be looking for more data and information.

Yuval said it could be done. If he is right about that he needs to compare Designer Models to the corresponding sims in six months. Do it over a period when the frequency of changes that P123 makes in the data has declined. Having a small difference between the sim and the port is the most basic assumption to the idea that anything we do at P123 might work.

If we are not confident in the assumption that the sims and ports are highly similar we should probably invest our money elsewhere. We should demand data on this (with regular updating) to make sure we are not making a bad assumption.

Yuval asked me to remind him so that he could implement this idea. He said he liked the idea. I have marked it on my calendar. If that will not work for P123, I think we should hear it now. Preferably there would be some explanation of why any outliers are doing what they are doing so we can avoid being an outlier with our personal ports. I do think I will still be here in 6 months and still be using FactSet data as my default position (null hypothesis) is that the data is generally okay and maybe P123 can even identify specific problems and make some improvements.

It would be a simple quality check when you get down to it.

That is DEFINITELY not to say that P123 shouldn’t respond to other members’ ideas and concerns or that this is a settled issue (either way).

This is probably the most important thing. The sims have to have a high relationship to the ports or the whole exercise is meaningless. The more discussion there is on this topic the better. And every incremental improvement is good.

I certainly hope we do not have to hear people at P123 tell us that look-ahead bias is actually a good thing because it randomizes the data or some such thing. Look-ahead bias is never good. Bias (of any type) is never good. Even in the workplace intelligent people seem to be able to recognize that without a lot of debate.

Best,

Jim

RTNL · October 5, 2020, 12:41am

+1000!

RTNL · October 5, 2020, 12:47am

What alternatives would you consider? And how much is it to link to CS for an individual while using the P123 engine?

RTNL · October 5, 2020, 1:15am

So here is an actual live port and now the equivalent port.
First attachment is the live port. The second is the FS sim.
The change in AR is about 4% annualized, and turnover increases.

Jrinne · October 5, 2020, 10:05am

RT,

Thank you for sharing.

The data is what it is. If we have enough good data everyone can judge for themselves. I am not trying to sway opinions one way or the other. Other than to stress how important this is.

You joined in 2010?

That looks like the downturn in the 2008 recession but obviously I cannot tell for sure. Anyway, great returns!!!

Start date of the port aside, the differences recently (since the FactSet change) are HUGE!!! Just a fact. No judgement.

That huge difference deserves explanation.

P123–just if it were me–I would take this beyond the anecdotal. Probably would be in your interest, I would think. Not that I haven’t already suggested this in the strongest terms possible.

Maybe inform people that they need to keep NAs out of their ports if that is all it takes to minimize the differences in sims and ports. Maybe put a lag in some of the data like Quantopian does if there is a need to do so.

Yuval, I will make sure to post that reminder in 6 months that you asked for above should you decide to produce some non-anecdotal data.

My default is to trust the data but I admit I am starting to get concerned. Maybe this concerns me the most:

Not useless. Good to know. But it suggests that data might not be what I assumed it was. Yeah, I need some information about how useful FactSet data is.

I am having trouble ignoring everyone’s posts including Marco’s (starting with Andrea’s post). Some very smart people are posting. And it seems that I am the only one who is not already sure that there are problems with the data at this point.

I still think if P123 shared some non-anecdotal data and aggressively addressed any problems………Always the optimist, me.

Best,

Jim

RTNL · October 5, 2020, 11:53am

Jim,

Yes, I started in 2010. Have been a casual user. P123 has been very value added for me. And I really appreciate what the P123 team does with limited resources. My friends who are professional investors with access to both FS and Bloomberg say that they see data discrepancies between sources all the time.

For me, P123 was a way to test hypotheses and break them. So, I am OK to keep my live portfolios built with Compustat data as they are. However, I have trouble with the new sims, if the data is not PIT. The way I look at it, I would have made a decision on that particular day in the past with the data available to me (good or bad). Bug fixes are OK; data overwrites really change the nature of my process, and negate the intent of my study.

So, I am really concerned.

I am not sure if I am explaining my view point well. Unfortunately, I was really busy during the transition time and could not extensively test all my models with both data sets.

Jrinne · October 5, 2020, 12:50pm

RT,

What you say makes perfect sense.

BTW, there was no way to have tested a live port at the time of the transition. So what you are doing is perfect and you are early in the game.

P123 should help you with getting more information.

I am a Bayesian (sorry about the statistics reference).

But my (prior) beliefs change with new information.

My prior belief about the data changed a lot over the weekend. You can see how much by looking at my first post with Andreas. Maybe my beliefs will change again if P123 wants to provide data and information on this.

But I think I will sell all of my P123 port holdings and see how this shakes out. Get some data upon which to base a solid posterior belief.

I am not going to ignore everyone’s posts. ETFOptimized was the only one without a concern (including Marco) and ETFOptimize was working off of assumptions based on CompuStat data. SuPirate’s post was good but it was on an unrelated topic.

I think you are looking at this in the right way and you could not have done anything different during the transition.

And thank you for sharing. You helped me a lot by sharing. I can always buy-back after I collect data or if P123 wants to share some data. Maybe put a lag in some of that data if they come to the same conclusions that Quantopian did. Or address any problems they may be aware of and are able to do something about. I am not going to risk a downturn now around the election hoping that my port is based on solid data that will perform long-term. Hope is not a plan.

Best

Jim

yuvaltaylor · October 5, 2020, 2:46pm

I’d like to be as helpful as I can be here in terms of providing data. I am going to take a bunch of live strategies that have not been revised recently and rerun them as simulations with the same rules, using Compustat data before the end of June and FactSet data afterwards. I’ll have the results for you shortly. The results will still be anecdotal–there’s no way to do this en masse–and one must take into account the “butterfly effect” that Chris described earlier in this thread.

RTNL · October 5, 2020, 3:33pm

Thanks Yuval. That will be really insightful, and helpful.

Jrinne · October 5, 2020, 5:12pm

All,

The “butterfly effect” should not cause a systematic error. The ports should not consistently underperform the sims due to the butterfly effect. We can worry about the butterfly effect if we need to.
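
One simple way to check for a systematic error, rather than random butterfly-effect noise, is a sign test on the sim-minus-port differences. The Python sketch below is only an illustration: the function name is mine, and it assumes each port/sim pair is independent, which may not hold if many ports share the same stocks.

```python
from math import comb
from typing import Iterable


def sign_test_p_value(diffs: Iterable[float]) -> float:
    """Two-sided sign test.

    Null hypothesis: a sim is as likely to beat its port as to trail it.
    Zero differences (ties) are dropped, as is conventional for this test.
    """
    diffs = list(diffs)
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0

    # Probability of a split at least this lopsided under a fair coin, one tail...
    k = min(pos, neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # ...doubled for the two-sided test and capped at 1.
    return min(1.0, 2 * tail)
```

If the butterfly effect were the whole story, sims would beat ports about as often as they trail them and the p-value would stay large. A very small p-value with sims ahead in nearly every pair would point to something systematic.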

I am sure Yuval and everyone at P123 knows not to exclude any data even if they find a reason (after looking at the results) to exclude it. Even if it seems like there is a very good reason. They should try to look at as many as possible without some sort of selection bias.

If there is a reason to exclude an outlier P123 can show us the outlier and provide the explanation.

This could be cleared up quickly with a proper handling of the data. P123 possesses a lot of data.

My mind is open to whatever the data shows. I wouldn’t mind getting back into individual equities quickly. But for now I sold everything that was based on P123 data.

Thank you Yuval.

Best,

Jim

taofen · October 7, 2020, 1:56pm

I’m getting the same issue, and it’s very easy to reproduce: save any of your live ports as a simulation and run it to compare the results. I have a few new live ports that started on exactly June 28th and have been rebalanced weekly since. The sims’ results are 5~10% better than the live ports’. I remember the discrepancy was not so big just a few weeks ago. This is definitely an issue and will mislead users. If this is some kind of look-ahead bias, ideally a PIT process should be able to handle each data point based on the timestamp from FactSet and the timestamp created/managed by P123, and thus overcome the issue during simulation. If not, I think there are still some bugs in P123, or P123 can do better.