RE: Historical results changing? Please look at this...

Thanks Marc and Marco for all of the work both of you have done improving the quality of available factors, sims, screens, etc. Everything you improve or fix simultaneously improves the quality of all of our attempts at backtesting and forecasting, regardless of the changes in simulated AR.

Steve, in our discussion you argued that simulation (in-sample R2G data) means nothing and should not be presented to R2G users (investors) at all. In other words, the investor is NOT interested in what has changed in a simulation or why; and from the investor's point of view, neither is the designer.

I asked you [quote]
Steve, why do you worry that “the results for one of my models was negatively impacted according to my simulations”? This is just in-sample, and the underlying changes were for the good. It is just a simulation. Who cares? Investors should not be interested. You are not objecting to the changes made, and you know the background. So why?
[/quote] and got this answer: [quote]
As the designer of the model, knowing the context of the backtest (i.e. what, why, how, etc), I can scrutinize the simulation. A potential subscriber cannot as he/she doesn’t have the background for the development of the model. Also, the model has been lackluster ever since the change. So is it a question of the model “as designed” being lackluster, or “as modified by P123”? I can’t tell because I don’t have access to the original industry factors.
[/quote] If you saw the simulation change without any interaction on your part, then there is no question of the model being lackluster “as designed”; it is definitely because of the P123 modifications. Am I wrong? Then we have only one argument left: “I can scrutinize the simulation.” But please educate me: why do you need to do this, and what will you do once you find out? Note that you DO KNOW the “WHY”: the calculations changed. What ELSE do you need to find out, and what FOR?

I would propose making the old factors available, as Steve has asked, for comparability. But any data produced with the old factors that goes public (including R2G) should be labeled as calculated with a wrong/broken factor. Let the investor decide how to treat such a model. Personally, I would never invest in one.

As for presenting simulations, here is just a naive idea from an investor: designers should not control what is displayed to the public as simulated results. All simulations would be for the designer's internal use only. When a model goes public (R2G), P123 would perform its own backtests, as was proposed in this thread. The simulation could then be presented as probability bands: for example, the range of outcomes covering 90% probability across backtests run with different variable and factor permutations. Each factor could have a predefined permutation range, the same for all models, to put every model in an equal environment. The more volatile the factors, and the more factors a model uses, the wider the outcome band it will probably produce. OOS data could then be shown as a band ranging from the top to the bottom of the simulated-outcome band. Of course, this would have to be explained to investors and done on a best-effort basis. A simulation is still just a past result and does not have much predictive power.

It would look like the attached image (sorry for the design). Blue lines: the actual 90%-probability OOS band, possibly with a median; green line: the benchmark.


Outcomes probability.png
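To make the idea concrete, here is a rough sketch of how such a band could be computed, assuming each permutation backtest could be exported as a series of monthly returns (the `runs` input is hypothetical; P123 offers no such export today):

```python
# Rough sketch only: `runs` is a hypothetical list of equal-length
# monthly-return series, one per factor/variable permutation.
import numpy as np

def outcome_band(runs, coverage=0.90):
    """Per-month band covering `coverage` of permutation outcomes.

    Returns (lower, median, upper) cumulative-growth curves.
    """
    cum = np.cumprod(1.0 + np.asarray(runs), axis=1)  # shape (n_runs, n_months)
    tail = (1.0 - coverage) / 2.0                     # 5% in each tail for 90%
    lower = np.quantile(cum, tail, axis=0)
    median = np.quantile(cum, 0.5, axis=0)
    upper = np.quantile(cum, 1.0 - tail, axis=0)
    return lower, median, upper
```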

Marco, I love it. It could also be applied to R2Gs.

I fell into the trap of following models that reacted very sensitively to “HighLowAverage” and performed well only with the open price.
R2Gs helped me here because they obligated me to go for a more realistic calculation.

Thank you and greetings!

Regards

Andreas

Marc,
I just read your posts. I understand them, and I follow your conclusion to fix what needs to be fixed, because the axiom has been falsified.
Anything else would not be scientific, at least not to Karl Popper, and for me also not to common sense. I really like your mission!

The thing is, in relative terms you guys walk on water (compare that to other funky sites, 99% of them “not point-in-timers”, like Zacks and co.), and I want to express my respect for this.

The future is bright :slight_smile:

Regards

Andreas

I originally understood this change to be based on preference, or even to be an enhancement to an existing set of factors. After re-reading the responses from everyone on this post, I think I have a firmer understanding of the issue. It now sounds as if this is more of (and I’m sticking with the software narrative here) a bug that has been corrected. And it absolutely makes sense to ‘fix the bug’. When I read Marco’s original post, the idea that this was a bug escaped me. My apologies for fanning the flames.

Konstantin, my friend -

“…Note that you DO KNOW the ‘WHY’: the calculations changed. What ELSE do you need to find out, and what FOR?”

How do you know why the simulation results changed? The answer is simply that you don’t know. Neither does Marc, despite all the ranting and raving. No one should ever presume that this changed algorithm is responsible for the changed simulation results. This is why I need the old factors, so I can make that determination. If it turns out that the changed algorithm is not the sole cause or primary reason why the simulation results have changed then further investigation will be required to get to the root cause. Once the root cause is understood, then I can decide what is in the best interests of my subscribers. I am effectively being paid by them to get at the right answers and make the right decisions, otherwise I don’t deserve their business. Do you agree?

There is nothing that disturbs me more than people not interested in getting to the bottom of a problem.

“Let the investor decide how to treat such a model. Personally, I would never invest in one.”

So here is the problem. The day an R2G model is launched, the in-sample data is inaccurate. This is by admission of P123. By how much? Well, who knows; someone tossed out the figure of 30%. Is that accurate? And what if it changes by 30% every year? Basically, you should never invest if you plan on studying in-sample data.

"…designers should not have control to what is displayed as simulated results to the public. All the simulations are just for designer internal use. When model is going public (R2G) P123 performs it’s own backtests as was proposed in the thread. Then simulation could be presented in probability bands, for example outcomes with 90% probability out of performed backtests with different variables and factors permutations. Each factor can have it’s own predefined range for permutation for all the models to put into equal environment… "

I can pretty much guarantee that this will fall flat right from the get-go. For one, it has already been admitted that backtests are not repeatable, so any kind of judgment based on a backtest is erroneous. Second, in all likelihood the rolling backtests will not be performed in a correct manner (I made this statement before). The reason is that ports don’t start up correctly and never have. If a port has to make 5, 10, or 20 stock picks at once, then it is reaching very deep into the depths of the ranking system. This is quite different from selecting 1 or 2 stock picks every rebalance period.

The rolling backtests will have to be done in such a fashion that the first x months are tossed out, with x dependent on port turnover. If turnover is very low, then perhaps the first year of results will have to be thrown away; if turnover is super high, then perhaps only the first month should be tossed (see the sketch below).

If P123 addresses the repeatability problem and performs the rolling backtests as I have outlined, then there is a chance that it might “work”, but what it really means is shifting from one ideology to another. A different method of displaying data will make certain development strategies look better in the eyes of potential investors, just as the original display of in-sample data gave an advantage to certain models. The new display will deal with the obvious over-optimization; not displaying in-sample data also does, but in a fairer fashion.
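To be concrete about tossing the first x months, here is a rough sketch under my own assumptions (turnover expressed as an annual multiple, returns as a monthly series; none of this is a P123 API):

```python
# Sketch only: the clamp to [1, 12] months mirrors my examples above
# (very low turnover -> toss about a year, very high -> toss a month).
import math

def warmup_months(annual_turnover):
    """Months to discard, where annual_turnover is a multiple (1.0 = 100%)."""
    if annual_turnover <= 0:
        raise ValueError("turnover must be positive")
    # 12 / turnover approximates the average holding period in months
    return max(1, min(12, math.ceil(12.0 / annual_turnover)))

def usable_returns(monthly_returns, annual_turnover):
    """Drop the start-up period before computing any statistics."""
    return monthly_returns[warmup_months(annual_turnover):]
```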

I hope this answers your question,
take care,
Steve

"And if this were an engineering, mathematical or scientific platform, that would be a huge problem. But it’s not. It’s an investment-strategy platform, and if we were to not understand how to approach that, then you’d have cause to worry. "

Marc - I hate to break the news to you, but it is firstly a s/w platform and secondly an investment-strategy platform. And there is really good cause to worry if you guys don’t understand s/w… AND I really hate this part (due to all of the squawking that is about to happen), but I have to say, and I’m having trouble getting the words out: quantitative analysis is part of “investment strategy”. Otherwise there would be no point to ranking systems, would there? In fact, you aren’t doing a particularly good job of selling the capabilities of P123. If sims aren’t accurate (certainly not point-in-time) and you only care about value strategies with absolute rules, then why do you need more than a simple screener with fundamentals? Even a screener backtester is probably more than you need. Heck, you can find lots of these on the internet for a lot less cost than P123.

Or, I can save P123 a whole pile of development time and money… just replace the simulator output (graph, stats and all) with a big picture of “Thumbs Up” or “Thumbs Down”. Then below it, just put up a statement of warning: “WARNING: this simulation should be re-run on a regular basis because the status of the thumb is likely to change over time.”

“The “platform” is impressively stable and I’m really baffled by why that is being called into question.”

So last week, I had a colleague copy a simulation that I designed exactly. I got this message back from my colleague: “I’m seeing sharply different return data for a 5 year test… 24.78% vs 31%”. Review of the trading system summary showed identical simulations in every respect.

A few years ago, I had designed an S&P 500 system only to watch the port sell off all of the holdings over a few months without buying any new stocks. While this was certainly possible according to the buy/sell rules, I was scratching my head because the simulation continued to buy and hold stocks. It wasn’t until a year later that the real cause came out along with the familiar “sorry about that”.

Do I need to continue?

The point I am trying to make is that P123 is first and foremost a S/W platform and requires the discipline that any S/W platform deserves, for investigation and resolution of issues. Investors lose money with every S/W bug, or do you think otherwise?

Steve

Steve, while I do appreciate your friendship, I too would like to get to the bottom of a problem, so… [quote]
How do you know why the simulation results changed? … Once the root cause is understood, then I can decide what is in the best interests of my subscribers. I am effectively being paid by them to get at the right answers and make the right decisions, otherwise I don’t deserve their business. Do you agree?
[/quote] WHY do you need to know why the sim results changed? I mean, you know WHY in the general and basic sense: out there, behind the window, real life exists with all its uncertainty and fallibility. There is human error, fraud, revisions, market impact, and wrong assumptions in an uncertain environment. What exactly do you need to explain to your subscribers? As an investor, I don’t need an explanation of every single database correction! Why would I need one, when I know this is investing and there is not, and never has been, any certainty? It is OK to expect past data to change for the billions of reasons called human beings. OK, you can explain this to a subscriber one, two, three, four times, but then… I just can’t imagine who would be interested. Why? OK, you explained it; so what? Are there any actions you can take to correct it? NO! Because this was a correction: your sim went from a wrong state to a correct one. Do your investors need, or ask, to go back to the wrong state? No! Why would they need that? To match sim results while knowing they are wrong? Investors know the uncertainty of the investment environment and expect this.

This is why an investor is not interested in whether your sim is repeatable or not. The investor knows exactly that this is utopia. From a math point of view it is an interesting task (PIT data and blah, blah, blah), and certainly everything is possible. But what is the reason for such efforts? Who needs your explanations of how reality is changing, and why? Don’t point at investors; they DON’T! Wasn’t it you, Steve, who argued not to present simulations to investors at all? If not presenting the sim is good, why should an investor be interested in sim details?

The only question I can imagine from the investor’s side is: why the f$%^ did your sim/model/R2G change so greatly and NEGATIVELY because of a SINGLE SMALL POSITIVE correction? And that is a question of model quality (curve fitting/optimization/data mining); it is not a question about the simulation or the data, it is about YOUR MODEL. Am I wrong, my friend?

Gentlemen,
Can we maybe dispense with the hostilities and keep the discussions in this forum civilized… please? I would like to remind you that backtests are not the only “quality” measure prospective R2G customers can use to decide whether to subscribe, or continue to subscribe, to a designer’s model(s).

Let me point out why I’m not able to understand half of this discussion. Whether you are in the camp of being for optimization or against it, most agree on one thing: it is important to know why something works!

I’m agnostic as to how to get there.

To the extent that there might be super curve-fitters trying to sell ports that won’t work out of sample–I get it. Many of those don’t post much anymore. There is a thing or two I may not know on this–so I will avoid this.

For me personally–I eat my own cooking–there is a lot that I have discovered through serendipity that turns out to have a rational reason for working. On the other hand there is a lot that makes so much sense that turns out to be worthless.

I don’t want to miss the next penicillin which had no rational explanation at the time. Nor do I want to try to prove what seems so obvious: that bees cannot possibly fly. Denny may have an opinion on this. I do want to explain something once I see it. That leads to something even better: penicillin as an example again. I did use it once last year (still the best for strep throat).

I feel Steve’s concern. Results should be repeatable. Differences in results should be trackable to a change.

There is a cost/benefit to P123 to implement this and I think they have done a decent job of explaining what happened. A more detailed comparative test and analysis would be time-consuming and expensive, but would be appropriate if you are investing millions, and then only if the shift in results merited the cost of time and effort.

5-stock portfolios that made significant use of industry-relative comparisons may have been the most affected. If a developer noticed that simulation results shifted for the worse after a data/algorithm change, I would like to know as a subscriber. I would of course drop that R2G. And to that point, I’d like to see R2G developers eat their own cooking; there is no way to know whether they do. The best proof of a strategy is out-of-sample model results that are close to the actually implemented, invested strategy.

The intensity of discussion is a reflection both of how much p123’s members care about and rely on p123.

Marco, I really appreciate p123’s continuing efforts to include more data and the tools it offers.

Marc, thank you so much for the detail that allows me to replicate previous results. (I must say I am no longer so confident in those results, but it is important, at least, to be able to get back to square one.)

In general, I’m sure that all members would agree we prefer to avoid the very immediate situation that the recent change created:

This is what happened to me… I made a change to a sim and got a disappointing result. I changed it back, and the results were entirely inconsistent with all of my previous work, without knowing why. It’s the “without knowing why” that was so upsetting. Similarly, other members discovered that their live ports, screens and public trading models had changed before they were fully aware of what was going on. A lot of time had been invested in this work; they liked what they had created, and in many cases they and other people were using it to make investment decisions.

I think that Denny Halwes made a very useful suggestion:

“One thing that would be very helpful would be a continuously updated list of changes, by date of change, with a description of the change and the rationale for the change.”

If a) Mr. Halwes’ suggestion were in place, and b) there were more effective advance notice of these kinds of changes, then p123 could continue to evolve but members would be able to accommodate the changes more easily.

Is it possible that Mr. Halwes’ suggestion could be implemented? Can more effective notice of meaningful changes be made in the future so that repeatability could be maintained while still improving the site?
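For illustration, a single entry in such a list might look something like this (the date, wording and scope below are purely hypothetical, just to show the shape of the information):

```
Date:       2014-03-10
Change:     Industry composite factors now use aggregation method B instead of method A
Affects:    Sims, screens, ranking systems and R2G models that use industry factors
Rationale:  The previous aggregation was incorrect (bug fix)
```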

Hugh

Chris355, [quote]
I would like to remind you that backtests are not the only “quality” measure prospective R2G customers can use to decide whether to subscribe, or continue to subscribe, to a designer’s model(s).
[/quote] In our previous discussion, Steve and I agreed to disagree on whether in-sample data predicts OOS data. I think there is some predictability, and I expect at least some consistency between the two. Steve thinks in-sample has nothing to do with OOS: some designers build their models in such a way that in-sample need not be consistent with OOS; in-sample should not be called “performance” (while at the same time saying “same sim but calculating the performance”); and it is better not to show in-sample data to investors at all.

Now, isn’t it strange that Steve needs to dig into every simulation’s details just to explain them to investors? These inconsistencies in Steve’s argumentation raise still more questions for me. My English is really poor, but I have never had a problem with logical thinking, so I suppose I am missing something, and I am asking Steve to explain. Here I am, Steve’s investor: do it, explain to me why I should care about a data or algorithm change, but should not care about why the model relies on it so heavily as to take such a negative impact.

BTW, I have no idea which model Steve is worried about; I have never invested in Steve’s models and do not plan to. I like Steve’s contributions to the community and hereby confirm it. While I am against hard optimization of models, I fished out Steve’s old manual on rank optimization, and I would do so again, no matter how much I get “friend-ed”, “educated” and “deep-ed into details” during our discussions.

Shaun, [quote]
If a developer noticed that simulation results shifted to the worse after a data/algorithm change I would like to know as a subscriber. I would of course drop that R2G.
[/quote] Note that you, as a subscriber, don’t need to know exactly what changed. It is sufficient to know that the change was positive (a bug fix in the algorithms, or a data correction) and that the sim shifted for the worse.

[quote]
The best proof of a strategy is out-of-sample model results that are close to the actually implemented, invested strategy.
[/quote] As I understand it, Steve would strongly disagree.

atw, [quote]
This is what happened to me… I made a change to a sim and got a disappointing result. I changed it back, and the results were entirely inconsistent with all of my previous work, without knowing why. It’s the “without knowing why” that was so upsetting.
[/quote] What upsets me, as an investor, is why you don’t get upset by the “with knowing why”, I mean by your model’s reaction.

[quote]
Similarly, other members discovered that their live ports, screens and public trading models had changed before they were fully aware of what was going on.
[/quote] So you think P123 should expect some level of model optimization such that positive corrections lead to negative model shifts, and should alert the designer? As an investor, I would like to see such “stress tests” monthly.

[quote]
A lot of time had been invested in this work; they liked what they had created, and in many cases they and other people were using it to make investment decisions.
[/quote] Over-optimization is always for likes. But how much value does this “lot of time invested in their work” have? Isn’t it Marc who constantly calls on us to invest in model design wisely, and not “for likes”? And I am not sure it is ethical to speak here about investors’ loss of value (real money) from this work.

There are very few (if any) investor voices here on the forum. That is terrible by itself, but what designers sometimes post here is more frightening. Do you think investors do not read this forum? And you still expect to attract them? I have seen a lot of s$^& in the broader investment community, so I am a bit hardened. I like P123’s ideas, and I do believe in reasonable, sustainable alpha. That is why I call on the designers here to think hard about what they are doing on P123, and how, so they do not have to cry afterwards about under-subscription of R2Gs, P123’s passiveness, lack of interest in low-alpha models, etc.

All -

First off, I want to say that I accepted the change to the industry factors a long time ago, after the reason was explained. I don’t like the way in which it was handled, but that is in the past. It’s a dead issue, other than some follow-up with my friend Konstantin.

What has me upset is the change in mindset that is occurring. Three years ago P123 had quite a decent system for quantitative analysis. Now I am witnessing the end of that era. Once one accepts that simulations are not repeatable and that coarse estimation is “good enough”, then I have to observe that there is no need to back-update the database at all. What purpose does changing data in the past serve? The changes are not point-in-time, and apparently we already have sims that are good enough. Updating past data only serves to eradicate repeatability, which I feel is extremely important.

So I have to ask you (readers): what is the next logical step on the path being taken to eradicate quantitative analysis? It has already been accepted that past data is not point-in-time and can change at any given time. So why not cancel the time-stamping of fundamental data (implemented last year as a result of spillover from the weekend)? Does it bother you that running a screen for last week will give you different stock picks now than what you bought last week? If you believe that not having static data from the past is not an issue, then it shouldn’t bother you at all. It doesn’t matter whether the data is different for last week or for 15 years ago; it’s the same issue. So cancel the time-stamping of data, since P123 is not supporting a point-in-time database.
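For anyone unclear on what time-stamping actually protects, here is a toy sketch of the principle (the data structures are mine and hypothetical, not the P123 schema): each value carries the date it became known, and a query may only see values already known on the query date.

```python
# Toy sketch of point-in-time lookups; values must be recorded in date order.
import bisect
from datetime import date

class PITSeries:
    """Point-in-time series of (known_as_of, value) pairs."""

    def __init__(self):
        self._dates = []   # dates on which each value became known
        self._values = []

    def record(self, known_as_of, value):
        """Append a value with the date it first became available."""
        self._dates.append(known_as_of)
        self._values.append(value)

    def as_of(self, query_date):
        """Latest value known on query_date: no look-ahead, ever."""
        i = bisect.bisect_right(self._dates, query_date)
        return self._values[i - 1] if i else None

# A restatement recorded in 2014 never changes what a 2012 backtest sees:
eps = PITSeries()
eps.record(date(2012, 2, 10), 1.50)  # original filing
eps.record(date(2014, 3, 1), 1.20)   # later restatement
assert eps.as_of(date(2012, 6, 1)) == 1.50
```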

Why run updates on Monday mornings with an hour to spare before the markets open? This IS a platform-instability issue. The update sometimes fails. What is the point of running the update with only an hour to spare? Change the update time back to Saturday and most people will be a lot happier. What will be the effect? Some data will claim to be available but isn’t. Is this a problem? Not at all, because it is no different from data history changing 15 years ago.

Now, Marc brought up the issue of companies restating their earnings. I was under the assumption that this is handled point-in-time, which would defeat his argument. But is it handled PIT? If it is, then why bother? It is the next logical thing to cancel, as it makes no sense to enforce point-in-time for some issues but not others.

How about problem investigations? Once it is determined that a problem affects a sim by less than 30%, or some other predetermined figure, why expend the effort (money) to track down the root cause? That may be perceived as money wasted.

Does anyone see where I am going with this? Once you buy into sims being non-repeatable then you start the ball rolling towards turning the database and processes into mush. There comes a point in time where there is no turning back.

As I stated before, if one subscribes to the opinion that discounted cash flow is the one and only legitimate investment strategy (and yes, that is an opinion), then there is no need to simulate results, because you already know you are “right”. There is no value added by simulation, and in any case it would run on a database that is not point-in-time; you would have no ability to quantify the error of a simulation that is not point-in-time. Thus P123 would not be providing much more than a glorified stock screener, and I’m sure I can find some on the internet that screen for discounted cash flow, possibly for free.

So if you are concerned about the path being taken then now is the time to speak up, tomorrow may be too late.

Steve

Yes it does. I have seen Marco as taking the lead in preserving PIT as much as possible with some data sources being less than perfect. I appreciate what he has been able to do (e.g., the Compustat data by itself seems good). He needs to continue this.

Some do seem to be arguing that Marco’s hard work is not necessary. It is necessary and very much appreciated.

P123 is about testing how an idea really would have performed (no matter where you got the idea) and then being able to continue the best ideas into the future. Maybe add to that but don’t change it.

Improving data methods (while still preserving PIT) is necessary and a different subject. The original discussion about industry averages was about a new–and probably improved–method.

My friend Konstantin -

Here are my answers to your questions:

“WHY do you need to know why the sim results changed? I mean, you know WHY in the general and basic sense: out there, behind the window, real life exists with all its uncertainty and fallibility.”

This has nothing to do with what is behind the window, fraud, market revisions, market impact, etc. This is a digital computer processing a point-in-time (and presumably static) database of stock information. Unless I designed in a random number generator, the simulation results should not change from one day to the next. If something did change, then there must have been some external event that caused the change. That event may have been an algorithm change (as in the case of the industry factors), a change in data, or a s/w bug surfacing. S/W bugs are the most worrisome: if they go undetected, they can lead to loss of capital for subscribers and a poor track record for the model designer. Algorithm changes can generally be accommodated by a model update. Both database and algorithm changes are problematic because they impede investigation of potential s/w bugs, database changes more so than algorithm changes.
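The kind of discipline I am talking about is cheap to state. Here is a sketch, with a hypothetical `run_simulation` standing in for exporting an actual sim’s trade list (nothing here is a real P123 API):

```python
# Sketch of a repeatability check: same inputs plus a static database
# should give the same fingerprint, every single time.
import hashlib
import json

def fingerprint(trades):
    """Deterministic hash of a simulation's trade list (list of dicts)."""
    canonical = json.dumps(trades, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def assert_repeatable(run_simulation):
    """Any difference between two runs means an external event occurred:
    an algorithm change, a data revision, or a s/w bug. Investigate it."""
    if fingerprint(run_simulation()) != fingerprint(run_simulation()):
        raise RuntimeError("Simulation is not repeatable; find the root cause.")
```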

“This is why an investor is not interested in whether your sim is repeatable or not. The investor knows exactly that this is utopia. From a math point of view it is an interesting task (PIT data and blah, blah, blah), and certainly everything is possible. But what is the reason for such efforts? Who needs your explanations of how reality is changing, and why?”

Attempting to understand what is going on is a matter of professionalism. If I find a problem, I will act in my subscribers’ best interest. That doesn’t necessarily mean informing the subscriber; it might simply mean updating the model with the normal revision notes. Any subscriber can always ask questions. But if I found a problem that meant the model had fallen apart or no longer produces profits, then of course subscribers would be notified.

“why the f$%^ did your sim/model/R2G change so greatly and NEGATIVELY because of a SINGLE SMALL POSITIVE correction? And that is a question of model quality (curve fitting/optimization/data mining); it is not a question about the simulation or the data, it is about YOUR MODEL. Am I wrong, my friend?”

My friend - when it comes to that, subscribers don’t waste energy on foul language. They simply unsubscribe and move on. The point is to find the problem before the subscriber has lost his money.

“I think there is some predictability, and I expect at least some consistency between the two. Steve thinks in-sample has nothing to do with OOS”

This isn’t what I said, but it may be how you interpreted it. What I said is that only the designer knows how he/she developed and optimized the model, and that people scrutinizing the model’s in-sample data are doing so out of context. Directly comparing in-sample data to OOS, or to other models, is an apples-to-oranges comparison.

"and the same time stating “same sim but calculating the performance”

If I said that, then I slipped up. Please arrange for a public stoning.

“Now, isn’t it strange that Steve needs to dig into every simulation’s details just to explain them to investors?”

I never said I have to explain to investors. What I said was that I need to understand the discrepancies and act in the investors’ best interests.

“Here I am, Steve’s investor: do it, explain to me why I should care about a data or algorithm change, but should not care about why the model relies on it so heavily as to take such a negative impact.”

I don’t really understand what you are trying to say at the end of the sentence.

To understand my concerns, you have to understand my background. I used to work on a NASA project for assisting in the build of the International Space Station. For a period of time I was responsible for a “crit 1” box, meaning that failure results in death. From this project I got into the practice of documenting every incident, whether it be a true failure or something that just seemed unusual. Every discrepancy was investigated until the root cause was found and a satisfactory resolution enacted. Now I realize and am sensitive to the fact that P123 is a small company and doesn’t have the money/resources and doesn’t require this level of safety, but the principles are still important, and there is a lot of money at stake.

As an aside, one of my career highlights was back in the ’90s, when John Glenn had his second flight into space. This also happened to be the first operational flight for our product; the box was installed in the crew cabin. A week before the flight, an identical box in one of our labs went up in smoke, so I had to make a presentation to the NASA safety panel on why it was safe to fly our box. Failure to fly probably would have meant the end of the company I worked for. Thinking back on this episode, I realize that if something had gone wrong on John Glenn’s flight there would have been nowhere for me to hide. I would have been public enemy number one.

As a second aside, I told this story in a job interview a few years back. The only problem was that the interviewer didn’t hear the part about John Glenn’s “second flight”. He spent the rest of the interview trying to figure out how old I must be.

Take care
Steve

Steve, thank you for the answers.

Most of the discussion has moved to other topics, here and here , so I will not repeat it.

[quote]
That event may have been an algorithm change (as in the case of the industry factors), a change in data, or a s/w bug has surfaced. S/W bugs are the most worrisome. If they go undetected then they can lead to loss of capital for subscribers and a poor track record for the model designer.
[/quote] I do worry about the problems you outline too. Nevertheless, as I argued in the posts linked above: [quote]
7 R2Gs are free; they have on average 17.06% Annualized Combined vs. 18.38% Annualized Launch, outperforming by 7.7%, with 583 subscribers in total
[/quote] …and this held no matter how much PIT mess is in the data and however the calculation algorithms have changed! If I were a designer, it would seem obvious that the REAL problem is at the other end, in the model. The free models, which naturally have no optimization bias, performed as designed within the same environment. So why so much attention to data, algorithms and repeatability in connection with models’ performance worsening?

[quote]
“Here I am, Steve’s investor: do it, explain to me why I should care about a data or algorithm change, but should not care about why the model relies on it so heavily as to take such a negative impact.”

I don’t really understand what you are trying to say at the end of the sentence.
[/quote] The paragraphs above are my explanation.

Steve & Konstantin,

PLEASE don’t start arguing in another thread! Now, when I see a post from either one of you that mentions the other’s post, I just skip it and look for someone who may have some new info or ideas.

Denny,

Now that you’ve made your logic (ranking system) and forum preference (backtesting) public, the market participants will be incentivized to change their future (out-of-sample) posts. You’ve succumbed to the classic case of over-sampling forum posts (curve fitting). Your attempts to avoid these posts in the future will likely be unprofitable.

:wink: