I think Marco’s statement about out-of-sample performance not being gospel may have been misconstrued by some folks.
Yes, of course out-of-sample/live/real-money performance is critical. Absolutely critical! But here’s the catch: however you or anyone else feels about your o-o-s performance, tomorrow is always another day.
We’ve seen countless occasions in the stock market when strategies we know to be excellent endured periods of dismal performance, and vice versa: occasions when absolutely idiotic strategies worked for a while and made their proponents look like geniuses. The late 1990s was a perfect example of the latter.
So suppose you have a model, you’re satisfied, and you put it out into the world via R2G. You did everything right. You know you have a great strategy. But yikes, your post-launch performance goes into the tank. If you believe o-o-s is gospel, you have no choice but to accept the label of “loser.” And maybe you really would be a dud in such a case; i.e., maybe your luck ran out just like it did for the late-1990s bozos. Or maybe you’re still for real. Maybe you launched at a time when your particular approach simply didn’t mesh with what the Street was doing at that point. It happens to everyone.
We can’t know one way or the other simply by looking at o-o-s performance. So in fact, o-o-s performance cannot serve as gospel, notwithstanding how important it is on a dollars-and-cents basis. We need to consider o-o-s performance combined with an understanding of WHY it was what it was. Statistical analysis cannot give you a full answer. But there are things we can do to revise the presentation in a way that will prompt designers to explore the issues and seek answers. That’s what the changes Marco is talking about are designed to do: to help you get a better sense of a model’s vulnerability to things that really are not relevant to the strategy and should not be influencing it.
The Ind-factors controversy is a case in point. If you go back to Marco’s original post, you’ll see the reasons for it. Given the irrational skittishness of the old series, if the revision hurt your model, that means there’s a problem with the model. And if it’s a model that has performed out of sample, then you should be thankful (i) that you didn’t get burned even though you lived dangerously, like the drunk driver who managed to get home without getting into an accident, and (ii) that you get to see the problem and have a chance to fix it before your luck runs out.
Somebody in this thread mentioned how heavily designers depend on o-o-s for their reputations. Yes, that’s right. That’s what subscribers and prospective subscribers see. And because it’s so important to you, we want to HELP you improve the probability that good o-o-s performance, which is your lifeblood, will persist. But you can’t benefit from that if you fall in love with your o-o-s performance, just as you can’t fall in love with simulated performance.