I think Marco’s statement about out-of-sample performance not being gospel may have been misconstrued by some folks.
Yes, of course out-of-sample/live/real-money performance is critical. Absolutely critical! But here’s the catch: however you or anyone else feels about your o-o-s performance, tomorrow is always another day.
We’ve seen countless occasions in the stock market when strategies we know to be excellent endured periods of dismal performance, and vice versa: occasions when absolutely idiotic strategies worked for a while and made their proponents look like geniuses. The late 1990s was a perfect example of the latter.
So suppose you have a model, you’re satisfied, and you put it out into the world via R2G. You did everything right. You know you have a great strategy. But yikes, your post-launch performance goes into the tank. If you believe o-o-s is gospel, you have no choice but to accept the label of “loser.” And maybe you really would be a dud in such a case; i.e., maybe your luck ran out just like it did for the late-1990s bozos. Or maybe you’re still for real. Maybe you launched at a time when your particular approach simply didn’t mesh with what the Street was doing at that point. It happens to everyone.
We can’t know one way or the other simply by looking at o-o-s performance. So in fact, o-o-s performance cannot serve as gospel, notwithstanding how important it is on a dollars-and-cents basis. We need to consider o-o-s performance combined with an understanding of WHY it was what it was. Statistical analysis cannot give you a full answer. But there are things we can do to revise the presentation in a way that will prompt designers to explore the issues and seek answers. That’s what the changes Marco is talking about are designed to do: to help you get a better sense of a model’s vulnerability to things that really are not relevant to the strategy and should not be influencing it.
The Ind-factors controversy is a case in point. If you go back to Marco’s original post, you’ll see the reasons for it. Given the irrational skittishness of the old series, if the revision hurt your model, that means there’s a problem with the model. And if it’s a model that has performed out of sample, then you should be thankful (i) that you didn’t get burned even though you lived dangerously, like the drunk driver who managed to get home without getting into an accident, and (ii) that you get to see the problem and have a chance to fix it before your luck runs out.
Somebody in this thread mentioned how heavily designers depend on o-o-s for their reputations. Yes, that’s right. That’s what subscribers and prospective subscribers see. And because it’s so important to you, we want to HELP you improve the probability that good o-o-s performance, which is your lifeblood, will persist. But you can’t benefit from that if you fall in love with your o-o-s performance, just as you can’t fall in love with simulated performance.