Interested to hear people’s thoughts on this post. He picks 50 stocks randomly from S&P 500 and shows it to outperform the S&P.
I have not been able to replicate this in P123…
http://www.followingthetrend.com/2016/04/you-cant-beat-all-the-chimps/
There is another article that Andreas wrote that is very similar:
The basic premise of both articles is that using market cap to weight a portfolio results in underperformance. I have never tried to replicate it, but Andreas has provided code for Quantopian.
Simon
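For anyone who wants to poke at the idea outside P123, here is a minimal sketch of the experiment. It is not Andreas's actual Quantopian code; it runs on synthetic monthly returns with a deliberate small-cap tilt baked in (an assumption for illustration), so you would swap in real S&P 500 constituent returns and market caps to replicate it properly.

[code]
# Minimal sketch of the "chimp" experiment, not Andreas's actual Quantopian code.
# Runs on synthetic monthly returns (hypothetical data) so it works standalone.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_stocks, n_months = 500, 120

# Hypothetical universe: smaller caps get a modestly higher mean return,
# which is the tilt the articles attribute the outperformance to.
caps = np.sort(rng.lognormal(mean=10, sigma=1.2, size=n_stocks))[::-1]
size_premium = np.linspace(0.0, 0.002, n_stocks)   # 0 for the largest cap, +20 bps/month for the smallest
returns = pd.DataFrame(rng.normal(0.008, 0.06, size=(n_months, n_stocks)) + size_premium)

# Cap-weighted benchmark (weights held fixed here for simplicity).
bench = (returns * (caps / caps.sum())).sum(axis=1)

def annualized(monthly):
    return (1 + monthly).prod() ** (12 / len(monthly)) - 1

def random_portfolio(n_pick=50):
    """Equal-weight n_pick randomly chosen stocks, rebalanced every month."""
    picks = rng.choice(n_stocks, size=n_pick, replace=False)
    return returns.iloc[:, picks].mean(axis=1)

bench_ann = annualized(bench)
chimps = np.array([annualized(random_portfolio()) for _ in range(1000)])
print(f"cap-weighted benchmark: {bench_ann:.2%}")
print(f"median chimp:           {np.median(chimps):.2%}")
print(f"chimps beating the benchmark: {(chimps > bench_ann).mean():.0%}")
[/code]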
Here is a screen to pick 50 random stocks from the S&P 500.
I ran it a handful of times and am getting annual returns in the range of 9% to 13%.
To make this work, I needed to set slippage to 0%, which is not realistic, but it makes the point.
I created a sim for this:
https://www.portfolio123.com/port_summary.jsp?portid=1477974
It has a negative active return for rebalance frequencies more frequent than 12 weeks (3 months), and a positive active return starting at 3 months.
What was interesting is that if you run a rolling test on this 3-month version starting 1/1/2010, using a 2-year period with 1-month increments, it shows excess returns against SPY a little less than half the time. To me, that makes sense: it is like a coin toss.
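If it helps, the rolling test is just this arithmetic. A sketch in Python, where the two series below are random placeholders standing in for the sim and SPY (drawn from the same distribution, so the win rate should land near the coin-toss 50%):

[code]
# Sketch of the rolling-test arithmetic, assuming you have monthly return series
# for the sim and for SPY. Placeholder data is used here, not real P123 output.
import numpy as np
import pandas as pd

def rolling_win_rate(strategy, benchmark, window=24, step=1):
    """Fraction of overlapping windows (2-year span, 1-month increments) in which
    the strategy's cumulative return exceeds the benchmark's."""
    wins = total = 0
    for start in range(0, len(strategy) - window + 1, step):
        s = (1 + strategy.iloc[start:start + window]).prod()
        b = (1 + benchmark.iloc[start:start + window]).prod()
        wins += s > b
        total += 1
    return wins / total

rng = np.random.default_rng(0)
sim = pd.Series(rng.normal(0.01, 0.04, 96))   # placeholder monthly returns since 2010
spy = pd.Series(rng.normal(0.01, 0.04, 96))
print(f"windows with excess return: {rolling_win_rate(sim, spy):.0%}")
[/code]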
Nice experiment! Since the index is cap-weighted, and the screener is equal-weighted, it shows the bias for smaller caps to outperform.
Running this screen, I can never get a backtest that underperforms S&P equal weight. Why is that? My intuition would lead me to believe 50% of simulations should underperform.
[quote]
Running this screen, I can never get a backtest that underperforms S&P equal weight. Why is that? My intuition would lead me to believe 50% of simulations should underperform.
[/quote]Good question.
RSP rebalances every three months, charges 0.40%, and its slippage is probably closer to 1% than to 0%. BTW, RSP's slippage is increasing as it collects more assets.
All,
Not everyone likes statistics. The truth is that the simulations are often so good that additional statistics are not necessary: simulations are themselves a valid statistical method, and you get them at P123 without doing anything more. There are other good reasons not to like statistics or to think more statistics are needed.
Personally, I have found things using Excel (not limited to slippage considerations) that have increased my returns very significantly. Even putting that in bold would not begin to give it proper emphasis: more than doubling, even tripling, the returns of some "simple" ports at times. These things were statistically significant out-of-sample, once implemented, and they are still working. The forum has been valuable to me so far and I can prove it, not with statistics but with simple accounting methods: I appreciate it.
But getting to my point, I will try to keep my posts on statistics to a minimum and not debate anyone who doesn't have an interest in the subject. When I do debate, it will be with the goal of one of us (whoever may be making an error or could improve on a technique) getting a little better at that technique. And I will make an effort to give up early when someone is not getting it: this will usually be me. I have learned a lot by posting and having my errors pointed out by the community, or by finding them myself when I am forced to write things down.
So you say: "yeah, keep it short." But the actual statistical point is that you can easily do something like chimp vs. momentum at P123. This is, I think, a good method. It is veeeery similar to the more difficult-to-implement techniques (some of which require a web purchase) in "Evidence-Based Technical Analysis" by David R. Aronson. Clenow says it all and David has implemented it. If anyone has questions on my implementation I would be happy to share.
Thank you for your tolerance of what may be an unneeded and boring subject for many.
-Jim
That was the first thing I looked for when I opened the article.
So the study is completely misleading. There is nothing random about it. Without understanding what the #&^@ he was doing, the author set out to see if large caps underperformed the cap-weighted S&P 500 during that specific time period and "discovered" what everybody else already knew.
It's unfortunate that so much time and energy is wasted discussing "statistical analysis." We should, instead, be discussing "capable statistical analysis" versus "incompetent statistical analysis." The post is an example of the latter: if the inquiry is not correctly defined, no amount of analysis can rescue a study. He should have used random stock selection and random weightings. An argument can also be made for random rebalancing intervals (holding-period judgments matter too).
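For what it's worth, the distinction between random selection and random weighting is easy to sketch. This is a hypothetical illustration only; the Dirichlet draw is just one convenient way to get random weights that sum to 1, not anything from the article:

[code]
# Hypothetical illustration: the study equal-weighted its random picks,
# but a fully random test would also randomize the weights.
import numpy as np

rng = np.random.default_rng(7)
n_universe, n_pick = 500, 50

picks = rng.choice(n_universe, size=n_pick, replace=False)    # random stock selection
equal_weights = np.full(n_pick, 1 / n_pick)                   # what the article actually tested
random_weights = rng.dirichlet(np.ones(n_pick))               # random weightings as well

print(picks[:5])
print(equal_weights[:3], random_weights[:3], random_weights.sum())
[/code]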
Great ideas! Have you put in a feature request?
Many of these problems are mitigated by setting slippage to zero. This, of course, causes its own problems, but it is not such a bad way to go if you are just trying to find out whether you are doing better than holding a basket of non-cap-weighted stocks with relatively low turnover.
Also, if you do better than random selection without slippage, you may really be onto something: it reduces Type I errors.
-Jim
I'm still confused as to why Chipper's random screen never underperforms.
David’s sim results make more sense to me.
[quote]
I'm still confused as to why Chipper's random screen never underperforms.
[/quote]This new screen holds all S&P 500 stocks in equal weight and is more similar to the equal weight index. Depending on the starting date it makes about 0.40% to 0.78% a year more than the ETF.
The ETF charges 0.40% a year which explains most of the difference in returns.
The rest of the difference in returns might have to do with the fact that the ETF loses money to slippage when rebalancing.
If you set the screen for all 500 stocks and the start date to 04/24/2003 and end date to 02/28/2017, you can compare directly to Guggenheim’s published return since inception (today’s posting).
Annualized returns:
Guggenheim RSP as reported: 11.24%
RSP as screen benchmark: 11.23%
500-stock screen: 11.94%
The 0.7% annualized difference then should be (?) due to the 0.4% gross expense ratio plus slippage and outside expenses including trading costs when they rebalance. In fact, you have to set the screen’s slippage to 3.9% to get 11.24% annualized return! Hidden costs indeed!
Edit: The above figures used Next Open pricing. Using Next Close, the required slippage to match Guggenheim's reported return since inception is 3.4%.
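To make the reconciliation explicit, here is the rough arithmetic. The period length is approximate and the returns are the annualized figures quoted above; everything else is just compounding:

[code]
# Rough reconciliation of the numbers above. 04/24/2003 to 02/28/2017 is
# about 13.85 years; the returns are the annualized figures from this post.
years = 13.85
screen_ann, etf_ann, expense_ratio = 0.1194, 0.1124, 0.0040

wedge = (1 + screen_ann) / (1 + etf_ann) - 1         # annual drag implied geometrically
print(f"annual drag:               {wedge:.2%}")      # ~0.6% (about 0.7% as a simple difference)
print(f"drag beyond the 0.40% fee: {wedge - expense_ratio:.2%}")
print(f"terminal wealth, screen:   {(1 + screen_ann) ** years:.2f}x")
print(f"terminal wealth, ETF:      {(1 + etf_ann) ** years:.2f}x")
[/code]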
No matter how many times I run this screen, it never underperforms the S&P 500 Equal Weight benchmark (so it's not a cap-weighting difference), and that's the index, not the ETF, so there are no expenses or rebalance slippage.
I would expect that half of the backtests would underperform. Does anyone know why it doesn’t?
I removed Chipper’s original buy rule of “FOrder(“Random”) <= 50” and added a Quick Rank of “Random”. Running repeated backtests, 2 of them have now underperformed (most of the backtests still seem to outperform).
I think S&P equal weight has no dividends.
Yes, S&P 500 Equal Weight does not include dividends. The Guggenheim benchmark does include dividends from 4/24/2003, its inception date, but not before that.
Okay, this makes sense. And tells a much different story than the chimp article.
Maybe some of you have done this.
I took a port that I have used since 8/13. I changed the slippage to zero and made the ranking system Random. I ran this sim 100 times (starting 8/13). My real port beat the random sim 93 times.
Pretty good huh?
Maybe, but not statistically significant. To be significant (p <= 0.05) it should have beaten the random sim 95 times or more.
But it gets worse. If I had 10 ports, say, I should consider using the Bonferroni correction; in other words, I should demand a p-value of 0.05/10, or 0.005. That means if a random selection of stocks, run 100 times, beats my port even once, the result is not significant.
Note that I did this without using the words mean, standard deviation, standard error, degrees of freedom, or t-statistic even once. And there are advantages one does not even consider immediately. Example: shouldn't I use log returns? Taken care of by P123, which uses annualized (geometric) returns. Shouldn't I compare to a benchmark? Answer: the random returns of your universe serve as your benchmark.
What about fat tails? Are the tails fat, or does the central limit theorem take care of that? No worries: if the tails are fat, then your distribution of random sims will have fat tails. Let someone else (at a university or something) worry about that one. It works either way.
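For anyone who wants to see the counting spelled out, here is a sketch of the arithmetic described above, using the figures from this post (93 wins out of 100 random-ranking runs):

[code]
# The counting behind the empirical p-value and the Bonferroni correction.
n_runs, port_wins = 100, 93
random_wins = n_runs - port_wins

p_value = random_wins / n_runs                 # fraction of random sims that beat the port
print(f"empirical p-value: {p_value:.2f}")     # 0.07, so not significant at 0.05

# Bonferroni correction when several ports are tested at once:
# with 10 ports the threshold becomes 0.05 / 10 = 0.005, so even one random win
# out of 100 runs (p = 0.01) fails the corrected test.
n_ports = 10
threshold = 0.05 / n_ports
print(f"significant after Bonferroni ({n_ports} ports): {p_value <= threshold}")
[/code]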
This is powerful stuff. And if you want to get the right answer this is the way to do it. I suspect Clenow uses this technique on a regular basis. However, what he put into the article may be somewhat simplified or made humorous for the reader.
Too bad though: I really thought that port had already proven itself out-of-sample.
-Jim
I have a question, Jim. By changing your ranking system to random, aren’t you discarding that aspect of your port and only testing the universe plus buy and sell rules for significance? If so, then would a test using the original ranking system but a random buy rule allow you to test the significance of the ranking system?
Separately, I tend to think a run of 100 tests could easily fail significance when dealing with a population of thousands of stocks and a small sample.
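One way to picture the second idea is a plain permutation test: hold the forward returns fixed, shuffle the ranks, and ask how often shuffled ranks pick a top bucket that does as well as the real ranks. This is a sketch on hypothetical data (the return model and the 50-stock bucket are assumptions for illustration, not a P123 feature):

[code]
# Sketch of testing the ranking system by itself via a permutation test.
# Everything here is hypothetical data, not P123 output.
import numpy as np

rng = np.random.default_rng(1)
n_stocks, n_pick = 500, 50

ranks = np.arange(n_stocks)                                    # 0 = best-ranked stock
fwd_returns = rng.normal(0.08, 0.30, n_stocks) - ranks * 5e-4  # assumed edge for better ranks

def top_bucket_return(ranks, returns, n_pick=50):
    """Average forward return of the n_pick best-ranked stocks."""
    return returns[np.argsort(ranks)[:n_pick]].mean()

actual = top_bucket_return(ranks, fwd_returns, n_pick)
null = [top_bucket_return(rng.permutation(ranks), fwd_returns, n_pick) for _ in range(1000)]
p_value = np.mean([x >= actual for x in null])
print(f"real-rank top bucket: {actual:.2%}   permutation p-value: {p_value:.3f}")
[/code]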