P123 ranking systems underperform the market?

Monday - sorry for flogging this horse, but perhaps we can look at this from another perspective. Attached is the 5-year historical performance of the Greenblatt ranking system (RS) tested on the S&P 500. As noted before, the Greenblatt was released prior to the most recent 5-year period.

Note the setup that I have highlighted in red, as this is important. N/As are set to neutral, which minimizes their impact on our interpretation of the results. I set it to 5 buckets so that we are focused more on signal than noise.

The Greenblatt RS gave a 3.5% advantage to the top-ranked stocks versus the lowest-ranked stocks.

My conclusion is that the P123 Greenblatt RS “works”. It is up to you to take it from here. I can lead a horse to water but I can’t make it drink! Proper application of the RS will give you a trading advantage. It does take patience and the result won’t always outperform on a short-term basis.

Steve


Good point, Steve. My belief is that the ‘top 5’ or ‘top 10’ ranked stocks mentality is dangerous. This is a game of statistics where you might gain a slight edge flipping a coin 1,000 times. These advantages can add up. But trying to make a system where you flip a coin 5 or 10 times and think you can be right 80% of the time is just fooling yourself.
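The coin-flip framing is easy to check with a quick simulation. This sketch assumes a hypothetical 53% per-flip edge (the number is chosen purely for illustration): the edge is nearly invisible over 5 flips but close to a sure thing over 1,000.

```python
import random

random.seed(0)  # deterministic for reproducibility

EDGE = 0.53  # hypothetical per-flip win probability: a small statistical edge

def prob_majority_wins(n_flips: int, n_trials: int = 2000) -> float:
    """Fraction of trials in which more than half the flips are winners."""
    hits = 0
    for _ in range(n_trials):
        wins = sum(1 for _ in range(n_flips) if random.random() < EDGE)
        if wins > n_flips / 2:
            hits += 1
    return hits / n_trials

# With 5 flips the edge is nearly invisible; with 1,001 it is almost certain.
print(prob_majority_wins(5))     # hovers a little above 0.5
print(prob_majority_wins(1001))  # close to 1.0
```

The exact binomial probability of a winning majority is about 0.56 for 5 flips versus roughly 0.97 for 1,001 flips, which is the whole point: the edge only becomes reliable with many trials.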

Kurtis,

Obviously this is completely true. But it is never easy.

More flips is always better—unless the payout for each additional flip is less. Then, hmmm… It depends.

With a larger number of stocks, the additional stocks CAN very quickly begin to perform at the level of the benchmark (long term) while being more volatile than the benchmark.

Different systems behave differently. But if you run your system with buy rule RankPos > 20 and RankPos < 25 (adjusting the number of positions etc) you could find those additional stocks are not adding much. Or maybe they are.

In one of my models, stocks with rank positions 6-10 were adding more risk than benefit. I went back to a 5-stock model and found other ways to manage risk. E.g., buy some of the benchmark, keep money in cash or bonds, add another model to the book, etc.

I would argue this is more often the case than one would expect because of the bias induced by the ranking system on models with more than 5 stocks.

-Jim

Hi Jim,

I’m not completely following your last point.

"In one of my models, stocks with rank positions 6-10 were adding more risk than benefit. I went back to a 5-stock model and found other ways to manage risk. E.g., buy some of the benchmark, keep money in cash or bonds, add another model to the book, etc." - to check this, wouldn't you just run the model with 5 vs. 10 positions? And if so, how does the line below relate to this?

"Different systems behave differently. But if you run your system with buy rule RankPos > 20 and RankPos < 25 (adjusting the number of positions etc) you could find those additional stocks are not adding much. Or maybe they are. "

Thank you

Scott

Regarding the last point in your quote:

Say you think 25 stock portfolios are better than 20 stock portfolios. You might be right but you should be sure that those additional (5) stocks are adding something.

I was outlining, in an incomplete way, a method of isolating those 5 additional stocks in a sim to see how they are performing.

How do those last 5 stocks perform? Are the returns good when they are kept in the portfolio? Are they volatile?

Not always but there can be a strong drop-off. Those stocks can be isolated in different ways to determine whether they add returns, are volatile etc. Probably in better ways than I described or have even thought of.

Is that even the part you had a question/possible disagreement about? Sorry, if that does not help.

Remember those additional stocks are HIGHLY CORRELATED to the other stocks in the port. POSSIBLY THE MOST CORRELATED STOCKS YOU COULD FIND WITH EFFORT. Unless they are performing well it is hard to imagine how they could be the best use of your money for the purpose of risk reduction.
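One rough way to run the isolation Jim describes, assuming you can get per-period returns for each rank position out of your sims (the returns below are made-up numbers purely to show the comparison, not real data):

```python
from statistics import mean, stdev

# Hypothetical per-period returns, one list per rank position (1 = top rank).
# In practice these would come from sim exports; here they are fabricated
# so the example is self-contained.
returns_by_pos = {
    pos: [0.02 - 0.001 * pos + 0.005 * ((pos * p) % 3 - 1) for p in range(1, 13)]
    for pos in range(1, 26)
}

def slice_stats(lo: int, hi: int):
    """Mean return and volatility of an equal-weight slice of rank positions."""
    periods = len(next(iter(returns_by_pos.values())))
    slice_returns = [
        mean(returns_by_pos[pos][t] for pos in range(lo, hi + 1))
        for t in range(periods)
    ]
    return mean(slice_returns), stdev(slice_returns)

core_mean, core_vol = slice_stats(1, 20)     # the 20-stock portfolio
extra_mean, extra_vol = slice_stats(21, 25)  # only the 5 additional stocks

print(f"positions 1-20:  mean {core_mean:.4f}, vol {core_vol:.4f}")
print(f"positions 21-25: mean {extra_mean:.4f}, vol {extra_vol:.4f}")
```

If the 21-25 slice shows weaker returns or higher volatility than the 1-20 core, those extra positions may not be earning their place.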

-Jim

Hi Jim,

Thanks for the explanation. I'm following you now.

Scott

Let’s play the other side. Buying the index would give you a negative alpha of 1%. To beat the index you’d have to accurately pick the top 20% of the stocks and then lose nothing on slippage.

Why Percentile NAs Neutral? Why not the default? Please expand

Jim,

You bring up good points on how many stocks are best for the actual portfolio. Sorry that I wasn’t clear.

My point is more about testing your ranking system across an entire universe before you get to the actual portfolio-construction stage. Use the ranking-bucket performance test to see if your system only works on 20 stocks or is informative for all 8,000. If the correlation between rank and return is meaningless for 7,980 stocks but you get an awesome performance spike in just 20 stocks, I would highly suspect that something is off…unless you found an anomaly that is only present in 20 stocks at any given time (e.g. a 1000% earnings surprise with 90% short interest).

Maybe it is just me - but I like to see each bucket improve in return when testing the ranking system. The actual number of stocks you invest in is another topic. If you have 10 systems and buy the top 5 of each, that might be the best way to go. Just be sure those 10 systems weren't over-fit to just the top 5 stocks.
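A minimal sketch of that bucket check, using made-up rank-ordered returns (the data, drift, and bucket count are all assumptions for illustration, not P123 output):

```python
from statistics import mean

def bucket_means(ranked_returns, n_buckets=10):
    """Mean return per rank bucket; ranked_returns is ordered best rank first."""
    size = len(ranked_returns) // n_buckets
    return [mean(ranked_returns[i * size:(i + 1) * size]) for i in range(n_buckets)]

def is_monotonic(buckets):
    """True if each bucket's mean return improves over the next-worse bucket."""
    return all(a >= b for a, b in zip(buckets, buckets[1:]))

# Hypothetical returns for 1,000 stocks ordered from best rank to worst:
# a gentle downward drift plus a deterministic wiggle, purely for illustration.
returns = [0.10 - 0.0001 * i + 0.02 * ((i % 7) - 3) / 7 for i in range(1000)]

buckets = bucket_means(returns)
print([round(b, 4) for b in buckets])
print(is_monotonic(buckets))  # → True for this smooth, universe-wide signal
```

A signal like this, where every bucket steps down smoothly, is the pattern Kurtis is describing; a flat line with a spike in one bucket would fail `is_monotonic` and deserve suspicion.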

Not sure what you mean. Buying the index gives you zero alpha by definition. In any case, I should have used RSP equal-weight ETF as the benchmark, not the S&P 500. I’m not sure how that changes the result but in any case, it is not relevant to my point. The point is that there is a 3.5% spread between the top and bottom bucket which is pretty good for S&P 500 stocks. I used weekly rebalance but that does not imply high slippage by any means. Slippage and turnover are really dependent on the port design.

Also, consider that this is blanket ranking performance across the S&P 500 stock universe. The RS is not necessarily appropriate for all industries, and restricting the test to specific industries can improve results. But that is a topic for another day and for those interested in chasing this further.

Steve

** EDIT **
Out of curiosity, I ran with the RSP benchmark and you can see that there is more difference between the top bucket and the benchmark now. RSP, which is equal-weighted, is the correct benchmark to use for an apples-to-apples comparison. And before anyone asks: yes, it underperformed the SP500 index over the last 5 years, but over a much longer period of time, I would expect RSP to outperform the SP500 index.

I also have thrown in a Healthcare Equipment R3000 (high liquidity) universe Greenblatt RS performance chart for smiles.


Vineeta - this is an important point for when you do this type of test. There are a number of N/As produced by the ranking system, and these N/As will distort the historical RS performance chart. You can't overcome the fact that the chart will give distorted results, but you can control where the distortion occurs. If the N/As are ranked lowest, which is the default, then your lowest bucket (or buckets) will be full of stocks that couldn't be processed properly. This would affect my top/bottom spread calculation. By assigning N/As a neutral rank, they show up in the middle bucket(s). Then I can be reasonably comfortable in assuming that the lowest bucket and highest bucket are giving me good performance results.
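A toy illustration of why the setting matters: with N/As ranked lowest they crowd the bottom bucket, while the neutral setting pushes them to the middle (100 hypothetical stocks, 20 of them N/A; the counts per bucket are all that is being shown):

```python
import math

# Toy universe: 100 stocks, 20 of which have N/A factor values.
scores = {f"S{i}": float(i) for i in range(80)}   # stocks with valid scores
na_stocks = [f"NA{i}" for i in range(20)]         # stocks the factor can't rank

def bucket_membership(na_policy: str, n_buckets: int = 5):
    """Count of N/A stocks landing in each bucket (bucket 0 = best ranks)."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    if na_policy == "lowest":
        order = ranked + na_stocks                # N/As fall to the bottom
    else:  # "neutral": insert N/As at mid-rank
        mid = len(ranked) // 2
        order = ranked[:mid] + na_stocks + ranked[mid:]
    size = math.ceil(len(order) / n_buckets)
    buckets = [order[i:i + size] for i in range(0, len(order), size)]
    return [sum(s.startswith("NA") for s in b) for b in buckets]

print(bucket_membership("lowest"))   # → [0, 0, 0, 0, 20]  bottom bucket polluted
print(bucket_membership("neutral"))  # → [0, 0, 20, 0, 0]  distortion moved to middle
```

Either way the unrankable stocks distort one bucket; the neutral setting just moves that distortion away from the top and bottom buckets used for the spread calculation.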

** EDIT **
I should mention that setting N/As to neutral was done only for better analysis of the RS performance chart. I generally do not set ranking system N/As to neutral for model deployment, unless a long/short model (or screen) is required.

Hope that helps
Steve

Getting back to the point of this thread–ranking systems–I’ve designed one that I hope will work for large caps in the future, though of course I have no assurance that it will. If you do a performance test over the last ten years, rebalancing monthly, the top decile of the S&P 500 gets a CAGR of about 20%. If you do a simulation buying the top 20 stocks in the S&P 500 and holding them until they are no longer in the top 50, also rebalancing monthly, you also get a CAGR of about 20% over the last ten years, with about 1.15X turnover. If you run this on some universes bigger than the S&P 500, the results are generally better.

The ranking system is here: https://www.portfolio123.com/app/ranking-system/352291 and it’s public. Please note: you MUST use NAs neutral with this system because that’s the way I designed one of the factors (R&D to market cap). A lot of industries don’t report R&D expenses, and I don’t want them to suffer unduly because of that. The simulation is also public: https://www.portfolio123.com/port_summary.jsp?portid=1572963. Check out the holdings. This simulation has held Home Depot for almost eight years so far.

I’ll just explain a few things about the system.

There are four growth nodes, at 5% each, and four value nodes, also at 5% each. Most of the ranking depends on quality (40%), though I also have one size, one sentiment, and one technical node.

To measure growth, I take the current figure, subtract an equivalent from a year ago, and divide by the latter; however, I add “max(n, abs(” before the denominator to eliminate the chance of dividing by zero, a tiny number, or a negative number.
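In code, the guarded growth calculation might look like this (the floor value stands in for the `n` above; 0.01 is only a placeholder, not the value used in the actual system):

```python
def growth(current: float, prior: float, floor: float = 0.01) -> float:
    """Year-over-year growth with a guarded denominator.

    Dividing by max(floor, abs(prior)) avoids dividing by zero, by a tiny
    number, and by a negative number, all of which would distort the ratio.
    The floor value here is a placeholder; pick one sensible for the metric.
    """
    return (current - prior) / max(floor, abs(prior))

print(growth(1.2, 1.0))    # normal case: ≈ 0.2 (20% growth)
print(growth(1.2, 0.0))    # zero prior: guarded, no ZeroDivisionError
print(growth(1.2, -0.5))   # negative prior: denominator stays positive
```

Without the guard, a prior-year value of -0.5 would flip the sign of the growth figure and rank a recovering company backwards.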

To calculate actual earnings, I subtract 80% of special items. If you don’t want to do this, you can eliminate that part of the formula.

I use different formulae for free cash flow and for enterprise value than the P123 standard. I like to include acquisitions and divestitures in free cash flow, and I like to give a minimum value to negative-EV companies. Also, all my value ratios use fully diluted shares. So you’ll see those differences in my formulae.
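As a sketch of the negative-EV idea only (the floor fraction and the simplified EV inputs are my assumptions for illustration, not Yuval's actual formula):

```python
def enterprise_value(market_cap: float, debt: float, cash: float,
                     floor_frac: float = 0.1) -> float:
    """Simplified EV = market cap + debt - cash, floored at a fraction of market cap.

    The floor keeps cash-rich companies with negative raw EV from producing
    nonsensical value ratios. floor_frac = 0.1 is a hypothetical choice.
    """
    raw_ev = market_cap + debt - cash
    return max(raw_ev, floor_frac * market_cap)

print(enterprise_value(1000, 100, 50))   # normal case: 1050
print(enterprise_value(1000, 0, 1500))   # negative raw EV, floored at ~10% of cap
```

Without some minimum, a negative EV makes ratios like EBITDA/EV flip sign and rank absurdly well or badly depending on the direction of the division.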

There’s one ratio here which is completely original to me, and which I believe nobody has ever used before. Can you guess which one it is? You may want to discard that one since no professors or investors have actually tested it. I like it, though.

I turned all the value ratios on their heads, with price in the denominator. You can reverse them if you’re used to doing things the other way. I like to think of value as a kind of yield.

I designed this as a basis for you to play around with. I think you could add a lot of other factors and get better and more robust results. You could also eliminate factors that you have no use for or think are just too weird. But maybe this is a good start in terms of how to think about evaluating large caps.

I’ve always believed that success comes to those who dare to be different. That’s what I’ve done with this system. But it’s not difference for difference’s sake. It’s difference because using tried and true metrics no longer works terribly well for companies that are analyzed to death, like those in the S&P 500.

This is actually a smart way of doing things. Price can never be negative, whereas many other factors can be: earnings, book value, EV, etc., and this can cause ranking problems if price is the numerator. As an example, if earnings are negative, then the P/E rank will score really well because the P/E ratio comes out as a low value (below zero). If you put price in the denominator, this anomaly doesn’t happen: negative earnings will rank poorly.
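A tiny example of the anomaly, with three hypothetical stocks:

```python
# Three hypothetical stocks: EPS of 5, 0.01, and -3, all priced at 100.
stocks = {"A": {"price": 100, "eps": 5.0},
          "B": {"price": 100, "eps": 0.01},
          "C": {"price": 100, "eps": -3.0}}

# P/E (price in numerator): the negative-earnings stock gets the LOWEST
# P/E and would wrongly rank as the "cheapest" if lower is better.
by_pe = sorted(stocks, key=lambda s: stocks[s]["price"] / stocks[s]["eps"])

# Earnings yield E/P (price in denominator): higher is better, and the
# negative-earnings stock correctly falls to the bottom.
by_ep = sorted(stocks, key=lambda s: stocks[s]["eps"] / stocks[s]["price"],
               reverse=True)

print(by_pe)  # → ['C', 'A', 'B']  (C looks "cheapest": the anomaly)
print(by_ep)  # → ['A', 'B', 'C']  (C correctly ranks worst)
```

Stock C has a P/E of about -33, which sorts below A's 20 and B's 10,000, so a naive lower-is-better P/E rank rewards losing money; the yield formulation avoids that without any special-casing.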

Kurtis,

Your post just got me thinking about one of the most important, but never discussed, aspects of P123. It was just a chance to discuss this subject.

This is a copy of one of Olikea’s systems. For the last buckets things clearly have a nonlinear response.

So yes, increasing the number of stocks would give EXPONENTIALLY decreasing returns as the number of stocks is increased for a model using this ranking system.

But the problem is more general. THIS CAUSES PROBLEMS IN EVERYTHING WE DO WITH MODELS LARGER THAN 5 STOCKS.

Anytime we talk about rank performance tests that look at all the stocks, or models with more than 5 stocks, there is a BIAS in the model due to this nonlinearity.

What happens if you are using Olikea’s ranking system and want to add an additional LINEAR factor to it? IT IS IMPOSSIBLE TO FIND ONE CONSTANT WEIGHT FOR THIS NEW FACTOR THAT IS OPTIMAL FOR THE ENTIRE RANGE OF THE RANK PERFORMANCE TEST.

It is possible to find a weight that works for a small neighborhood—in the machine learning sense. A small neighborhood about the size of a 5 stock sim.
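A toy version of the weight problem: if payoff vs. rank is convex at the top (as in the bucket chart described above), a least-squares slope fitted over the whole range disagrees with one fitted to a top-5-sized neighborhood. The payoff curve below is invented purely for illustration.

```python
def best_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

ranks = list(range(1, 101))                       # 1 = best rank
payoff = [(101 - r) ** 2 / 100.0 for r in ranks]  # convex: steepest at the top

global_w = best_slope(ranks, payoff)          # one weight fit to everything
local_w = best_slope(ranks[:5], payoff[:5])   # weight for the top-5 neighborhood

print(global_w, local_w)  # the local slope is roughly twice as steep
```

Here the globally fitted slope is about -1.01 while the top-5 neighborhood wants about -1.96, so no single constant weight can be right both for the whole rank range and for the top of the curve at the same time.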

While I agree that looking at just a 5-stock anything is probably not a good idea, there is an underlying reason why we do it so often.

EDIT. I am temporarily unable to upload an image of a rank performance test on Olikea’s ranking system that is nonlinear in the response for the last buckets. I will try again later.

-Jim

Sorry about the repeats. It looks like every time I try to attach the image, it repeats the post.

I will stop trying to add the image. And again I apologize.

Yuval,
I think your ranking system is good for large cap selection, but one can improve this model by taking advantage of the seasonal effect.
https://www.advisorperspectives.com/articles/2019/05/13/how-to-profit-from-the-seasonal-effect-in-equities?bt_ee=DaCgrF0XXgMPFm2Y8D5a4Y4%2FYVeS3GaXzeoKSKwwS2U%3D&bt_ts=1557829053764

Turn off all sell rules and rebalance every week.
Hedge with Consumer Staples XLP during the “bad-periods” May-Oct
Hedge on : end of April
Hedge off : end of October

This results in holding 20 positions for about 182 days over the “good periods” from Nov-Apr, with no trading in between.
10-yr CAGR = 20.5% and low turnover, with 80% winners, better than the 68% winners for the unhedged version. Alpha is 8.37% versus 4.90% unhedged.

For the last 10 years:
Total Return… 545.26%
Benchmark Return… 293.03% for SPY
Active Return… 252.24%
Annualized Return… 20.50%
Annual Turnover… 187.73%
Max Drawdown… -16.35%
Benchmark Max Drawdown… -19.35%
Overall Winners… (173/218) 79.36%
Sharpe Ratio… 1.54
Correlation with S&P 500 (SPY)… 0.77
Standard Deviation (%)… 12.28
Sortino Ratio… 2.18
R-Squared… 0.60
Beta… 0.77
Alpha (%) (annualized)… 8.37

I know this is not a model for frequent traders because it only selects 20 stocks every year, holds them without trading, then sells them after 6 months and switches to XLP - so boring, but effective. It will make money for sure over the longer term.
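The switching rule can be sketched in a few lines, assuming monthly return series for the stock model and for XLP (the numbers below are hypothetical placeholders, not the sim's actual returns):

```python
GOOD_MONTHS = {11, 12, 1, 2, 3, 4}   # hold the 20-stock model Nov-Apr

def seasonal_return(model_monthly: dict, hedge_monthly: dict) -> float:
    """Compound one year of returns, switching to the hedge in May-Oct."""
    total = 1.0
    for month in range(1, 13):
        r = model_monthly[month] if month in GOOD_MONTHS else hedge_monthly[month]
        total *= 1.0 + r
    return total - 1.0

# Hypothetical monthly returns purely for illustration.
model = {m: 0.02 for m in range(1, 13)}   # model: +2%/month when held
xlp = {m: 0.005 for m in range(1, 13)}    # XLP: +0.5%/month as the hedge

print(f"{seasonal_return(model, xlp):.2%}")
```

Six months in the model and six in the hedge compound independently, which is why the hedged version trades so rarely and why the drawdown profile improves in bad years.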

Also check the long-term performance. Max D/D during the 2008 recession was -41%, versus -54% for the unhedged version, and CAGR is also 1% higher.

Summing up: An excellent ranking system.


To be any kind of a start in thinking about how to evaluate large caps, the volume and market-cap lower-is-better tests have to be eliminated. And don’t substitute other backtest steroids such as sales lower is better, assets lower is better, etc. Those will inhibit your ability to learn to address large caps. I hate seeing factors like that. The only functions they serve are to bet that the risk-on, not-really-existent small-cap effect is permanent and to juice backtest results.

If you legitimately want a small or micro cap strategy, set it via universe and/or screen/buy rules. If you knowingly choose to max out on risk, those factors are ok. But as part, even a small part, of an effort to learn to deal with large cap, those rules are absolutely unacceptable. It’s more than % of ranking system weights; it’s that they lock you into a mindset that says “I really don’t want anything to do with large caps but I’m dabbling in this exercise because somebody said I should.”

Don’t try to get an equity curve that looks like a 45-degree-plus angle with the benchmark as a nearly horizontal base. In fact, if your simulated alpha clocks in at more than 3%, go back and revise the system because there’s a good chance you’re data mining. And if you have a value-based model, look for negative alpha in the past three years and be very, very wary of anything above 1%.