Systems Performance vs. Real-World Performance

This is an honest attempt to work out whether trading individual stocks is really worth it. I understand that individual models/simulations can post eye-popping returns, but I’m wondering if the average P123 user can consistently post market-beating returns across an entire portfolio, net of expenses.

I have been a P123 member for just over 5 years. While I do not design my own models, I am smart about how I invest/trade: I keep trading costs low; I subscribe to multiple models with proven track records and high scores in the momentum and value rankings; I stay committed to models (don’t back out when the going gets tough); I invest in a mix of large caps, mid caps, and small caps, etc. I have always subscribed to anywhere from 4-6 Designer Models (or R2G models in the old vernacular), and I follow the trade signals meticulously and in a timely fashion.

I have 15 years of investing experience, and I have chased many fads and follies in those 15 years. I’ve tried stocks, ETFs, mutual funds, leveraged ETFs, etc. But I continue to learn from my mistakes and become a wiser investor.

Here are some quick stats on Designer Models as of today:

– 23.6% of models have positive excess return over the past 3 months (69 of 292)
– 41.9% of models have positive excess return over the past 1 year (108 of 258)
– 42% of models have positive excess return over the past 2 years (87 of 207)

Those results don’t lead to very good odds of beating the market by using Designer Models. Further, these numbers exclude the countless models that have been retired over the past 2 years. If you were able to account for that survivorship bias, the percentage of models with excess returns would be lower. Clearly, it’s been a tough period for Designer Models. I wonder if it’s naive of me to think I am smart enough to find the minority of models that will outperform over the long-term.

What if we could see the since inception excess return of every model that lasted more than 6 months here on the platform? What percentage of them would have excess returns since inception?

My personal portfolio performance since joining the site is 11.65% annualized, net of trading costs/fees (time-weighted rate of return). However, during that same period (starting date of 2/11/13), the S&P 500 Total Return Index is up 14.65% annualized. The MSCI EAFE Index is up just 5.72% annualized during that period, which certainly has been a headwind for my returns. My international exposure has likely ranged from 15-25% of the portfolio during this period. But even a portfolio of 80% SPY and 20% EFA would have beaten my performance by over 100 basis points annually over the past 5+ years. So is it worth all my time and energy in researching and trading models, only to trail the market?
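The 80/20 comparison above can be checked with a few lines of arithmetic. This is a rough sketch only: it treats the blend as a simple weighted average of the annualized index returns quoted in the post, uses the S&P 500 Total Return and MSCI EAFE figures as stand-ins for SPY and EFA, and ignores rebalancing and fund expenses. The function name `blended_return` is made up for illustration.

```python
# Annualized returns quoted in the post (percent per year, from 2/11/13).
PORTFOLIO = 11.65
SP500_TR = 14.65   # used here as a proxy for SPY (assumption)
MSCI_EAFE = 5.72   # used here as a proxy for EFA (assumption)


def blended_return(weights_and_returns):
    """Weighted average of annualized returns.

    Approximation: ignores rebalancing effects and the difference
    between averaging annual rates and compounding them.
    """
    return sum(weight * ret for weight, ret in weights_and_returns)


blend = blended_return([(0.80, SP500_TR), (0.20, MSCI_EAFE)])
shortfall_bps = (blend - PORTFOLIO) * 100  # 1 percentage point = 100 bps

print(f"80/20 blend: {blend:.2f}% annualized")
print(f"Shortfall vs. blend: {shortfall_bps:.0f} bps per year")
```

Under these assumptions the blend comes out near 12.86% annualized, which is consistent with the "over 100 basis points" gap described in the post.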

Year-to-date, I am down 3.24%, while the S&P 500 Total Return is up 9.27% through yesterday. I have always been drawn more to value strategies than growth strategies, so it’s understandable that my personal portfolio would struggle as value continues to lag growth over the past number of years. I’m trying to ascertain whether the ship will right itself, or if I need a change of strategy.

As I see it, there are several challenges facing the individual investor as he attempts to beat the market:

  1. We can all find great models/backtests/ideas. But how successfully do we put them together to form a portfolio that can beat the market over time?

  2. Position sizing and model sizing. The temptation is to size up on the positions in our “hot models” and size down on the “cold models”. Further, the temptation is to commit more capital to the hot models and less capital to the cold models.

  3. If a model has “gone cold”, how do I know whether it will come back? I have seen many great models from reputable designers be sent to the trash bin due to lousy performance. But the models always looked good…until they didn’t.

  4. A lot of the microcap/small-cap models on here seem to be chasing the same collection of stocks. Thus, for me it seems like a crap-shoot from one model to another. One model just happened to enter and exit at an opportune time, while another model had different timing.

I just finished reading Jack Bogle’s book, “The Little Book of Common Sense Investing”. I read it with an open mind, and I came away mostly persuaded of the truth of Bogle’s arguments. I understand a guy like Bogle is probably anathema in the P123 world, but his arguments (and his success) merit attention. I am considering rebuilding my portfolio with a simple mix of Vanguard Total Stock Market Index, a mid-cap ETF like EZM, a small-cap ETF like EES, and Vanguard Total International Stock Index for something like 10-20% of my portfolio. I would be getting off the hamster wheel of trying to beat the market.

What length of time is a sufficient testing period for my life on P123? Am I giving up too soon on trading individual stocks?

The big danger to many designer models (which typically focus on small, mid and smaller large cap stocks with a value bent) is a topping environment where momentum in a small subset of very large stocks leads the market up, and the rank & file do not participate much in the final gains of the 9-year bull.

Designers will need to rotate into momentum, mega cap and certain sector-specific models.

The S&P SmallCap 600 Index is up nearly 16% YTD (as measured by the ETF IJR). So there are still great returns to be had outside of a few large cap stocks.

The market will always be led up (and led down) by a small subset of very large stocks, since the market is cap-weighted. By owning the total market, you ensure you will own whatever small subset of large caps are driving the market higher.

Did not mean to suggest we were topping now.

But if you look at the late 1990s and 2007, the advance-decline line was weakening into those tops. Meaning fewer and fewer stocks were participating.

The Russell was up 16% in 1998-99 compared to 41% for the Dow.

The Russell was down 2% in 2007 while the Dow was up over 6%.

And the largest NASDAQ 100 momentum stocks crushed the Dow in 1998-99 as well as 2007.

All I am saying is that it can get very tricky for model designers in topping patterns where mega cap stocks are among a dwindling number of leading performers.

I have been outperforming net of expenses ever since I started using ranking systems to choose stocks based on Portfolio123. The CAGR on my entire portfolio since 10/29/2015 is 45.77%. The returns have been consistent: 45% in 2016, 58% in 2017, 31% so far this year. All my designer models (I have six of them) are beating their benchmarks. In my opinion, stock picking, if done wisely, is more financially rewarding than indexing.

To respond to the specific points you make, I believe that investing in a lot of different portfolios is a mistake, especially when you chase their performance over the recent past. The best way to invest is to put all your money into a portfolio that is extremely well thought-out, thoroughly backtested using lots of stocks over a long period of time, has extremely low risk measures, is well diversified, and gets minor tweaks from time to time as new factors or ideas come into view. A system that takes different portfolios and averages them all to come out with one optimal portfolio will perform far better than allocating your money among various portfolios, just as a system that effectively combines value, growth, quality, and size will perform far better than a system that allots one-quarter of its portfolio to each of those. Chasing recent performance is doom-laden. Look at the long term.

Yuval, thanks for your thoughts. I do agree with your point that a model with more holdings is more likely to perform consistently well.

Having said that, none of us know what any given model’s performance will be over the next 12 months. So while your models have been performing admirably as of late, as an ordinary user of P123, what would give me the confidence that your models are more likely to perform well over the next 12 months, compared to other designers’ models? No one designs a model that hasn’t performed exceptionally in the backtest, nor do they compile subpar ranking systems. And yet, over the past two years, it’s worse than a coin flip that any given designer model here on the site has generated excess returns.

So let’s say I go “all-in” on a single model with 20 or 25 stocks. What are my odds that model will out-perform over a period of several years? And of course I agree that performance-chasing is doom-laden. But with individual stock models (vs. passive, index investing), it’s enormously hard to stay with a strategy when it seems to be “broken.”

And you must be humble enough to recognize that many of the designers building these models were as competent as you, and just as confident in the future performance of their model.

I guess I’m just pointing out how difficult it is to actually achieve stellar long-term returns.

Let’s test out Yuval’s theory that owning a great model with a high number of holdings will yield good results. I screened for models with the following characteristics:

– out-of-sample for at least 2 years
– # holdings: 16 or more
– value score > 70
– momentum score > 70
– quality score > 70

There are 20 Designer Models that match those criteria. Of those 20 models, only 5 of them have positive excess returns over the past 2 years.

If I remove the quality score filter from above, we have 30 models that meet the screening requirements. Of those 30 models, just 9 have positive excess returns. Those are pretty long odds against me picking just one model that will deliver the goods. Hence, that’s why I have chosen to diversify across multiple Designer Models. However, I do think it simply waters down my returns. It mitigates my drawdowns, but limits my upside.
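To put the screen results side by side, here is a small sketch that turns the counts quoted in this thread into hit rates. The last row reuses the site-wide 2-year figure from the first post (87 of 207); it is not a like-for-like comparison, since the screened sets also require 16 or more holdings and at least 2 years out-of-sample.

```python
# Counts quoted in the thread:
# (models with positive 2-year excess return, models in the group).
SCREENS = {
    "value, momentum and quality > 70": (5, 20),
    "value and momentum > 70": (9, 30),
    "all models with 2-year history": (87, 207),
}


def hit_rate(winners, total):
    """Fraction of models with positive excess return."""
    return winners / total


rates = {name: hit_rate(w, n) for name, (w, n) in SCREENS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

On these numbers, both screened groups have a lower hit rate than the full set of models with 2-year history, so the style-score filters did not improve the odds over this window.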

Also, I should again mention this analysis does not factor the dozens (hundreds?) of Designer Models that are in the graveyard due to poor performance.

Most of the designer models have serious problems in terms of real world performance due to a number of factors. The biggest issues in my opinion are the following:
– Almost all require weekly re-balancing and have massive turnover. Transaction costs and not being able to keep up kill you here. Try getting in and out of 25 stocks every Monday at 9:30.
– Most are the product of data mining or lack any real out of sample testing. When you start your backtest can have a huge impact on performance.
– The universes include tons of stocks that are nano cap and not liquid. I have gotten crushed trying to move even as little as $20k out of some of these names.
– None of the systems that I have seen allow for any kind of rotation (i.e. market cap, industry, cyclicals).

This is why I don’t subscribe to any of the designer models. Instead I have 10 large cap and 20 small cap screens that I compile in Excel and use momentum metrics to rotate strategies. No system beats the market over every window. The average annualized return of these 30 screens out of sample since 2004 is 35% for micros and 25% for large. S&P at 8%. Sounds great, right? But let’s look at it with your windows.

For the 30 four-week screens that I track (outperformance %):
L3M - 53%
L6M - 50%
L12M - 63%
L24M - 76% (a few don’t outperform significantly, so closer to 66%)

Better, but not that great. The longer you look out the better it gets, but in the short run it’s a coin flip.

Same data set with rotation, only picking the top 3 based on history available (outperformance):
L3M - 66%
L6M - 76%
L12M - 77%
L24M - 80%

Even with a mix of large and micro, the top 3 returned 45% annually. My advice is to build your own to suit your needs. Over the long run, you will do much better.

Mike - there is no rational reason for selecting models with value score > 70, momentum score > 70 and quality score > 70. These scores are just an indicator of style, not scoring past performance and certainly not predictors of future performance. They are also not perfect as style indicators. If you are going to construct a portfolio of Designer Models, you should probably be equally distributed between DMs that dominate on one style factor plus(+) DMs that don’t have a dominant style factor. Going for all three will bring out mediocrity.

The other point I would like to make is that larger number of holdings may result in more consistency but it may be more consistent in the wrong direction. In my opinion, it is better to choose focused 5 stock DMs that don’t overlap with one another in “smart niches” based on macro-trends, rather than investing in multiple large portfolios (16+ stocks) which are all chasing the same alpha and have a great deal of overlap with one another.

Take care

“I understand a guy like Bogle is probably anathema in the P123 world, but his arguments (and his success) merit attention”

I think the typical P123 user is going to be someone who is at the very least skeptical of the efficient market hypothesis. But when we go outside our little circle, it is a pretty well accepted fact among the greater majority of financial practitioners. There’s not really much disagreement as to whether market efficiency exists, but rather how and at what rate it incorporates new information.

I think the combination of market mechanisms (i.e., traditional ones like barriers to and drivers of arbitrage as well as non-traditional ones) explains how the majority of models underperform the market and also why some can consistently outperform. For example, I think that models which focus on exploiting the barriers to arbitrage are more likely to boast robust OOS returns. Likewise, the market adapts against excess returns by punishing populist investment themes.

So, to answer your question: Yes, on average, active management is a losers’ game. But on the right side of the distribution, the tight clustering of outperformers indicates to me that skill actually exists. The trick then is to either develop this skill oneself or become adept at identifying it. Luck doesn’t hurt either.

To a degree. Backtesting a model with lots of holdings and then slimming it down to 15 to 25 stocks for your actual portfolio is a pretty good way to go. Andreas, on the other hand, has a portfolio of 100 stocks and uses a modest amount of leverage, and he’s done extraordinarily well.

Well, you could consider the fact that six out of my six designer models have outperformed in real time and my own portfolio has outperformed too, or you could read the writing and research that I publish on my blog and on Seeking Alpha. Before investing in anything, you have to do due diligence. You have to ask lots of questions. You have to satisfy yourself that the designer can deliver the goods and has a plan that’s relatively risk-free. Whenever I buy anything expensive–whether it’s a stock or a car–I try to look at it from as many angles as I can. If I’m buying a used car, I’m going to ask the dealer all kinds of questions. If I put my money in a mutual fund, I want to know that fund’s overall strategy–not just its track record, but how and why the managers choose the stocks they choose. The same goes for a designer model. You could probably look at the descriptions of some of these designer models and know you wouldn’t invest even a dollar in them, regardless of their returns. And you could look at other descriptions and get intrigued and want to dig further.

I agree with you 100% that “the tight clustering of outperformers indicates that skill actually exists.” But I don’t see how that’s at all compatible with the EMH, or how the EMH can be a “pretty well accepted fact” if a skilled investor can outperform.

I have an inefficient market hypothesis. As I wrote last year, “The stock market is a badly oiled contraption, stuck together with cellophane tape and staples, and full of rust spots and leaks and broken parts. Its pricing mechanism is terrible and inefficient, and it runs a crazy, circuitous, and illogical course. Why? Because many of the price movements in stocks are based on the quirks of human behavior rather than on sound financial sense. . . . The reason movements in stock prices are so confoundedly difficult to predict is not because they’re efficient and reflect all known information, but because they are the products of thousands of small overreactions, unfulfilled expectations, and misguided second guesses. Rather than cancelling each other out, as Fama suggests, these behavioral quirks make stock prices far more volatile than they should be, and make them move in entirely unexpected directions. This is where the savvy investor gets an advantage. By looking at the general principles behind this machine, she can see through all the smoke, hear through all the noise, and figure out approximately what certain stocks are doing and what they’re likely to do in the future.” (See . . .)

"Because many of the price movements in stocks are based on the quirks of human behavior rather than on sound financial sense. . . . The reason movements in stock prices are so confoundedly difficult to predict is not because they’re efficient and reflect all known information, but because they are the products of thousands of small overreactions, unfulfilled expectations, and misguided second guesses. "

Well said, you hit the nail on the head!

I do not believe I can offer much as far as specific recommendations in this regard but I do think this is probably very important for a few reasons:

  1. This turns out to be an important part of that academic argument for the efficient market hypothesis.

  2. I think options–as a method of arbitrage–are often not available for many of the small-cap stocks that are in our models.

  3. The posts suggest that shorts are difficult, expensive or impossible to get for many of the stocks traded on P123. Someone who actually trades some shorts may want to correct me on this.

  4. Statistical arbitrage depends on a lot of information. This information is limited for us. One reason being that there may be fewer analysts. But generally, the money for small-cap research is not there to warrant the amount of research that allows for statistical arbitrage with larger-caps.

  5. Is no dividend for the shorts on stocks that pay dividends important? Not sure on this. And I will lump in that shorts are risky—more so for those who do not have whole departments of MBAs helping to control the risks.

Anyway, I have been thinking about this and I think it is probably an important factor. And of course, Wesley Gray keeps pounding this drum. But then again, he probably had to find a reason why his mentors were “mostly correct” about the EMH when he got his Ph.D.


Michael, I feel compelled to say something because I share your concerns and have my own P123 experiences to relate. I am not into designing models for others to use but do design my own, so far. I have also been using P123 for a bit over 5 years. I have been a market participant for over 40 years so also have chased many fallacies.

“…but I’m wondering if the average P123 user can consistently post market-beating returns across an entire portfolio, net of expenses”

I constantly ask myself if I still have confidence in the ports I designed and use. Currently the answer is a qualified yes. I have lost confidence in them twice, and subsequently redesigned the ranking system and buy/sell rules. They seem to be working decently now. I would probably need another lifetime to confirm consistency. Instead of beating RSP every year as I want, I may have to settle for beating it the majority of years and in total.

If I had a “very trusted” timing trigger to rely on, I would likely drop my stock models, split my stock money between ETFs, perhaps RSP and QQQ, and rotate out of the market as needed. I would probably still use P123 to analyze and maintain that timing trigger. Alternatively I might decide on an ETF universe rotation model. But I don’t have that so the best I currently hope for is the same as you. I am just attacking the problem differently by designing my own models.

Can an average P123 user consistently beat the market? I prefer to ask what I can do to beat the average P123 user and market participant! If you are not satisfied with the results of using designer models, I recommend trying to design some yourself. You will at least get a much better feel for the real effects of individual variables on stock results and the value, or lack of value, of different approaches.

I’ve been a member for over four years. During the first three and a half years I used Designer models and over that time I outperformed the market by about 10% per year on the average. Some years were better than others. I switched models several times but ran one of them the full time. I tried to have a balance between small, medium, and large cap models. I did spend a lot of time evaluating the models to decide which of them to use which helped. It’s very true that there are a lot of designer models that do not perform well.
Over the years I spent a lot of time learning how to develop my own models and have been using them since February of this year. I run 5 models each with 10 stocks for a total of 50 stocks. So far I’m beating the indexes by a significant amount in all five of the models and in total. I definitely agree with the others that it’s best to design your own models. I’ve found through extensive testing that mid to large cap stocks (market cap of $1B to $35B) provide opportunity and a little more stability than the small caps. I monitor the best fundamental variables that I’m using in my models to see that they are consistently performing. I’m sure I’ll need to change the models as the market changes going forward. One other thing, I chose not to focus on the ranking system. I’ve used Frank on most of the variables and select ranges of the fundamental variables to create my models. I rebalance every four weeks so they are relatively low turnover.

It’s not just the average P123 investor; the industry titans are having a rough go in recent times. Value is doing awful and momentum and quality aren’t picking up the slack. If you’re outperforming now, celebrate.

I stopped trading other people’s models and now only trade my own models, because:

A) It is hard to trust someone else’s black box during inevitable periods of under-performance.

B) If you design and trade your own models, you learn a lot by looking at live performance.

C) I don’t have to compete with other subscribers. For my own models, I consistently outperform the P123 equity curve. When I was trading other people’s models, I either consistently outperformed or underperformed, depending on the model. This probably only applies to small and micro caps.

As for if you should actively invest or passively index, this is very hard to answer. You just don’t know what will happen until it happens. At some point you just need to have confidence that your stock picking process is justified by some fundamental justification, and that it ought to outperform a simple index, even though it will probably under-perform for significant periods.

Hi Eric,

Thanks for sharing what is, for me, a new approach. I’m still trying to get my head around it. Am I correct that the majority of your method consists of Frank > x as buy rules? Since a Ranking is still needed to pick the top stocks that pass all the Frank buy rules, is the Ranking system related to one or more of the Frank factors or is it something completely different?


Hi Mike:

I sympathize with the challenge you’ve encountered finding models you like. In addition to the tips others have given about studying the designer’s descriptions of the methods, I also like to get a handle on the track record of the individual designers.

Here’s how I did this most recently (beginning of 2018).

I downloaded the full designer model data set from P123 into Excel
I highlighted the column for 1 year returns (ie 2017 returns)
I highlighted the column for 2 year returns (ie 2016 and 2017 returns)
I created and highlighted a column for 2016 results (2 year return - 1 year return, a rough estimate for 2016)
Since I also have 3 live models of my own (real results not back tests), I added these to the designer models list.

Then sorted by Designer.

I created 3 new columns in Excel.
. AVERAGE 2016 returns (which was the average returns for ALL the models by a designer)
. AVERAGE 2017 returns (which was the average returns for ALL the models by a designer)
. AVERAGE 2016-2017 returns (which was the average returns for ALL the models by a designer)
I had to calculate these averages manually because I don’t know how to write Excel Scripts.
If a designer had fewer than 3 models (with at least 1 year history), the designer did NOT get a score.

Then I sorted by AVERAGE 2016-2017 results.
I only considered models by top 5 designers (actually only 4 designers since I ended up being in the top 5)
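For anyone who would rather script the per-designer averages than compute them by hand, the steps above can be sketched roughly as follows. Everything here is illustrative: the rows are made-up sample data, not real Designer Model results, and the names `MODELS` and `rank_designers` are invented for the example. The 2016 figure uses the same rough estimate as above (2-year return minus 1-year return), and every model in the sample is assumed to have both a 1-year and a 2-year return.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (designer, model, 1-year return %, 2-year return %).
MODELS = [
    ("Designer A", "A1", 10.0, 25.0),
    ("Designer A", "A2", 5.0, 12.0),
    ("Designer A", "A3", 20.0, 50.0),
    ("Designer B", "B1", 30.0, 80.0),   # only 2 models, so no score
    ("Designer B", "B2", 25.0, 60.0),
    ("Designer C", "C1", 8.0, 10.0),
    ("Designer C", "C2", -2.0, 5.0),
    ("Designer C", "C3", 6.0, 18.0),
]


def rank_designers(rows, min_models=3, top_n=5):
    """Average each designer's returns and rank by the 2-year average.

    Returns a list of (designer, avg_2016, avg_2017, avg_2yr) tuples,
    best 2-year average first, limited to top_n designers.
    """
    by_designer = defaultdict(list)
    for designer, _model, one_year, two_year in rows:
        by_designer[designer].append((one_year, two_year))

    scores = []
    for designer, returns in by_designer.items():
        if len(returns) < min_models:
            continue  # too few models to score this designer
        avg_2017 = mean(r1 for r1, _ in returns)
        # Rough 2016 estimate, as in the post: 2-year minus 1-year return.
        avg_2016 = mean(r2 - r1 for r1, r2 in returns)
        avg_2yr = mean(r2 for _, r2 in returns)
        scores.append((designer, avg_2016, avg_2017, avg_2yr))

    scores.sort(key=lambda row: row[3], reverse=True)
    return scores[:top_n]


ranked = rank_designers(MODELS)
for designer, avg_2016, avg_2017, avg_2yr in ranked:
    print(f"{designer}: 2016 {avg_2016:.1f}%, 2017 {avg_2017:.1f}%, "
          f"2016-2017 {avg_2yr:.1f}%")
```

With real data, the rows would come from the exported spreadsheet (for example via a CSV export), and the later manual filters such as overlap with your own models or one-good-year outliers would still need to be applied by eye.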

Of these models, I rejected any that did not have at least 2 years of history.
Some models largely duplicated my own, so they got excluded.
Some models did great in 1 year but not the other, so they got excluded.
Some models were fully subscribed so they ended up on a watch list (got one of them after a couple months of waiting).

One of these designers got excluded because 2 of his 3 models were losers for 2016-2017. His other one was a big winner, so big that his average score put him in the top 5 designers. He has 5 more models with less than 1 year’s history so next year I’ll have 8 to use to evaluate him and then things might look better. But for this year I passed on his one good model because it just might be luck. Time will tell.

I ended up signing up for a total of 3 models (2 from one designer and 1 from another).
So far I’ve not put money into any of them. I subscribed not to immediately trade them but to have the option of trading them in the future, if they prove themselves over the next year or two. If I haven’t designed the model myself, I like to have several years of history to build my confidence.

By the way, none of the 3 has done very well since I subscribed (two are flat and one has lost money).
So my approach doesn’t guarantee every model will be a winner.


PS: To get a handle on the bias you mentioned (designers deleting their poor models), I plan to keep my Excel sheet so I can refer back to see if a designer has deleted models.