How I Rank R2G Ports

rallan - thank you for this informative post.

I have some experience with neural nets (NNs) but probably not as much as you. I have a couple of comments…

  • While choosing too few or too many nodes will lead to poor results, choosing the “ideal” number will not give OOS results that are as good as the backtest.
  • OOS results will degrade with time. A “rule of thumb” is one year of OOS validity for every 4 years of in-sample optimization (see the sketch after this list). Beyond that, one is pushing one’s luck. So if a model was specifically optimized over the last 5 years, one could expect it to keep “working” for about 1.25 years, assuming no regime change (such as fast-dropping oil prices).
  • Nodes are internal to the NN. The “ideal” number of nodes is directly related to the number of inputs. We don’t have an equivalent to “nodes” in our ports.
  • Ranking factors are “inputs” not “nodes” and the quantity of factors should not be judged by the same criteria as for nodes.
  • NNs are only as good as the inputs, i.e. garbage in, garbage out. This is why I gave up on NNs. If you can identify good inputs, then why do you need the NN? :slight_smile:
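
To put that rule of thumb in concrete terms, here is a tiny sketch in Python (my own illustration; nothing like this exists in P123) of the walk-forward schedule it implies: optimize on roughly 4 years of data, expect about 1 year of useful OOS life, then re-optimize.

```python
# A tiny sketch (my own illustration, not a P123 feature) of the 4:1 rule of
# thumb expressed as a walk-forward schedule: optimize on 4 years of data,
# expect roughly 1 year of useful OOS life, then re-optimize.
def walk_forward_schedule(history_years=16.0, in_sample_years=4.0, oos_ratio=0.25):
    """Return (in_sample_start, in_sample_end, oos_end) offsets in years;
    oos_ratio=0.25 encodes one OOS year per four in-sample years."""
    windows, t = [], 0.0
    while True:
        is_end = t + in_sample_years
        oos_end = is_end + in_sample_years * oos_ratio
        if oos_end > history_years:
            break
        windows.append((t, is_end, oos_end))
        t = is_end          # re-optimize once the OOS window has been used up
    return windows

for w in walk_forward_schedule():
    print("optimize years %.1f-%.1f, expect it to 'work' until year %.2f" % w)
```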

I think that one of the problems you and others have is that you believe there should be a strong relationship between in-sample and out-of-sample results. Some model providers may design with this assumption in mind, but drawing that conclusion in general is wrong. If you truly like Marc Gerstein’s results, then listen to what he has to say about backtests :slight_smile: It is not outside the scope of this thread.

As for how you choose R2G ports, it looks to me as if you are choosing models that are performing the best. This can very easily lead to buy high/sell low. Just as an example, many of the smallcap models that performed exceptionally well the first six months after R2G started, subsequently flopped.

Steve

Further to what is being said, it seems that the Piotroski model failed in 2014 after doing well earlier. I know my two are doing poorly; the R2Gs are not performing and neither is AAII’s. Can overfitting also apply to a general model design thesis? For example, factors based on management effectiveness? Or concentration in a sector like health care, or a focus on high-dividend stocks in a low-interest-rate environment? Since we cannot see what is inside the R2Gs, a model may be selecting stocks on macro trends that worked well over the last 5 years, but now the general market circumstances have changed. I personally think that Piotroski failed because it led to an over-reliance on energy stocks. Does that mean the concept was bad, or just that the types of stocks it selected moved against it? The same could be true of an over-reliance on small-cap versus large-cap stocks; small caps were hurt this last year too. I guess I am saying that the fundamental ideas in a model could be effective over the last 15 years on average, yet fail in 2014 and skew the OOS results. Looking at 2014 alone could be misleading. I think developers need to talk more about what the drivers for the market are.

David,

I agree with your general point, but I’m also laughing because one of my top live money systems this year is a Piotroski variant. My second one is a microcap system. They’ve done much better than the ‘safer systems’ I launched as R2Gs, and much better than their indexes. My worst-performing system is an SP500-focused system - and that had the best underlying index performance and the best chance of doing well this year.

@Rallan,
Thanks for the long reply. Good luck. Over the years, I’ve come to the conclusion that I have close to ZERO idea how any pro manager I choose, or model I invest in will do in the next 12 months. I can choose managers (and systems) I will exclude. Those rules are easier. And I’ve vetted at least 500 pro managers in the alternative space and invested in a handful (but tracked dozens - over more than a decade). What I do (sometimes) know is the general conditions in which a system will struggle and a likely range of outcomes I am willing to accept. All I can do is try to build a basket of systems for myself where I understand how they fit together… and then monitor systems / managers to see that they stay within a ‘range’ of expected performance.

If someone wanted to chase high performance in microcaps through a basket of such R2Gs (I stopped offering these because there are a lot of them, and if I really believe in a system I don’t want to compete with subs, but this is the dominant R2G money maker), that would be an okay approach. But I would expect most of them to return around the benchmark, not counting fees. Some number (maybe 20%) would lose a decent amount of money and some number (hopefully 10-20%) would have big up years. So in total you could make money over the benchmark. But I would not expect much stability year-to-year in terms of total rankings (unless liquidity is really low, i.e. under $300k ADT100; then it might). The issue is the fees relative to the amount you can invest in them. There is money to be made here, though. But I don’t think it will come from timing allocations. I think it will come from recognizing that nearly all of the systems will underperform their backtest results over a rolling 3-year period… but looking for systems that fit together (whatever that means to you)… or get you to a realistic number of holdings (say 30).

My bet is also that there are some very good R2G designers - better than most mutual fund managers and, in some cases, better than ‘big name’ hedge funds, at least after fees. These are people who will give an ‘average’ P123 sub a better chance of beating the market than if they do things on their own. But only if the fees make sense relative to dollars invested. And only if people can find them. And build a well-constructed portfolio ‘blend’ of them. That’s very hard also for many weekend investors.

I can’t disagree with you waiting to see on any of my systems. But if they have a year of 20% outperformance (or underperformance) versus the benchmark, it is my belief that they are no more likely to do well (or poorly) the following year. At least that’s my experience.

For ‘fun’, I made a little deck to reflect on this:
https://docs.google.com/presentation/d/1ijxx3csXW-6USn5NOGqHG08NgBim448St_8_fHhwWpA/pub?start=false&loop=false&delayms=3000

I still think the hardest issue for anyone, whether we built the system or not, is predicting its forward 1-2 year performance. I wish I could pick the ‘forward year’ winners from among my own models. I can’t.

Best,
Tom

Steve,

That’s usually true. But as long as the OOS outperforms the benchmark in real time, I would be happy with such a BPNN. In his thesis paper, Vanstone was able to get annualized returns of 30% from the Australian stock market with BPNNs. And this was on the test sets, not just the training sets. He was a beginner, using just Benjamin Graham rules, with an implementation that in my view tapped into only a fraction of the potential of NNs and AI, yet he did quite well on his first attempt!

I agree. But this can be dealt with. A moot point as P123 does not incorporate this form of AI.

I really didn’t want to get into a technical discussion of BPNNs; they were meant to serve as an example of AI tackling the issue of overfitting. The input layer takes in the first row of the raw data, i.e. price, earnings, previous price(s), volume, etc. The weights to and from the middle layer of nodes are the AI equivalent of human rules, in an abstract way. BPNNs, and most types of NNs, can handle data non-linearly; P123 is strictly linear.
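
For the sake of concreteness, here is a minimal sketch of that structure in Python (my own illustration, not Vanstone’s implementation or anything P123 offers): the inputs are per-stock features, the hidden nodes are non-linear combinations whose weights would be set by backpropagation, and for contrast a P123-style ranking is shown as a fixed linear weighting of the same inputs.

```python
import numpy as np

# Minimal sketch of a one-hidden-layer feedforward net of the kind trained by
# backpropagation. Inputs are per-stock features; the hidden "nodes" are
# learned non-linear combinations of those inputs. Weights here are random
# stand-ins for values that training would set.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_inputs, n_hidden = 4, 2                      # e.g. price, earnings, prior price, volume
W1 = rng.normal(size=(n_hidden, n_inputs))     # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=n_hidden)                 # hidden -> output weights
b2 = 0.0

def nn_score(x):
    """Non-linear score: each hidden node squashes a weighted sum of inputs."""
    hidden = sigmoid(W1 @ x + b1)
    return float(W2 @ hidden + b2)

def linear_rank_score(x, weights):
    """By contrast, a ranking-system score is (roughly) a fixed weighted sum
    of normalized factor values, with no hidden layer and no training."""
    return float(np.dot(weights, x))

x = np.array([0.6, -0.2, 0.1, 1.3])            # toy normalized features for one stock
print(nn_score(x))
print(linear_rank_score(x, np.array([0.4, 0.3, 0.2, 0.1])))
```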

Hhhhmmmm. An answer to this gets too long and technical, and is outside the scope of the thread. Read Vanstone’s thesis.

This feels patronizing; I hope that is not what you meant to do. In all cases, the in-sample data should provide some useful information to help judge what future results should be; otherwise, why would P123 provide in-sample data as it now does?

If this were true I would be anxious to subscribe to the anonymous R2G port with a Sortino over 8, described above by me. As I said, I doubt I ever will.

Randy

Tom,

I feel sometimes I should just put the list of open R2G ports on the wall, have a monkey throw five darts at it, then pick the five it hits. Then sell the monkey.

Randy

Seems like I set off an avalanche when reviving the thread after it lay dormant for a year.

Good discussions here.

It’s important to remember the premises of working with P123.
We are restricted to a 16-year timeframe, heavily exposed to in-sample performance stats, and flooded with options for increasing (in-sample) performance.
The platform delivers outstanding possibilities compared to the options available to the average investor, and still holds up as a disciplined approach compared to how professionals pick stocks.

However, the limitations and the design strongly encourage in-sample overfitting.
Try designing your system, use backtesting runs only to check for unintended errors, convert it to a live port, and check back after the next peak-to-peak market cycle.

Unfortunately we don’t have that kind of time and are generally eager to invest, so we prefer betting on any possible shred of outperformance that we feel confident with.

I believe that there are too many variables in the short-run that make reliable stock predictions impossible.
There are too many macro trends, political turmoil, social and environmental challenges that no one can accurately predict.

I think that if you turn off the noise, investing in the stock market is no different from trading goods a few thousand years ago.
It is a game of having more or better information, or of taking advantage of the ‘animal spirits’ (Keynes).
Access to privileged information still seems to have a slight positive effect, but more striking might be the value of exercising patience and persistence in your core strategy.
Daniel Kahneman’s “Thinking, Fast and Slow” provides an excellent recap of the findings in psychology over the last few decades.
I think a basic understanding of human nature helps us make better choices, both as personal investors (for example, thinking about ‘framing’ in terms of the data presented here) and in building models (knowing why strategies work and not running away from them if they underperform for a few years).

Of the R2Gs that have been live for at least one year, only a little more than half have beaten the benchmark - keeping in mind that most benchmarks might not be appropriate, and that this covers only performance in a generally friendly market environment. Risk-adjusted Sharpe or Sortino ratios over an out-of-sample peak-to-peak market cycle might lead to a different conclusion.

That said, and without finger pointing and by including myself, I think that most of what we observe (in terms of evaluating R2G performance) is sheer chance.

Randy, I might want to borrow your monkey for now after he has thrown your darts.

Best,
fips

rallan - I “played” with Neural Nets for several years back in the early 1990s, specifically for the financial markets. By the end of the '90s the entire financial industry had pretty much given up on them simply because there is no visibility into the underlying (abstract) “rules”. I’m sure there are people such as Vanstone who try to breathe life back into this area but I believe that it is pretty much dead for the reason I mentioned above.

I have not read Vanstone’s thesis and I don’t plan to, the reason is personal (has to do with academics in general). However, demonstrating on “test sets” is not what I consider to be evidence that something is a valid approach. Test sets are generally small and are subject to the same biases as any optimization technique.

When I originally became attracted to Neural Nets, it was due to the promise of “artificial intelligence”: that I could feed the net a bunch of technical analysis indicators and the net would discover relationships, patterns, etc. and make intelligent predictions for the future, all with a keystroke. But this of course is a delusion.

It was you who brought up neural nets. Anyway, quantitative analysis, as far as choosing inputs by testing is concerned, tends to follow the same sort of decay with time. The exceptions, of course, are those factors based on mathematics directly tied to the company’s books. Those should presumably work forever.

As I stated earlier, P123 ranking factors are not the equivalent of NN “inner layer nodes”. Ranking factors are not modified by feedback systems. They are the equivalent of NN “inputs”. So long as the ranking factors are “valid” and have some predictive value, there is no restriction on the quantity of factors used. In fact, the more unique inputs the better.

P123 buy/sell rules are also not the equivalent of NN inner-layer nodes. If they were, then the number of buy/sell rules would bear some relationship to the number of ranking factors; a 2:1 inputs-to-nodes ratio tends to be optimal for financial NNs. I am, however, a minimalist when it comes to buy/sell rules. Zero is optimum.
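
As a purely illustrative sketch of that analogy (my own toy numbers and factor names, not P123’s internals), a ranking system is just a fixed weighted combination of factor ranks with no hidden layer and no feedback, while the 2:1 rule of thumb would size a hypothetical hidden layer at about half the number of inputs:

```python
# Toy illustration (hypothetical numbers and factor names): a ranking system
# treats factors as "inputs" and combines them with fixed designer-chosen
# weights; there is no hidden layer and no feedback adjusting the weights.
factor_ranks = {            # percentile ranks (0-100) for one stock
    "value": 82.0, "quality": 67.0, "momentum": 45.0, "sentiment": 73.0,
}
factor_weights = {          # fixed weights chosen by the designer, summing to 1
    "value": 0.40, "quality": 0.25, "momentum": 0.20, "sentiment": 0.15,
}
rank_score = sum(factor_weights[f] * factor_ranks[f] for f in factor_ranks)
print(f"composite rank score: {rank_score:.1f}")

# If we were instead building a financial NN, the 2:1 inputs-to-nodes rule of
# thumb would size the hidden layer at roughly half the number of inputs.
n_inputs = len(factor_ranks)
n_hidden = max(1, n_inputs // 2)
print(f"{n_inputs} inputs -> about {n_hidden} hidden nodes under the 2:1 heuristic")
```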

Marc G. can correct me if I am mistaken, but I believe he said that there is a short period of time in 2013 (6 months?) that is relevant for backtesting, i.e. the period of rising interest rates. The rest of the 14 years of data, although interesting, is not particularly useful.

Others design models for optimal results over 14 years of data, or the previous 5 years, or through two bear markets, or whatever. This does not mean that such optimization is wrong, but neither should the entire 14 years of backtest data be considered representative of how the model’s future performance will look. A backtest is a tool, nothing more.

Why P123 provides in-sample data as it now does has been deliberated since the start of R2G, and I have been pretty vocal about the issue from the start. In-sample is the best case; there isn’t a soul out there who isn’t over-optimizing, except for Marc G. (and possibly SZ). As Marc works for P123, he provides free models without the pressure the rest of us face to “outdo” one another with in-sample data.

Steve

It doesn’t provide a lot of useful information on what future results should be. As it is currently in R2G, it’s a marketing tool, which is fine.

Backtests are useful for a designer, who gets the whole picture by running dozens of them during the development process.

Judging a strategy based on a single backtest is like judging a girl (or boy because this is 2014) based on a single picture. It’s risky. That one picture makes for a very partial representation.
Yet can we blame designers for showing to the world their best backtest?

Anyhow, I think the consensus here among R2G users is that backtests should not be presented at all. I think strategies would market themselves just fine based on post-launch performance only.
P123, R2G users and designers have everything to gain from this. It would make us look more professional, for one thing, and avoid the influx of clients toward strategies that have yet to prove their worth out-of-sample.
I wish it would eventually happen.

With all backtests/sims, it’s important to understand what was going on in the world/market at different times and recognize which sub-sets of the test match best/worst with your expectations going forward. Although interest rates remain very low, I think at this time we do need to think about how things might look for our strategies in a rising interest-rate environment. Unfortunately, life has been stingy in terms of giving us good sample periods. Mid 2013 may be the best we have. So it’s an important one to consider, especially for income models (for these mid 2013 might be the only useful test period).

But we need to be aware of other issues, too.

For example, I want no part of 1999-2002 testing because that period, I believe, was unique given the nature of the dot-com bubble and crash. In fact, including it in tests poses a great danger of trapping us into kidding ourselves. One of the oddities of that period was the way a narrow group of stocks collapsed spectacularly (and pulled market-cap-weighted indexes down with them) while many of the huge number of stocks outside that group rose or fell only modestly. So many non-dot-com p123 strategies built up a lot of alpha during that period. But more typically, and as we’ve seen with later downturns, the bear is more inclusive (in fact, if anything, correlations seem to be higher now than in the past). So pre-2003 test results, while making for a feel-good experience better than fine wine, serve mainly to distort the merits of the strategy.

I also “forgive” models for just about anything that happened in the 2008 crash. While we can easily apply 20-20 hindsight to build market timing rules that allow us to fantasize about our ability to avoid large “drawdowns,” the reality is that, absent hindsight, there’s little we can do to protect ourselves from epic financial meltdowns where the only truly useful fundamental test would be one that could answer the questions: “Who owns the stock, and how desperate are they to raise cash?” Ironically, to the extent that stocks weren’t all pretty much in lockstep in 2008, what little variation we did see caused better stocks to underperform, because those were the ones for which spinning-out-of-control funds could get legitimate bids.

I know I’m in an awkward position when I talk against over-optimizing; as Steve says, I’m with p123, so the only models I put up have been freebies to help launch the site and, one could say, don’t need to attract subscribers. But my situation aside, Father Time is the ultimate judge/evaluator. And if you can’t generate out-of-sample performance, Father Time is going to turn thumbs down. R2G had a charmed life early on, when there was little out-of-sample data to look at. Those days are gone and are no more likely to return than dial-up AOL’s stature as king of the internet.

1.) Marc, what interest rates are you referring to? Perhaps the period from mid-2012 to the end of 2013, when long bond (20-yr) yields went from 2.1% to 3.6%, should be considered a period of rising interest rates. So perhaps one should look at R2G performance over this time.

2.) Assuming that all market timing rules are nonsense, what have we got left? The only stocks that will be punished less in a down market are minimum-volatility stocks and dividend-paying stocks. I have been running out-of-sample models since July 2014, using the holdings of the minimum-volatility ETF USMV as the universe. Results so far are very good. You can read about the method here: http://www.advisorperspectives.com/dshort/guest/Georg-Vrba-140627-Minimum-Volatility-Stocks.php and a follow-up report on the Trader model here: Minimum Volatility Stocks: iM’s Best12(USMV)-Trader | iMarketSignals
Here is the out-of-sample, up-to-date (12-8-2014) chart of the Trader model: http://imarketsignals.com/wp-content/uploads/2014/12/Fig-7.1.USMVtrade-12-9-2014.png. To the best of my knowledge there is not a single R2G model that matches these returns.

So my philosophy is simple. Why would I be able to select a better minimum-volatility stock universe than the professionals at iShares? I simply replace the universe of my Best12(USMV)-Trader model every 3 months to keep it current with the stock holdings of USMV.
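
In pseudocode, the idea amounts to something like the sketch below (my own hypothetical illustration with placeholder functions and made-up tickers, not Georg’s actual implementation or any real P123/iShares API):

```python
# Hypothetical sketch of the refresh-and-rank idea. The helper functions are
# placeholders for data you would gather yourself; they are not real P123 or
# iShares APIs, and the tickers are made up.

def fetch_etf_holdings(etf_ticker):
    """Placeholder: in practice, download the ETF provider's holdings list."""
    return ["AAA", "BBB", "CCC", "DDD", "EEE"]   # toy constituent tickers

def rank_stocks(tickers):
    """Placeholder: apply your fundamental ranking system, best first.
    A toy score is used here purely for illustration."""
    toy_score = {t: float(i) for i, t in enumerate(tickers)}
    return sorted(tickers, key=lambda t: toy_score[t], reverse=True)

def rebuild_model_universe(etf_ticker="USMV", n_holdings=12):
    """Each quarter: replace the universe with the ETF's current holdings,
    then keep the top-ranked n_holdings names from it."""
    universe = fetch_etf_holdings(etf_ticker)
    return rank_stocks(universe)[:n_holdings]

print(rebuild_model_universe())   # re-run every 3 months to stay current with USMV
```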

However, these models cannot be offered as R2Gs because they do not comply with the current rules. So perhaps there is something wrong with rules that require long backtests but no out-of-sample performance data. I will report back again in 6 months, when the out-of-sample period for the Trader is 1 year long.

Georg

Georg - I like your idea, but I couldn’t pass up the opportunity for self-promotion. There is one system (at least) with a better record.
Steve


Nice stats for both models, Georg’s looks smoother, Steve’s has the higher absolute return.

However, there are a few models that match that return.
But I think it’s careless to compare based on a six-month period and on absolute return only.
We would need the OOS Sharpe and Sortino ratios, and longer OOS timeframes.

The low volatility anomaly has been around for some time. It’s one of many helpful anomalies.

Georg, why shouldn’t you be able to construct a better minimum volatility stock universe than iShares? They have more resources and suck up more Ivy League students than you probably have at your disposal, but they are mostly prone to the same biases and maybe face different limitations.
One of the reasons I work with P123 is that I want to know what’s inside the box. It would be nice nonetheless to know what they are screening for.

As Marc says, Father Time will teach us.

Best,
fips

Steve, your Micro-Cap USA looks great. But very few people can actually trade this model and benefit from it because of liquidity constraints.

Fips, I agree that the OOS period for the USMV-Trader is still too short. I am doing an experiment and am not claiming that the Trader will always outperform the S&P 500. But remember that it only trades large caps and has no liquidity problems, so perhaps one should compare its performance with other large-cap models. Also, the USMV ETF follows the MSCI USA Minimum Volatility Index, whose construction is not described in enough detail to be replicated. You can buy the screening parameters from MSCI if you have the money.

Best,
Georg

Georg,
You might check how R2G prices are calculated. I think, unlike private ports, they are adjusted by a formula based on Monday’s prices.

Marco posted on March 31, 2014:

“The price for the transaction will be today’s (Hi + Lo + 2*Close)/4 +or- slippage. The slippage is calculated as a per share amount using the variable slippage algorithm, and it’s either added to the price for buys&covers, or subtracted from the price for sells&shorts.”
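
For clarity, the quoted arithmetic translates to something like this (a sketch of the formula only; the per-share slippage comes from P123’s variable slippage algorithm, which is not reproduced here):

```python
# Sketch of the quoted R2G fill-price arithmetic. The per-share slippage
# amount comes from P123's variable slippage algorithm, which is not shown;
# here it is simply a parameter.
def r2g_fill_price(high, low, close, slippage_per_share, side):
    """(Hi + Lo + 2*Close)/4, plus slippage for buys/covers,
    minus slippage for sells/shorts."""
    base = (high + low + 2.0 * close) / 4.0
    if side in ("buy", "cover"):
        return base + slippage_per_share
    if side in ("sell", "short"):
        return base - slippage_per_share
    raise ValueError("side must be one of: buy, cover, sell, short")

# Example: Hi=10.40, Lo=10.00, Close=10.20, slippage 2 cents/share
print(r2g_fill_price(10.40, 10.00, 10.20, 0.02, "buy"))   # 10.22
```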

Geov,

Your idea of creating systems based on Smart Universes pulled from well-designed ETFs is very worthwhile. There are many ETFs that do things P123 simply can’t. Complex factor-based minimum-variance optimization with a host of underlying constraints, based on data from hundreds of sources, is clearly not something P123 can (or likely ever will be able to) replicate in terms of universe creation.

I never bothered with these on my own because they are too much work to execute and test, and there is so much else P123 can do, so I’ve stuck with rules-based universes, of which there are an infinite number.

But, I’d love it if you could share the ‘backtest list universe’ for P123, so I can build some systems for personal use on it.

Fips, Geov may be able to create a ‘minimum variance’ universe using beta and financial quality and defensive sectors, but it will be a very different universe (and process) from the one that MSCI / Barra are using in this optimization process. They have very different data sets (hundreds of providers)… and a very different optimization process. Sampling 20-30 stocks based on a fundamental ranking, from the minimum variance universe they have created (with decades of risk management and optimization under their belts and large nations as clients), is a very valid approach for all P123 traders. And it’s very smart. There are many ETF’s that use underlying universe creation methodologies we can’t look at (for example 13F replication of hedge funds with longest holding periods and ‘clonability’ index).

As to whether or not these should be R2G’s, I am not wading into that debate. But, I’d love the historical backtest ‘uni’ P123 exposure list to play with.

Best,
Tom

Geov,

The specific index you are tracking minimizes volatility at the total-universe level (while typically maintaining the underlying index’s sector, industry, market-cap and country weights, along with a host of other replication constraints). What you are doing with just 12 stocks is VERY DIFFERENT, and is very unlikely to turn out a min-variance product. You are taking concentrated company, sector and country bets (very concentrated). You would likely need a very long backtest history to draw any inferences as to whether / how it works. Even 3-5 years of out-of-sample results likely won’t tell you very much.

But… I’d still love to play with the data.

Best,
Tom

Tom,
What you say about the MSCI methodology is absolutely correct. They also re-optimize the index twice a year, so the selection parameters are not static.
Country is only USA, and I don’t agree that the 12 stocks are concentrated bets. The buy rules prevent that from happening (a rough sketch of these checks follows below):

Sector Weight <30%,
and Industry Weight <20%,
and exclude some of the largest market cap stocks from being selected.

Also, if you read my original article, I intend to run 4 quarterly displaced models which hold their initial positions for a minimum of 1 year. Two of them are already up, and the third will be posted on Jan 2. Thus there will eventually be about 30 different stocks in the holdings of the four models. Using four displaced models provides the ability to stage one’s investments over a year, with trades occurring approximately every 3 months thereafter. The universe gets replaced every 3 months as well, to stay current with USMV. So by July 2015 I will have about 20% of all the USMV holdings in my models. Even if my method only produces a few percentage points better returns than USMV over time, it will be worth the effort.
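
A rough illustration of what those concentration checks amount to (my own hypothetical code with a made-up threshold for the mega-cap exclusion, not the actual P123 rule syntax):

```python
# Hypothetical illustration of the concentration checks described above: skip
# a candidate if adding it would push any sector above 30% or any industry
# above 20% of a 12-stock, equal-weight portfolio, or if it is among the very
# largest market caps (the $100B cutoff is made up). Not actual P123 syntax.
from collections import Counter

def passes_buy_rules(candidate, holdings, n_positions=12,
                     max_sector_wt=0.30, max_industry_wt=0.20,
                     mega_cap_cutoff=100e9):
    """candidate/holdings are dicts with 'sector', 'industry', 'mktcap' keys."""
    pos_wt = 1.0 / n_positions
    sector_wt, industry_wt = Counter(), Counter()
    for h in holdings:
        sector_wt[h["sector"]] += pos_wt
        industry_wt[h["industry"]] += pos_wt
    if sector_wt[candidate["sector"]] + pos_wt > max_sector_wt:
        return False
    if industry_wt[candidate["industry"]] + pos_wt > max_industry_wt:
        return False
    if candidate["mktcap"] > mega_cap_cutoff:      # exclude the largest names
        return False
    return True

# Example: three staples already held -> a fourth staples name breaches 30%.
held = [{"sector": "Staples", "industry": "Food", "mktcap": 40e9}] * 3
print(passes_buy_rules({"sector": "Staples", "industry": "Beverages",
                        "mktcap": 60e9}, held))   # False
```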

I will email the universe to you so you can play with it.
Best,
Georg

Tom, I have just finished another model using the holdings of the $23-billion Vanguard Dividend Growth Fund (VDIGX). There are only 50 stocks in its holdings, and I use 10 of them in a Trader model, but this is not OOS, just a backtest.
http://imarketsignals.com/2014/trading-the-dividend-growth-stocks-of-the-vanguard-dividend-growth-fund-simulated-performance-of-ims-best10vdigx/

I think there are many candidate ETFs and mutual funds where one can use this method.
Best,
Georg

Tom,

I had the same train of thought as you did.

I also don’t bother to manually build custom universes from stock lists.

And I also feel it’s too much work. I would prefer to know what’s inside the box (how the ETFs screen for stocks) and would rather concentrate on improving my own skills with P123 (S&P) coding (e.g. for low-volatility models: beta, earnings variance, defensive sectors, sound financials, etc.).

But nonetheless it would be tempting to be able to pull the stocks of an ETF as a universe to test initial ideas. For example, I’d like to see what a crossover of value and low-volatility ETFs would look like. It would act as a shortcut to modelling your own custom universes (or absolute buy and sell rules).

Georg,

I agree, your models look great (as do the others you have on P123), and if you manage to stay ahead by just a few percentage points over the next few years, you will be beating a lot of professional fund managers.

We would have to reverse-engineer the universe to see what factors they use.

Their fact sheet states that the components primarily include consumer staples, healthcare and IT stocks. The current breakdown even reveals a financials bias. Many designers (not everyone, and not in every model) restrict their universes right from the beginning, and at least for the defensively labeled models, IT and financials are usually not among the low-volatility candidates. So that’s maybe one potential starting point for thinking about why USMV includes these industries.

On a side note - thanks for all the valuable discussion. I like how even though everyone has different views (while agreeing on other subjects) we usually have progress on the subject in question and we even have stats to ‘prove’ it (and will even have more of that in the future).

Best,
fips