Do you have any suggestions for ETF rotation models?

Hi Jim,
You can have a look at my latest ETF model on Seeking Alpha which uses six of my market timers to produce risk-on, risk-off, and risk-neutral signals.
It shows a 34% annualized return. To the best of my knowledge nobody else has come up with a market neutral signal in the context of market timing.
1,260 people have looked at this, which is a good advertisement for P123.

https://seekingalpha.com/article/4481459-im-multi-model-market-timer-not-your-daddys-old-moving-average-crossover-system

Looking forward to your negative comments.

Georg,

I get that you do not understand what cherry-picked means, or you do and you think the people you are selling your models to do not understand the term.

Either way not really something you should be bragging about exactly.

Best,

Jim

The reason that a portfolio of XLU, XLV and XLP provides a marginally higher return than SPY is that XLV alone outperforms SPY by a lot. This has nothing to do with volatility-drag. See the two figures below.

Jim, it would have been prudent to do this simple backtest before lecturing us about volatility drag. Others have also found that there is no rationale for this. See “The Myth of Volatility Drag”.

The author (also an engineer) concludes “Let’s banish “volatility drag” from our vocabularies!”



Georg,

Why does a portfolio consisting of XLU, XLV and XLP make you feel so insecure? You get that this was not a portfolio that I was recommending or using, don’t you?

Seriously, is everything okay?

Best,

Jim

Jim, you don’t have to be rude.

So what if the AVERAGE RETURN for SPY is higher while the CAGR is lower than that of the Combo XLU+XLV+XLP.
Why don’t you design a model which makes use of this fact and post it so that we can all benefit from your wisdom.

I note that you have not posted any Designer Models presumably because you don’t want to be criticized if your model does not perform well, but do want to have the prerogative to criticize others.

Georg,

It is you who are rude.

You started it but this quote is the least of it. I find your selling of cherry-picked and overfitted models unethical. I am most offended by your turning this thread into your personal sales pitch.

[b]Marc would have had none of it. He asked you to stop (as he should have).

As I recall he threatened to charge you as a pro if you did not stop. He took offense also. You stopped while he was here. Please expand on that conversation if I got anything wrong.[/b]

Unlike your models, Marc’s median model does not dramatically underperform the benchmark, which had a lot to do with his view on unethical selling of useless models, I think. Marc never charged for his models.

Jim

For most of the portfoliodb.co models you can create them yourself using the software quantrader by logica-linvest.com. The rules are not the same but the methodology and results are very similar. You can also test out any combination of stocks or ETFs. Would love that functionality on P123.

Cheers,
MV

Georg, I’ve now seen many iterations of your seasonal ETF switching model and it seems every time it’s a different set of ETFs. Jrinne came on quite strongly but he has a good point about curve-fitting. Why do the ETF options in the baskets change so frequently?

Please list the different sets that you have seen me use.

Jim,
You have no response to my challenge to present a strategy that uses volatility-drag to somehow outperform SPY.
All you can do is to pivot into personally attacking me.

Other than the one you already posted here, these are the other 2 iterations you’ve plugged:

Ticker("XLY XLI XLB XLV") vs
Ticker("XLP XLK XLU QQQ")

Ticker("XLY XLI XLB XLK VBR") vs
Ticker("XLP XLV XLU VIG IEI")

Note the XLK being swapped, playing a defensive role in one iteration and an aggressive role in the other.

Thank you all for your contributions, albeit I see the argument has gotten somewhat out of hand.

Thank you for linking to the models, Georg, but I, too, am afraid of overfitting, especially in models where I don’t know all of the criteria. Having said that, the seasonal effect (Halloween) on sectors is a well-documented phenomenon. So, while I agree that it is not overfitting, you also have some other ETF that is unknown to us in a seasonal cycle.

By the way, I’m extremely impressed with your timing model results, and I’ve attempted to understand it. I’m still not sure if it will work outside the backtest period. When was this model created?

Jrinne, I agree that a backtest does not always offer much, but it can provide some hints on models that might work. Do you have a link to a model that you believe can work and that can be tested on P123?

mv388158, Yes, I attempted to create some of the models in p123. I don’t always get the same results, but I keep on trying.

I don’t want to trade frequently; once a month is about right, and I’m not looking for extreme returns. It’s even enough to match the market but with less drawdown. However, I want to use p123 for an asset class rotation portfolio with a specific hedge function for my stock portfolio. So I’m interested in all of the systems offered on p123 that anyone can recommend. Then it’s possible to also test them.

For the time being, I’m looking at these models:

https://allocatesmartly.com/livingstons-muscular-portfolios/
https://allocatesmartly.com/financial-mentors-optimum3-strategy/
https://allocatesmartly.com/taa-strategy-accelerating-dual-momentum/
https://www.cxoadvisory.com/momentum-strategy/

A simple take (start) on these models:

Papabear:
Buy the top three
Ticker("VWO VNQ EFA VTV VUG IJT DBC IAU TLT") // papa
ShowVar(@PAPASCORE,ROC(63)+ROC(126)+ROC(252))
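
As a rough illustration of the Papabear ranking outside P123, here is a minimal Python sketch. It assumes ROC(n) means the percentage price change over the last n bars, which is a reading of the formula above and not a confirmed P123 definition. The function names and the price dictionary are hypothetical.

```python
def roc(prices, n):
    """Percent change over the last n bars (assumed meaning of ROC(n))."""
    return (prices[-1] / prices[-1 - n] - 1.0) * 100.0


def papa_score(prices):
    """Sum of the 63-, 126- and 252-bar rates of change, as in @PAPASCORE."""
    return roc(prices, 63) + roc(prices, 126) + roc(prices, 252)


def top_three(price_history):
    """Return the three tickers with the highest score, best first.

    price_history is a hypothetical dict: ticker -> list of closes,
    oldest first, with at least 253 entries per ticker.
    """
    scores = {ticker: papa_score(p) for ticker, p in price_history.items()}
    return sorted(scores, key=scores.get, reverse=True)[:3]
```

Fed with daily closes for the nine tickers in the list, top_three would give the three funds to hold until the next rebalance.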

CXO Momentum:
Buy the top three
Ticker("SHY, TLT, VGLT, VNQ, IWM, SPY, GLD, EFA, EEM, DBC, VOO") // CXO
ShowVar(@CXO,ROC(84))

Dual Momentum:
Buy the one with highest momentum
Ticker("SCZ,VOO,SPTL,TIP")
ShowVar(@DM,ROC(21)+ROC(63)+ROC(126))
In this model VOO and SCZ have to be in positive territory to be bought. If not, buy TIP or SPTL, whichever has the higher one-month momentum.
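
The selection rule just described could be sketched in Python roughly as follows. This is one interpretation, not the published strategy: it assumes an equity fund is eligible only when its own combined score is above zero, and that "one month momentum" means a 21-bar rate of change. The function name and both input dictionaries are hypothetical.

```python
def dual_momentum_pick(score, one_month):
    """Pick a single fund to hold.

    score:     ticker -> ROC(21)+ROC(63)+ROC(126), precomputed (hypothetical input)
    one_month: ticker -> one-month rate of change, precomputed (hypothetical input)
    """
    # Equity funds are candidates only if their combined momentum is positive.
    equities = [t for t in ("VOO", "SCZ") if score[t] > 0]
    if equities:
        return max(equities, key=lambda t: score[t])
    # Otherwise fall back to the bond fund with the stronger one-month momentum.
    return max(("TIP", "SPTL"), key=lambda t: one_month[t])
```

Keeping the scores as precomputed inputs makes it easy to swap in values exported from a P123 screen.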

Optimum 3:
Buy 3 of the top 6 based on momentum, but choose the three that are least correlated
Ticker("SPY, QQQ, VNQ, REM, IEF, TLT, TIP, VGK, EWJ, SCZ, EEM, RWX, GLD, DBC, BWX") // O3
My take on the momentum is the same as for Papabear:
ShowVar(@O3,ROC(63)+ROC(126)+ROC(252))
I have no idea how to program the pick of the least correlated three of the top 6.
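
Outside P123, one possible way to do the least-correlated pick is brute force: there are only 20 ways to choose 3 funds out of 6, so every trio can be checked. The Python sketch below assumes "least correlated" means the lowest average pairwise Pearson correlation of returns, which is a guess at the intent and not the published Optimum 3 rule. All names and inputs are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def least_correlated_trio(scores, returns, pool=6):
    """Take the `pool` highest-scoring tickers, then return the trio
    with the lowest average pairwise correlation.

    scores:  ticker -> momentum score (hypothetical input)
    returns: ticker -> list of periodic returns (hypothetical input)
    """
    top = sorted(scores, key=scores.get, reverse=True)[:pool]

    def average_correlation(trio):
        pairs = list(combinations(trio, 2))
        return sum(pearson(returns[a], returns[b]) for a, b in pairs) / len(pairs)

    return min(combinations(top, 3), key=average_correlation)
```

If two trios tie, this returns whichever comes first in score order, so a tie-break rule may be worth adding.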

CXO: https://www.portfoliovisualizer.com/test-market-timing-model?s=y&coreSatellite=false&timingModel=4&timePeriod=4&startYear=1985&firstMonth=1&endYear=2021&lastMonth=12&calendarAligned=true&includeYTD=false&initialAmount=10000&periodicAdjustment=0&adjustmentAmount=0&inflationAdjusted=true&adjustmentPercentage=0.0&adjustmentFrequency=4&symbols=DBC+EEM+EFA+GLD+IWM+SPY+TLT+VNQ+BIL&singleAbsoluteMomentum=false&volatilityTarget=9.0&downsideVolatility=false&outOfMarketStartMonth=5&outOfMarketEndMonth=10&outOfMarketAssetType=1&movingAverageSignal=1&movingAverageType=1&multipleTimingPeriods=false&periodWeighting=2&windowSize=4&windowSizeInDays=105&movingAverageType2=1&windowSize2=10&windowSizeInDays2=105&excludePreviousMonth=false&normalizeReturns=false&volatilityWindowSize=0&volatilityWindowSizeInDays=0&assetsToHold=3&allocationWeights=1&riskControlType=0&riskWindowSize=10&riskWindowSizeInDays=0&rebalancePeriod=1&separateSignalAsset=false&tradeExecution=0&comparedAllocation=-1&benchmark=VFINX&timingPeriods[0]=5&timingUnits[0]=2&timingWeights[0]=100&timingUnits[1]=2&timingWeights[1]=0&timingUnits[2]=2&timingWeights[2]=0&timingUnits[3]=2&timingWeights[3]=0&timingUnits[4]=2&timingWeights[4]=0&volatilityPeriodUnit=1&volatilityPeriodWeight=0

PAPA: PAPABear

DUAL: DUAL

Georg,

I was asleep and have not read all of your posts. Not sure that I will. My apologies if I am not responsive to a good question.

This is not a competition.

It is possible that you might be able to recall that Marchus made the point that strategies with more ETFs do better.

I have agreed with this for a while. Going fully into SPY then to TLT then back fully to SPY simply does not work out-of-sample. Nor do other strategies like that (with too-few ETFs) work out-of-sample.

I simply posted that I agreed and gave 2 reasons why that is true: diversification and reduced volatility-drag which are not really separate reasons. You have better reasons I guess?

XLP, XLV, and XLU just took me literally 15 seconds to find as an example of volatility-drag. This actually occurs pretty commonly even within individual sector ETFs and elsewhere. I stated at the time this is not anything I would use as a strategy to invest in. I specifically used this example because it has nothing to do with any of my models and I assumed people would be able to recognize that.
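
For anyone following the average-return versus CAGR point, here is a small Python sketch of the arithmetic. The return series are made-up numbers chosen for illustration, not data for SPY or any sector ETF, and the function names are hypothetical.

```python
def average_return(returns):
    """Arithmetic mean of periodic returns (as decimals)."""
    return sum(returns) / len(returns)


def cagr(returns):
    """Compound (geometric) growth rate per period."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


# Hypothetical series: one volatile, one steady.
volatile = [0.30, -0.20] * 5   # averages about 5% a year
steady = [0.04] * 10           # 4% every year

print(average_return(volatile), cagr(volatile))
print(average_return(steady), cagr(steady))
```

The volatile series has the higher average return but compounds more slowly than the steady one, which is the gap between average return and CAGR being argued about here.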

I also said this at the time:

I am not going to start a backtest competition now because you double-dared me. That would be less than meaningless as Marc has pointed out. I think that it is actually unethical–as did Marc–if you are trying to sell a backtested strategy here on P123. Back in the day P123 would have asked you to stop.

Ethical issues aside, one just needs to go to your designer models to see the problem with overfitted backtests. Debating it in this thread will do nothing to change what anyone can find there on their own. What other evidence could one possibly need?

I guess we could pretend that this time is different. Please be my guest everyone. It is fun pretending that you would have known what strategy would have worked best years ago and imagine how rich you would be now.

The real question is why did putting together XLP, XLV and XLU (fixed in equal amounts) after 15 seconds testing–as an example showing that volatility-drag can affect returns–set you off?

Are you okay?

Later, I might see if there are any questions that address the problems that rotation strategies with too-few holdings have. My apologies if anyone asked a pertinent question of me that I did not answer.

Jim

With seasonal timing, slippage of 0.1%, and selecting 2 ETFs with P123 ranking system “ETF Rotation - Basic” and backtest period from mid April 1999:
The first set shows an annualized return of 12.2% with a max D/D= -40%.
The second set shows an annualized return of 13.9% with a max D/D= -35%.
The original set as posted earlier in this thread shows an annualized return of 14.9% with a max D/D= -35%.
Over the same period SPY produced an annualized return of 7.6% with a max D/D= -55%.

What is there not to like about the Seasonal Timing strategy, AND WHERE IS THE CHERRY PICKING?
So this is my reply to the question of the thread, and I trust that some members may find this of interest.

Hey there. It’s pretty easy to program this stuff into Portfolio123. Although…without a top level membership I don’t think you’re going to be able to easily select the maximum diverse port or weight the ETFs in a risk parity weighting. You might try Portfoliovisualizer. I pay them also and follow an Accel Dual Momo strategy with their software. It also eats mutual fund tickers, so you can backtest into the 1980’s with VFINX and FOSFX. I replace VINEX with FISMX since I run this system in Fidelity. You can also program the GTAA3 from Meb Faber very easily with Portfolio123.

The term “cherry picking” refers to selection bias. When pharmaceutical firms only publish the results of successful trials and not unsuccessful ones, that’s selection bias, or cherry picking.

In this case, if you choose a handful of ETFs with full knowledge of their performance in certain months and another handful of ETFs with full knowledge of their performance in other months, and then publish that performance as if it could have been foreseen from the outset (1999), you’re clearly cherry picking. On the other hand, if you chose those two handfuls of ETFs based on a neutral rule implemented at the very beginning of your study with no forward testing and then showed that they outperformed, then that wouldn’t be cherry picking.

Thank you, scrichley. I use that system as well, but have you looked at any others that would be of interest?

In addition to TLT, I have added an extra ETF to the system, TIP, since there is now plenty to suggest that there will be high inflation for a long period.

And yes, I do not have access to the backtest simulator. The biggest problem with the screen back test is the forced monthly rebalancing, but beyond that the strategies are so simple that they should be easy to use in p123.

Yuval,

Thank you. Exactly right. You give a medical example.

In medicine the people who do this are often called “optimistic” doctors giving them the benefit of the doubt. That is really what we call them as we try to understand how they could believe in what they are doing.

When you see it in real life it is more like someone tries a drug for a disease on 100 people and all but 3 die. The drug does not seem to work so well. Maybe it made things worse. It is not like this drug will ever get FDA approval but you can give it some continued off-label life.

What do you do? You publish a case report about how well the 3 people who recovered are doing. We see this type of thing with some drugs that don’t actually have any effect on Covid. That is cherry-picking.

But it is not called cherry-picking in some circles. Rather it is called standard operating procedure or “how to earn a buck.” If you are not in the inner circle there can be a debate as to whether it is purposeful or not.

RK, a terrible procedure, had a bunch of “optimistic” doctors promoting it for a very long time. RK was done with a scalpel and surgeons successfully avoided FDA oversight because a scalpel is already an FDA approved device. Or more accurately it has never gone through the FDA approval process but no one is going to pull scalpels off the market.

I worked with (for) Dr. Waring at Emory University who was a very optimistic and upbeat guy who did a ton of RK surgeries and presented his series and case reports in the literature and when he was invited to lecture. And did their corneal transplants when the procedure failed. He was seen doing RK surgery on CNN and he married the local (young) TV weather forecaster. He is also the author who ended RK surgery with a study showing the negative side effects. But not before he was up and running with his LASIK procedures.

WAIT! Uh…when did he know? When DID he know about RK surgery? He did know; he was the lead author on the study that ended RK…

The point is too many people went blind or generally had an unacceptable level of complications before the definitive study on RK was published and the cherry-picking ended. RK was never a good procedure and those debating how someone could be so “optimistic” already knew not to have RK surgery.

Now with regard to whether one should select an “optimistic” investment advisor with some impressive “case studies”…

Jim

Yuval, your comment shows that you have not looked at the seasonal effect in equities.

The seasonality of the S&P 500 is easily verified by backtesting with historic data. The S&P 500 with dividends from 1960 onward returned on average 1.92% for the yearly six-month periods May through October, the “bad-period”. For the other six months, the yearly “good-period”, from November through April, the average return was 8.47%.

In evidence-based medicine, likelihood ratios are used to assess the reliability of a diagnostic test. In finance, likelihood ratios can quantify the reliability of a financial test as well. For example, one can determine the probability of equities performing better over a particular period in the year depending on the outcome of a relevant diagnostic test.

The test period, from January 1960 to April 2019, held 59 cyclical good-periods and 59 cyclical bad-periods for stocks, totaling 118 six-month periods, and showed an average return of 5.20% for all periods.

The positive likelihood ratio is 1.86 with a 95% confidence interval of 1.26 to 2.74; a value greater than 1 produces a post-test probability which is higher than the pre-test probability (pre-test probability is that there is no difference in performance between the two 6-mo periods).

The diagnostic test provides a 65% probability for the S&P 500 to perform better than average from November to April, and a similar probability to perform worse than average from May to October each year, indicating causation, namely that stock market returns increase or decrease due to seasonal effects.
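
The step from the likelihood ratio to the 65% figure can be checked with the usual odds form of Bayes' rule. The Python sketch below assumes a 50% pre-test probability (59 good periods out of 118), which is an inference from the period counts above. The function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability
    using a likelihood ratio (odds form of Bayes' rule)."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Point estimate and the two ends of the quoted confidence interval.
for lr in (1.26, 1.86, 2.74):
    print(lr, round(post_test_probability(0.5, lr), 3))
```

With a likelihood ratio of 1.86 this comes out at roughly 0.65, in line with the probability quoted above. The interval ends show how wide the uncertainty around that figure is.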

So you want to invest in defensive types of ETFs from May to October, and more aggressive types from November to April. This is not cherry picking because the chosen investment type is based on a rational argument.

The performance of the same group of ETFs from May to Oct is shown in the second chart below. Over these 6-mo periods it produced a negative return. So it would be foolish not to switch to a more defensive asset allocation during those periods.

You can read more about this here:



Hi Marchus,

The allocate smartly website doesn’t distinguish out of sample from back tested performance which makes it hard to determine how well these ideas work. These strategies will have worse out of sample performance compared to in sample performance due to back test over fitting and/or to the market changing. Consider using a few of these strategies on a portion of your portfolio to minimize timing errors. Jim’s suggestion of holding more ETFs will also minimize an allocation error. There used to be web sites that tracked out of sample performance of tactical allocation strategies but I couldn’t find any with a google search. Can you find any?

Scott