NEW: Rolling Test Tool is now available

Steve had said: [quote]
It is an OK feature. However, the rolling backtest as implemented, doesn’t truly reflect the function of the model, as the model shouldn’t be in a continuous startup phase. When the model starts, it has 100% cash and it will buy the total number of stock required to use up the cash. This causes the model to choose stocks much lower down in the ranking system than it normally would. Repeating the startup phase over and over again doesn’t capture the true function of the model.
[/quote]

I’d agree this is an issue for some systems. There should be an option added to skip the first n rebalance periods (or weeks, months, etc?). For example, if you want to run a series of 1 year sims and you think it takes 2 months to ‘settle in’ and get fully invested with high rank picks, this option would let you run the sims for 14 months each, but not include the first two months in the results.
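A minimal sketch of that idea, assuming each sim's results are available as a plain list of per-period returns (the function name `trimmed_return` and the sample numbers are made up for illustration and are not part of the tool):

```python
def trimmed_return(period_returns, skip):
    """Compound per-period returns, ignoring the first `skip` periods.

    `period_returns` holds simple returns per rebalance period (0.01 == 1%).
    `skip` is the number of warm-up periods to leave out of the result.
    """
    if skip < 0 or skip >= len(period_returns):
        raise ValueError("skip must leave at least one period to measure")
    growth = 1.0
    for r in period_returns[skip:]:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical 14-month sim: two 'settling in' months, then twelve months at 1%.
monthly = [0.05, -0.02] + [0.01] * 12
full_result = trimmed_return(monthly, 0)     # includes the startup phase
settled_result = trimmed_return(monthly, 2)  # the 12 months after settling in
```

The sim still runs for the full 14 months, so the holdings at month 3 are whatever the model would normally hold; only the reporting window changes.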

Another enhancement I would like to see is a screen that shows the correlation between 2 sets of rolling tests that were run with the same settings (so number of samples and periods match). Of course, you can do this in Excel manually since the data is available.
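For anyone doing this outside Excel, here is a rough sketch of the calculation, assuming the two rolling tests have been exported as equal-length lists of per-sample returns (the function name and the numbers are illustrative only):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of sample returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical excess returns for the same five sample windows of two models.
model_a = [0.12, 0.05, -0.03, 0.20, 0.08]
model_b = [0.10, 0.02, -0.01, 0.15, 0.09]
rho = pearson(model_a, model_b)
```

The samples only line up if both tests used the same number of samples, offset, and period length, which is why the settings have to match.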

I find this tool useful for running a large # of sims with different start dates, but I include the random<.75 buy rule in my sims so that they have to pick some stocks that would otherwise not have been included. I think this helps in finding systems that were over-optimized.

Excess return is alpha. You can switch to the other tabs on the upper right corner of the histogram to display results for model performance or benchmark performance instead.

Wow, thank you Marco and Team!!!
This is very helpful, especially with sims that have a longer-term rebalance like 3 months
or that have a low turnover (below 500%).

For 5-stock, high-turnover (over 1800%) systems it should also be a good tool to help
judge a system's performance (of course it is only one piece; no tool will give you, at the push of a button,
100% certainty that “this system is not curve fitted”, but that cannot be expected and that
was not your intention, I believe).

Looking forward to adding this to my two free 3-month R2Gs

https://www.portfolio123.com/app/r2g/summary/1062281
https://www.portfolio123.com/app/r2g/summary/1275153

and to the others, esp. https://www.portfolio123.com/app/r2g/summary/1290029 since it has a relatively low turnover.

Regards

Andreas

I receive the error below when trying to run a rolling test:

“Rolling server is offline until Monday”

Rolling test tool is back online.

Great addition. First impressions are that the new servers are FAST.

I like that hedging can be disabled. I do think this can help determine if curve fitting or luck is an issue, particularly for models with very low turnover.

To evaluate R2Gs, I would probably use this tool and disable hedging, though it’s not possible to completely disable market timing when contained in the buy/sell rules or ranking system.

Overall, it’s not a silver bullet to flag overfitting, but a good tool that I definitely will use, especially for my own models.

The initial picks of the model obviously influence the results of a simulation.
Such effects can be studied and mitigated by simply increasing the simulation period.
I ran ‘Value Sentimentum - N/A neg Impact’ with the same number of samples and rolling offset,
varying only the period length (3y and 7y): the difference is significant.



Marco,

Thanks for the new tool. Is there a reason I can’t run this on ETF sims? I think this will definitely be helpful (and save a lot of time) for developers building their own systems. Sadly, it will likely do little to predict out-of-sample performance or flag ‘overoptimization’ for really high-turnover R2Gs, as the most highly optimized systems will still likely pass all these tests, and the results may just give false confidence. But it can likely help developers understand system performance much more rapidly. So, thanks.

EDIT: Can you add some summary stats on these? So: average, median and standard deviation. Could you also give us performance in ‘up markets’, ‘down markets’ and ‘flat markets’ as averages and ranges?

Best,
Tom
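Until something like this is built in, the stats can be approximated from the exported data. The sketch below assumes per-sample returns for the model and the benchmark, and it labels a sample window 'up', 'down' or 'flat' by a benchmark-return band; the 5% band is an arbitrary assumption, not a definition used by the tool:

```python
from statistics import mean, median, stdev


def summarize(returns):
    """Average, median, standard deviation and range of a list of returns."""
    return {
        "mean": mean(returns),
        "median": median(returns),
        "stdev": stdev(returns) if len(returns) > 1 else 0.0,
        "min": min(returns),
        "max": max(returns),
    }


def by_regime(model, benchmark, band=0.05):
    """Group model returns by benchmark regime and summarize each group.

    A window is 'up' if the benchmark gained more than `band`, 'down' if it
    lost more than `band`, and 'flat' otherwise (assumed cutoffs).
    """
    groups = {"up": [], "flat": [], "down": []}
    for m, b in zip(model, benchmark):
        key = "up" if b > band else "down" if b < -band else "flat"
        groups[key].append(m)
    return {k: summarize(v) for k, v in groups.items() if v}


# Hypothetical per-sample returns for four rolling windows.
model = [0.20, 0.10, -0.05, 0.02]
bench = [0.15, 0.08, -0.10, 0.01]
overall = summarize(model)
regimes = by_regime(model, bench)
```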

Hedging trumps all stats, so it needs to be turned off (we’ll have a way to identify timing rules in buy/sell rules). The R2G redesign we have in mind is to show:

  • out of sample in the first page, plus other stats like sector allocation
  • rolling stats in page 2 similar to the output of the new tool
  • designer backtest results in page 3

At this point it is not clear if the rolling tests reveal curve-fitting as we had hoped. It may actually be creating the opposite effect: giving the impression that the FUTURE outcome can be estimated with a statistical probability.

However, rolling tests combined with the out-of-sample results might be much more valuable than either of them individually.

“rolling tests combined with the out-of-sample results might be much more valuable than either of them individually.”

Can you expand on this thought Marco?

Also, what happened to the noise test? That was the one I thought was pretty useful.

Steve

Hello,
For the rolling tests, I wonder if someone can help me to know exactly what these parameters are doing:
Number of Samples (2-100)
Rolling offset in weeks (4 - 52)
Period Length in years (0.25 - 3.0)

Canadian VS US market !!!
I also would like to have your opinions about strategy validation.
I have backtested and validated my strategies on different periods and time frames, and with EvenID. They work great and they are robust, but only with Canadian stocks.
All my strategies are designed for the Canadian market, and they only work with that market.

In your opinion, is it normal that my strategies work well ONLY on the Canadian market (approx. 35% annualized) but are completely useless on the US market (0-5% annualized)?

I’ll work on some documentation. For the time being, though, consider the rolling test a jazzed-up version of the feature we’ve long had in the screener that used to be called “Advanced Backtest” and is now called “Rolling Backtest.” It’s described in the Help section, more particularly in the PDF entitled “Introduction to Screen Backtest”, starting on slide 13.

It’s entirely possible that your conclusion is valid. Investors in different markets react in different ways to the same information. For example, years ago when I was at Value Line, they tried to adapt the Timeliness ranking system to Japanese stocks. It turned out that over there, PYQ-type earnings momentum provoked the opposite sort of reaction from investors. In the US, many investors, as a matter of gut reaction, tended to go with the flow, making momentum a bullish thing. But in Japan, the gut reaction was contrarian in nature, making earnings momentum a bearish thing. Oh well . . . There are also different degrees to which fundamental data is available and accessible, making for different levels of efficiency. And finally, Canada is an energy-heavy market, so you may want to see how your Canadian performance compares to what you might get from a custom US universe that is energy heavy (particularly small-cap energy heavy) in a way that mimics Canada.

Still getting “No server available for request. Please try again later”
We may need server status information somewhere.

gs3

It crashed again. At least it crashed at a different place, so it’s progress. Sorry about that. A lot of code was added to run many sims in parallel.

It’s loading now.

Thanks for your opinion and explanation.
My concern was about building strategies that are “curve overfitted” on a “market”… I was wondering if it could be a problem.
I try to build strategies that work on both the Canadian and US markets, which would help prove their validity, but it seems to be difficult.

Hi Marco et al,
Great addition thanks for that. Some suggestions below:

  1. Is there a chance we could get more simulations info (input and output parameters) in the selection panel? This would help quickly visualize/sort the rolling test simulations.
  2. Also, agree with previous post re. need for an exceedance plot with percentile outputs (perhaps on the second axis of the histogram). This would be a handy output for comparing simulations.
  3. Taking it even further, an option to fit an extreme value distribution function (e.g. Weibull, Gumbel, etc.) to the left tail of the return distribution would be great to characterize the likelihood of “extreme” events.

Keep up the good work, Cheers
Frederic
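As a rough illustration of the exceedance and percentile outputs in point 2, here is a sketch that works on a list of per-sample returns copied out of the tool. It uses the plain empirical distribution (no fitted extreme value function), the nearest-rank percentile convention is just one common choice, and all names and numbers are hypothetical:

```python
from math import ceil


def exceedance(returns, threshold):
    """Fraction of samples whose return is at or above `threshold`."""
    if not returns:
        raise ValueError("need at least one sample")
    return sum(1 for r in returns if r >= threshold) / len(returns)


def percentile(returns, p):
    """Nearest-rank percentile (0 < p <= 100) of the sample returns."""
    if not returns or not 0 < p <= 100:
        raise ValueError("need samples and a percentile in (0, 100]")
    ordered = sorted(returns)
    rank = max(1, ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical returns from four rolling samples.
samples = [-0.10, 0.00, 0.10, 0.20]
share_non_negative = exceedance(samples, 0.0)
lower_quartile = percentile(samples, 25)
```

Plotting `exceedance` over a range of thresholds gives the exceedance curve; comparing two simulations then comes down to comparing the curves or a few chosen percentiles.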

Is the “Rolling Test Tool” also coming for R2Gs any time soon???
Thanks.

Yes, it’s a significant component of the R2G redesign currently underway.

The start date seems to be fixed on 4/6/2002. I am a “Screener” subscriber with only 5 years of Sim backtest available. Can you change the rolling test tool to accommodate the shorter period?

Right now the start date is computed based on the number of samples, period length, rolling offset, and end date.
The interface will be updated soon to improve ease of use.
Until then, you’ll have to adjust the settings to meet the start date requirement.
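To see how the settings interact, here is a sketch of one plausible way the windows could be laid out. It assumes the last sample ends on the end date and each earlier sample is shifted back by the rolling offset; the tool's actual computation is not documented in this thread, so treat the formula as an assumption:

```python
from datetime import date, timedelta


def rolling_windows(end, samples, offset_weeks, period_years):
    """Return (start, end) dates for each sample, oldest first.

    Assumed layout: the newest window ends on `end`; every earlier window
    ends `offset_weeks` before the next one; all windows share one length.
    """
    length = timedelta(days=round(period_years * 365.25))
    step = timedelta(weeks=offset_weeks)
    windows = []
    for i in range(samples):
        window_end = end - step * (samples - 1 - i)
        windows.append((window_end - length, window_end))
    return windows


# Hypothetical settings: 3 samples, 4-week offset, 1-year periods.
windows = rolling_windows(date(2014, 1, 4), samples=3, offset_weeks=4, period_years=1.0)
earliest_start = windows[0][0]
```

Under this layout the earliest start date moves later when you reduce the number of samples, the offset, or the period length, which is what a subscriber with only 5 years of history would need to do.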