What I am interested in is finding the "average performance" from the rolling test with DataMiner

Hi, I have conducted performance tests using Danp’s spreadsheet in Colab (Explanation for List of factors.xlsx spreadsheet and the ranking of rank factors - #8 by AndorraInvestor), which gives you a rank for each bucket in the “rank performance” test. Great tool.

However, I was wondering if it is possible to run a “rolling test” on each of the 1000 nodes that I have used in Danp’s system using DataMiner? What I am interested in is finding the “average performance” from the rolling test of each of the 1000 nodes here: 110323 Mine 1000 Noder P123 - Google Sheets

edit: I have not used rolling tests in a while. There are a couple of ways they can be implemented, and my comments may not have been correct for longer rebalance periods. I tend to rebalance weekly.

As Dan suggests below.

Thank you Dan.


DataMiner doesn’t have any function for running rolling ranking performance tests. Also, the DM output is not like the rank performance output from the site. DataMiner is actually running a screen backtest for each bucket so that it can return a lot of stats.

The only reason rolling rank performance tests would be useful is if the rebalance period used in the rank performance test was fairly long. I have been thinking about doing this myself. In the past, I used a short 2-week rebalance period for the rank perf tests in the script you mentioned, in order to get a lot of datapoints. But I usually hold stocks about 3 months on average, so it would be more realistic to use a 3-month rebalance in the rank performance tests. I could run the script 3 times with start dates that are a month apart, then calculate the average of the returns for each bucket for each factor and paste those averages into the Results sheet (the one that has the graphs).
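The averaging step itself is simple. Here is a minimal pandas sketch, assuming each run is exported as a CSV with a Factor column and one column per bucket (the file names and column layout are placeholders, not the script’s actual output format):

import pandas as pd

# Three rank-performance runs whose start dates are one month apart.
# File names and the Factor/bucket column layout are placeholders.
runs = ["rank_perf_run1.csv", "rank_perf_run2.csv", "rank_perf_run3.csv"]

# Each CSV is assumed to have a 'Factor' column plus one column per bucket
# (e.g. Bucket1 ... Bucket20) holding that run's bucket returns.
frames = [pd.read_csv(f).set_index("Factor") for f in runs]

# Average the returns per factor per bucket across the three runs.
avg_returns = sum(frames) / len(frames)

# Paste-ready output for the Results sheet (the one with the graphs).
avg_returns.to_csv("avg_bucket_returns.csv")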

Thank you for the feedback! I have never used DM before, but I saw this:

That was the reason I thought it was possible to use rolling tests for each factor node in DM.

The RollingScreen in DataMiner will run rolling screen backtests. You mentioned that you wanted to get the “average performance from the rolling test of each of the 1000 nodes”. So you could run those screens using a quick rank parameter and change it for each test to cover those 1000 nodes (aka formulas). But what is “average performance”? You could set the Max Num Holdings parameter so it returns the screen backtest results for the top n stocks. Or use an FRank() rule in the screen to return backtest results for the top x% of stocks. But is that what you want?

This is an example of the output you get:

Name             Start       End            Periods   Avg#Pos   AvgRet%  AvgBench%  AvgExcess%  Min%NoSlip  Max%NoSlip  AvgStdDev  Last13AvgRet%  Last65AvgRet%  GeoMeanRet%  GeoMeanBench%  Last13GeoMeanRet%  Last65GeoMeanRet%  Last65Avg#Pos
CF12/EV top 50   2005-04-03  2020-04-03         196     50.00     10.45      10.73       -0.28      -99.99      930.70      47.70          -0.92           5.57         7.26           9.36              -6.35               3.19          50.00
CF12/EV top 25%  2005-04-03  2020-04-03         196    989.99      9.76      10.73       -0.96      -99.99     2359.45      43.52           3.84           5.67         6.96           9.36              -1.75               3.47    

These were the rules used to get that output (I omitted the part of the script with the universe, dates, etc.):

Iterations:
    -   Name: CF12/EV top 50
        Ranking:
            Formula: (CurFYEPSMean*SharesFDA + DepAmortA)/EV
            Lower is Better: false
        Max Num Holdings: 50
    -   Name: CF12/EV top 25%
        Rules:
            - FRank("(CurFYEPSMean*SharesFDA + DepAmortA)/EV",#all,#desc) > 75

Notice that the script would need to have a section like those above for each of the 1000 formulas you want to test. If you plan to do this, let me know and I can show you how to do that in Excel in a way that only takes 10 minutes, as long as you are doing exactly the same test for each factor.
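For anyone who prefers scripting it, here is a rough sketch of the same idea in Python instead of Excel. The formulas.txt input file, the node names, and the top-50 choice are placeholders; the keys simply mirror the example above:

# Writes an Iterations section for the DataMiner script, one entry per
# formula listed (one per line) in formulas.txt. Paste the result into
# the rest of the script (universe, dates, etc.).
TEMPLATE = """\
    -   Name: {name}
        Ranking:
            Formula: {formula}
            Lower is Better: false
        Max Num Holdings: 50
"""

with open("formulas.txt") as src, open("iterations_section.txt", "w") as out:
    out.write("Iterations:\n")
    for i, formula in enumerate(line.strip() for line in src if line.strip()):
        out.write(TEMPLATE.format(name=f"Node {i + 1} top 50", formula=formula))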

Thank you for your feedback.

Yes, you are probably right. However, I am looking for different methods to test the system and individual nodes.

The advantages of a rolling test, as I see them:

1. It eliminates timing luck.
2. It reduces overfitting, where one, two, or ten stocks can greatly affect the total return.
3. It avoids the sensitivity of a simulation backtest to small changes in my sell rule (rank < 99).
4. When a node, on average across several hundred similar 25-stock portfolios, lifts the average return over a long period of time, that may give a good picture of the effectiveness of the node.

But I have to be honest here: I am searching a bit blindly, and I have read a lot on the forum to see what could be good strategies for optimization, such as this thread, where there is a small discussion of how to optimize: alternatives to optimization?

One method mentioned by several people on the forum is bootstrapping. It would be worth spending some time comparing and contrasting bootstrapping with rolling tests; in my mind they have some similarities (and some differences).

At P123 (without the API or DataMiner), people use a similar method, called subsampling in the literature; I don’t think it has a formal name at P123. Subsampling has been found to behave very similarly to bootstrapping IF THE SUBSAMPLE IS NOT TOO SMALL. Most of the time you want the subsample to be at least 50% of the universe (according to any peer-reviewed literature or Python implementations I have seen, anyway). At P123 this mostly takes the form of using MOD(), which has been advocated by many people on the forum (including P123 staff, I believe).

Hmmm… a Mod() subsample is usually less than 50% of the universe. Does that matter? Yes, no question about it.

A simple random seed for random() would add a nearly unlimited number of additional subsamples (at 50% or more of the universe, if desired) in addition to the great idea of using MOD() universes. Mod() is a great idea, but EXTREMELY limited.
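A rough sketch of the difference, using a placeholder universe of numeric IDs (the fractions and seeds are arbitrary):

import numpy as np

universe = np.arange(3000)  # placeholder for ~3000 stock IDs

# MOD()-style split: deterministic, always the same 25% slice of the universe.
mod_subsample = universe[universe % 4 == 0]

# Seeded random subsampling: any fraction you like (>= 50% as suggested),
# reproducible for a given seed, and a fresh subsample for every new seed.
def random_subsample(ids, frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    return rng.choice(ids, size=int(len(ids) * frac), replace=False)

subsamples = [random_subsample(universe, seed=s) for s in range(100)]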

Also, real bootstrapping (not just subsampling) can be done with Python, without bothering P123 with a feature request (with or without a random seed, BTW).
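A minimal bootstrap sketch along those lines; the return series is simulated purely to show the mechanics, and a real run would load backtest or screen output instead:

import numpy as np

rng = np.random.default_rng(42)                 # random seed for reproducibility
period_returns = rng.normal(0.002, 0.03, 520)   # placeholder for ~10 years of weekly returns

# Resample the period returns with replacement and record the mean each time.
boot_means = np.array([
    rng.choice(period_returns, size=len(period_returns), replace=True).mean()
    for _ in range(10_000)
])

low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"Bootstrap 95% interval for the mean weekly return: [{low:.4f}, {high:.4f}]")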

Yuval frequently advocates Bayesian methods. Putting it simply, I agree that this can be helpful sometimes.

Specifically Yuval has referenced this paper frequently: Is There a Replication Crisis in Finance?.

In this thread, for example: Why such a large discrepancy between backtesting and real-life results?

The same question keeps coming up, with similar answers each time, as there are only a finite number of reasonable answers.

Pretty much all of them are available in Python, and most of those Python programs have a random seed option. A random seed would go a long way toward making some of those algorithms doable in a spreadsheet (or within the P123 platform), if that is desirable at all.

BTW, while I think a random seed might give reasonable “bang for the buck” for anyone (it requires Python now), I love P123 just the way it is. I am happy with my ports just the way they are; they seem to be making me money based on my present out-of-sample data, and P123 does many incredibly cool things!!! I.e., this is not a feature request. A suggestion? Yeah, why not?