Ranking your ranking systems

I wanted to follow up on some of the procedures that I described above. I have run my described procedure on variations of the Easy to Trade US and Easy to Trade Europe universes.

Apart from my own factors and some I found on the forum earlier, I used the top factors of this list as well to test the procedure, so thank you for that.

I ended up using a metric based on your ultimate omega ratio as well as your outlier-trimmed alpha measure, for both single-factor and multi-factor ranking systems, calculated for each of the equity curves. The metric I used for single-factor rankings was

min(slope, 2) + min(spearman_corr, 1) + min(1 - max_drawdown, 1) + min(top_bucket / (benchmark * 2), 1),

with a maximum score of 5. I used the same metric for the multi-factor rankings, but multiplying the benchmark's return by 5 instead of 2 in the last term. Taking the two rankings based on the omega ratio and the outlier-trimmed alpha using the above measure and summing them gave a good idea of how consistently the single- and multi-factor ranking systems performed.
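In code, the metric looks something like this (the argument names are just illustrative; each input is computed from the equity curve beforehand):

```python
def factor_score(slope, spearman_corr, max_drawdown, top_bucket, benchmark, mult=2):
    """Composite score for one equity curve, with each term capped so the max is 5.

    mult is 2 for single-factor rankings and 5 for multi-factor ones.
    """
    return (min(slope, 2)
            + min(spearman_corr, 1)
            + min(1 - max_drawdown, 1)
            + min(top_bucket / (benchmark * mult), 1))

# a factor that maxes out every capped term scores exactly 5
print(factor_score(slope=3.0, spearman_corr=1.0, max_drawdown=0.0,
                   top_bucket=0.40, benchmark=0.10))
```

Capping each term keeps one freakishly good statistic (say, an extreme top-bucket return) from dominating the whole score.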

Using the clustering algorithm I linked earlier I get 6 clusters for both the US and the European data.
Cluster 1 roughly consists of Growth and Quality factors
Cluster 2 is rather small and consists of Growth, Quality and Valuation factors
Cluster 3 roughly consists of Growth, Quality, Valuation and Stability factors
Cluster 4 roughly consists of Growth, Quality, Valuation, Stability and Sentiment factors
Cluster 5 consists of Size, Stability and Quality factors
Cluster 6 consists of Momentum and Sentiment factors

Needless to say, I did not find the intuitive results I had hoped for using this clustering method.

Moreover, following the rest of the procedure I described above I found another shortcoming. Because one of the clusters (Cluster 2) is so small, the number of factors picked from each cluster needs to remain small, or Cluster 2 will be diluted in some way or another.

Hence, I decided to change this step for now to (4): determine the highest correlation of each ranked factor with the factors ranked above it, and pick the top 100 factors whose correlation with every higher-ranked factor is less than 0.85.
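Step (4) amounts to a greedy pass down the ranked list. A minimal sketch, assuming `corr` is a correlation matrix whose rows/columns are already in rank order (the function name is hypothetical):

```python
import numpy as np

def greedy_select(corr, n_pick=100, max_corr=0.85):
    """Walk the factors in rank order; keep one only if its correlation
    with every already-kept (higher-ranked) factor stays below max_corr."""
    kept = []
    for i in range(corr.shape[0]):
        if all(abs(corr[i, j]) < max_corr for j in kept):
            kept.append(i)
        if len(kept) == n_pick:
            break
    return kept

# toy example: factors 0 and 1 are near-duplicates, so 1 is skipped
corr = np.array([[1.0, 0.9, 0.1],
                 [0.9, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
print(greedy_select(corr, n_pick=2))
```

Unlike the cluster-based step it replaces, this keeps the best-ranked member of any group of correlated factors automatically, with no dependence on cluster sizes.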

Following that procedure I obtained an equally weighted 100-factor ranking system that outperformed the benchmark over the past 20 years by about 35% before slippage, which I thought was not bad. The results held up across different universes.

The most interesting part was this step: adding the factors one by one, I found that for this specific multi-factor ranking system, Return on Capital factors were pretty much always added. Accrual and Size factors were also almost always added to improve returns. This leads me to believe my ranking systems should give more than equal weight to Return on Capital factors and should always contain one or more augmenting factors. So these results were quite intuitive in my opinion.

I decided to add another step (6) to remove factors one by one after adding all the ones from step (5). The factors that were removed were mainly direct factors (non-Size and non-Accrual) that were at least somewhat related to other direct factors (for example, one of the sales growth factors was removed, but the gross profit growth factor of the same form that initially ranked higher was kept). I thought the results of this step were also quite intuitive.
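Steps (5) and (6) together are a greedy forward pass followed by a backward pruning pass. A minimal sketch, assuming a `score(factors)` callback that backtests an equally weighted combination (the callback and the toy factors below are hypothetical scaffolding, not my actual backtester):

```python
def forward_add(candidates, score):
    """Step (5): add factors one by one, keeping each only if it
    strictly improves the combined score."""
    chosen, best = [], float("-inf")
    for f in candidates:
        s = score(chosen + [f])
        if s > best:
            chosen.append(f)
            best = s
    return chosen

def backward_remove(chosen, score):
    """Step (6): drop factors one by one whenever removal does not
    hurt the combined score."""
    best = score(chosen)
    for f in list(chosen):
        trial = [g for g in chosen if g != f]
        s = score(trial)
        if s >= best:
            chosen, best = trial, s
    return chosen

# toy score: counts distinct factor "families" (first letter), so
# "a1" and "a2" are redundant with each other
def toy_score(fs):
    return len(set(f[0] for f in fs))

picked = forward_add(["a1", "a2", "b1", "c1"], toy_score)   # skips redundant "a2"
print(picked)
print(backward_remove(["a1", "a2", "b1", "c1"], toy_score)) # drops redundant "a1"
```

The real versions would of course re-run a backtest per trial combination, so caching equity curves per factor is worth it.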

I ended up with 180 factors that combined resulted in an annual alpha of about 40% above the benchmark. For European stocks the results were even better (+50%), also across different randomized universes. The results were also pretty consistent year by year, with hardly any negative years apart from the GFC.

I took this post to heart in constructing the metric I described above. I do want to mention, though, that even though for multi-factor ranking systems you only utilise the top bucket in your simulations, I think it is still very valuable for a model to explain the full cross-section of stock returns, just to be more confident about your out-of-sample results. I don’t think I would ever choose a multi-factor ranking system based on the performance of the top bucket alone (if we can speak of such a thing anyhow, given the variety of buckets that can be applied).

I might share the python code of the above procedure at some point in the future once I clean it up a bit.

Best,

Victor
