AI in Finance: how to finally start to believe your backtests

marco · October 7, 2023, 4:33pm

Thanks for the links. They show how easily one falls into the curve-fitting trap. This quote is worth repeating: “We rely on a single outcome of infinity from a tremendously complex system to test a trading algorithm, which is insane by itself”.

We’ve seen much more complex strategies, with multiple stocks and many more factors, fail miserably out of sample. This is a BIG problem for us since it leads to user cancellations.

I also really like the Matthews Correlation Coefficient (MCC), which the author seems to rely on heavily to evaluate the strategies. The confusion matrix (what a great name) is straightforward to compute when you have actual vs. predicted values. MCC is also very intuitive, with values from -1 to +1: +1 is a perfect prediction, 0 is no better than chance, and -1 is total disagreement. We’ll definitely add it as one of the statistics in our upcoming AI/ML factors.
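For reference, here is a minimal sketch of how MCC falls out of the four confusion-matrix counts. The function name `mcc` and the choice to return 0.0 when the denominator is zero are my own assumptions, not something taken from the linked article.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: if any row or column of the matrix is empty,
    # the coefficient is undefined, so we report 0.0 (no measurable skill).
    return numerator / denominator if denominator else 0.0


print(mcc(tp=50, fp=0, tn=50, fn=0))    # every prediction correct
print(mcc(tp=25, fp=25, tn=25, fn=25))  # coin-flip predictions
print(mcc(tp=0, fp=50, tn=0, fn=50))    # every prediction wrong
```

Note that all four cells matter: unlike plain accuracy, MCC stays near zero when a system just predicts the majority class.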

MCC for Ranking

It would be interesting to calculate the MCC for our ranking systems. We have very little right now to evaluate a ranking system. Mostly it’s just the annualized performance of the ranks grouped in buckets. Users simply alter weights to achieve a “better looking” bucketized performance (btw, we’re working on improving this right now, so this comes at the perfect time).

But how would we calculate the confusion matrix with ranks and no target to aim at? What is a True/False Positive/Negative for a rank? Perhaps we can calculate multiple MCCs, which would help in two ways:

- Measure the accuracy of a rank
- Determine the ideal rebalance frequency for a strategy

I’ll illustrate with an example where we calculate three MCCs, for 1-week, 4-week and 13-week horizons (1w, 4w, 13w):

For every rank data point (a stock's rank on a particular date) we calculate the future 1w, 4w, 13w performance relative to the benchmark.
We then populate the 1w, 4w, 13w confusion matrices as follows:
- Rank > 80
  - True Positive if stock outperforms
  - False Positive if stock underperforms
- Rank < 20
  - True Negative if stock underperforms
  - False Negative if stock outperforms
- Rank between 20 and 80
  - Throw away
- We calculate MCCs for 1w, 4w, 13w
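The procedure above can be sketched roughly as follows. This is only an illustration: the names `classify` and `mcc_by_horizon`, the input shape (a rank paired with a dict of benchmark-relative returns), and the handling of ties are all assumptions on my part. Low-ranked stocks are treated as the negative class (predicted to underperform), so that all four cells of the matrix get filled and MCC is defined.

```python
import math
from collections import Counter

HORIZONS = ("1w", "4w", "13w")


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; 0.0 if undefined (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def classify(rank, excess_return, hi=80, lo=20):
    """Map one data point to a confusion-matrix cell, or None if discarded."""
    if rank > hi:  # predicted to outperform
        # Assumption: a return exactly equal to the benchmark counts as a miss.
        return "tp" if excess_return > 0 else "fp"
    if rank < lo:  # predicted to underperform
        return "tn" if excess_return < 0 else "fn"
    return None  # ranks between lo and hi are thrown away


def mcc_by_horizon(observations):
    """observations: iterable of (rank, {horizon: return minus benchmark})."""
    counts = {h: Counter() for h in HORIZONS}
    for rank, excess in observations:
        for h in HORIZONS:
            cell = classify(rank, excess[h])
            if cell is not None:
                counts[h][cell] += 1
    return {h: mcc(c["tp"], c["fp"], c["tn"], c["fn"]) for h, c in counts.items()}


# Made-up data points, purely to show the call shape.
sample = [
    (95, {"1w": 0.01, "4w": 0.03, "13w": -0.02}),
    (90, {"1w": -0.01, "4w": 0.02, "13w": 0.05}),
    (50, {"1w": 0.00, "4w": 0.01, "13w": 0.01}),
    (10, {"1w": 0.02, "4w": -0.01, "13w": -0.04}),
    (5, {"1w": -0.03, "4w": -0.02, "13w": 0.01}),
]
print(mcc_by_horizon(sample))
```

In a real run the observations would be every stock's rank on every rebalance date, so overlapping 13w windows will be heavily autocorrelated; that is worth keeping in mind when comparing horizons.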

With these three MCCs you could determine a) whether the ranking system is accurate and b) over which time horizon. You would then set the portfolio strategy rebalance frequency to match the horizon with the highest MCC.
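Picking the rebalance frequency then comes down to taking the horizon with the largest coefficient. A tiny sketch, where `best_horizon` is a hypothetical helper and the numbers are invented for illustration only:

```python
def best_horizon(mcc_by_horizon):
    """Return the horizon label whose MCC is highest (first one wins on ties)."""
    return max(mcc_by_horizon, key=mcc_by_horizon.get)


# Invented values, not real results.
example = {"1w": 0.01, "4w": 0.06, "13w": 0.03}
print(best_horizon(example))
```

With numbers this close together you would probably also want some check that the gap between horizons is not just noise before committing to one frequency.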

Naturally, since this is noisy financial data, the bar for a good MCC would be quite low. Perhaps even values as small as 0.05 could produce winning strategies.

And, since not everyone wants to short, another refinement is to calculate MCCs for long, short and long/short systems.

Thoughts?