I've been trying to figure out what advantages the various machine-learning algorithms have over ranking systems, and I have been able to come up with very few. But I'm sure many users have come up with more, so I'd be grateful for your lists.
Here is a short list of advantages of ranking systems over machine-learning algorithms:
- They're easier to understand.
- It's easier to figure out why a particular stock was selected.
- Backtesting to optimize ranking systems can be done on simulations; machine-learning models are optimized using the equivalents of rolling screens and rank performance.
- You can use as many features as you want in ranking systems, while most machine-learning algorithms work better if you winnow down your features to fewer than 150 or 200.
- N/As are handled by ranking systems a little better than they're handled by machine-learning algorithms, at least from what I understand from forum posts.
- In ranking systems, you can compare factors with other companies in the same industry or sector; in machine-learning algorithms that takes quite a bit of extra work.
- Ranking systems can use conditional factors; machine-learning algorithms typically do not (though with a bit of work you can force them to).
- Ranking systems can be optimized for going long, for going short, or for both; machine-learning algorithms are typically optimized only for both at once. So while a ranking system for going long will typically use a very different set of factors than one for going short, machine-learning algorithms will mostly use the same features across the board.
- There is only one algorithm for ranking systems. There are many very different algorithms for machine learning, which makes it conceptually more difficult to use.
- In optimizing ranking systems, slippage can play an important role: mean-reversion factors, for example, increase transaction costs in simulations, and therefore may be given low weights. In optimizing machine-learning algorithms, slippage disappears, and short-term factors can play a much larger role than may be practicable when trading.
- If your goal is to consider every stock from as many angles as possible, ranking systems are a very direct and natural way to achieve that goal. Machine-learning algorithms may, on the other hand, consider only a few angles that are most pertinent to the target.
- If one believes, as I do, that factors cycle in and out of favor in a completely unpredictable manner, then backtesting over periods shorter than twelve to twenty years makes no sense. Instead, one should look at the maximum amount of time available (a hold-out period can be assigned either prior to or subsequent to the backtest period). Machine-learning algorithms are predicated on the assumption that factors will be more or less pertinent in different periods and that this is ultimately somewhat predictable. For example, most of the core factors worked far better during the 2002-2006 period, a golden age for small-cap value, than in any other period. Does that make that period an outlier? Not if one believes that, due to the unpredictability of factor prevalence, such a period might return after the next crash. Ranking systems, by including that period along with many others, will end up favoring factors that work across many different periods; machine-learning models have a built-in recency bias, especially if k-fold validation is used in training.
- Choosing the right machine-learning algorithm to use often involves comparing various systems and parameters using backtested results, which strikes me as rather prone to overfitting.
- In machine-learning models, the number of parameters to choose between can be overwhelming.
- In ranking systems, size factors can play a very important role, allowing one to test over a broad universe ranging from microcaps to large caps. In machine-learning algorithms, size factors are more or less ignored (at least in my limited experience), since they increase the returns of high-performing stocks while decreasing the returns of low-performing stocks. It therefore behooves one to limit universes to a certain size when building machine-learning algorithms, which ends up complicating results, since high-performing stocks will disappear from the universe.
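
To make a few of the points above concrete (a single ranking algorithm, comparison within a sector, and N/A handling), here is a rough Python sketch of a weighted percentile-rank composite. It is only an illustration, not any platform's actual implementation: the function names, the factor names, the tickers, and the choice to give an N/A a neutral middle rank are all my own assumptions.

```python
from collections import defaultdict

NEUTRAL = 0.5  # assumption: an N/A gets a middle rank instead of dropping the stock


def percentile_ranks(values):
    """Map each value to a percentile in (0, 1]; higher is better, None stays None."""
    present = [v for v in values.values() if v is not None]
    ranks = {}
    for key, v in values.items():
        if v is None:
            ranks[key] = None
            continue
        below = sum(1 for p in present if p < v)
        equal = sum(1 for p in present if p == v)
        # Tied values share the average of the rank positions they occupy.
        ranks[key] = (below + (equal + 1) / 2) / len(present)
    return ranks


def composite_rank(stocks, weights, sector_relative=()):
    """Weighted average of factor percentile ranks, scaled to 0-100."""
    totals = {ticker: 0.0 for ticker in stocks}
    for factor, weight in weights.items():
        if factor in sector_relative:
            # Rank each stock only against peers in the same sector.
            groups = defaultdict(dict)
            for ticker, row in stocks.items():
                groups[row["sector"]][ticker] = row.get(factor)
        else:
            # Rank against the whole universe.
            groups = {"all": {t: row.get(factor) for t, row in stocks.items()}}
        for members in groups.values():
            for ticker, pct in percentile_ranks(members).items():
                totals[ticker] += weight * (NEUTRAL if pct is None else pct)
    scale = 100 / sum(weights.values())
    return {ticker: total * scale for ticker, total in totals.items()}


# Hypothetical four-stock universe; BBB has no momentum value (N/A).
stocks = {
    "AAA": {"sector": "tech", "earnings_yield": 0.02, "momentum": 0.30},
    "BBB": {"sector": "tech", "earnings_yield": 0.04, "momentum": None},
    "CCC": {"sector": "bank", "earnings_yield": 0.08, "momentum": 0.10},
    "DDD": {"sector": "bank", "earnings_yield": 0.10, "momentum": 0.20},
}
weights = {"earnings_yield": 0.6, "momentum": 0.4}
scores = composite_rank(stocks, weights, sector_relative={"earnings_yield"})
```

Because every factor's contribution is just its weight times a percentile, it's easy to see why a stock scored the way it did: here BBB ends up ahead of AAA because it leads its sector on earnings yield, even though its missing momentum only earns the neutral rank.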
I'm sure there are many advantages of machine-learning models over ranking systems, and I will leave those for others to enumerate.
