Ranking vs machine-learning algorithms

I've been trying to figure out what advantages the various machine-learning algorithms have over ranking, and I have been able to come up with very few. But I'm sure many users have come up with more, so I'd be grateful for your lists.

Here is a short list of advantages of ranking systems over machine-learning algorithms:

  • They're easier to understand.
  • It's easier to figure out why a particular stock was selected.
  • Backtesting to optimize ranking systems can be done on simulations; machine-learning models are optimized using the equivalents of rolling screens and rank performance.
  • You can use as many features as you want in ranking systems, while most machine-learning algorithms work better if you winnow down your features to fewer than 150 or 200.
  • N/As are handled by ranking systems a little better than they're handled by machine-learning algorithms, at least from what I understand from forum posts.
  • In ranking systems, you can compare factors with other companies in the same industry or sector; in machine-learning algorithms that takes quite a bit of extra work.
  • Ranking systems can use conditional factors; machine-learning algorithms typically do not (though with a bit of work you can force them to).
  • Ranking systems can be optimized for going long or going short or both; machine-learning algorithms typically are optimized for both only rather than one or the other. So while a ranking system for going long will typically use a very different set of factors than one for going short, machine-learning algorithms will mostly use the same features across the board.
  • There is only one algorithm for ranking systems, while there are many very different algorithms for machine learning, making their use more conceptually difficult. (A minimal sketch of that one ranking algorithm follows this list.)
  • In optimizing ranking systems, slippage can play an important role: mean-reversion factors, for example, increase transaction costs in simulations, and therefore may be given low weights. In optimizing machine-learning algorithms, slippage disappears, and short-term factors can play a much larger role than may be practicable when trading.
  • If your goal is to consider every stock from as many angles as possible, ranking systems are a very direct and natural way to achieve that goal. Machine-learning algorithms may, on the other hand, consider only a few angles that are most pertinent to the target.
  • If one believes, as I do, that factors cycle in and out of favor in a completely unpredictable manner, then backtesting over periods shorter than twelve to twenty years makes no sense. Instead, one should look at the maximum amount of time available (a hold-out period can be assigned either prior to or subsequent to the backtest period). Machine-learning algorithms are predicated on the assumption that factors will be more or less pertinent in different periods and that this is ultimately somewhat predictable. For example, most of the core factors worked far better during the 2002-2006 period than in any other period, which was a golden age for small-cap value. Does that make that period an outlier? Not if one believes that, due to the unpredictability of factor prevalence, such a period might return after the next crash. Ranking systems, by including that period along with many others, will end up favoring factors that work across many different periods; machine learning models have a built-in recency bias, especially if k-fold validation is used in training.
  • Choosing the right machine-learning algorithm to use often involves comparing various systems and parameters using backtested results, which strikes me as rather prone to overfitting.
  • In machine-learning models, the number of parameters to choose between can be overwhelming.
  • In ranking systems, size factors can play a very important role, allowing one to test over a broad universe ranging from microcaps to large caps. In machine-learning algorithms, size factors are more or less ignored (at least in my limited experience), since they increase the returns of high-performing stocks while decreasing the returns of low-performing stocks. It therefore behooves one to limit universes to a certain size when building machine-learning algorithms, which ends up complicating results, since high-performing stocks will disappear from the universe.
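
If it helps to see just how simple that one ranking algorithm is, here is a rough Python sketch. The factor names and weights are made up, and the neutral-0.5 treatment of N/As is just one possible convention, not necessarily how any particular platform does it:

```python
import pandas as pd

def multifactor_rank(factors: pd.DataFrame, weights: dict) -> pd.Series:
    """Percentile-rank each factor within the universe, combine with
    fixed weights, and rank the combined score (1 = best)."""
    scores = pd.DataFrame(index=factors.index)
    for name, weight in weights.items():
        # pct=True gives a 0-1 percentile rank; N/As are assigned a
        # neutral 0.5 here (one common convention).
        scores[name] = factors[name].rank(pct=True).fillna(0.5) * weight
    return scores.sum(axis=1).rank(ascending=False)

# Example: three stocks, two hypothetical factors, 60/40 weights.
df = pd.DataFrame({"value": [0.1, 0.5, 0.3], "momentum": [0.2, 0.1, 0.4]},
                  index=["AAA", "BBB", "CCC"])
print(multifactor_rank(df, {"value": 0.6, "momentum": 0.4}))
```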

I'm sure there are many advantages of machine-learning models over ranking systems, and I will leave those for others to enumerate.

4 Likes

I agree with all of your points. But let me make a start at taking the other side anyway.

  • Machine learning algorithms have the potential to offer a quick workflow to answer the question: “does this factor improve my systems compared to the set of factors I already have?”, while with ranking systems that takes a lot of extra work and thought outside of the ranking system environment (downloading factors, running simulations, running rolling backtests). Such a workflow is not formalized within the ranking space as of yet.
  • Machine learning algorithms have the potential to determine the factor weights for you in a quantitative way, while with ranking it is more subjective and left open to the user: equal weight, quantitatively optimised, economic intuition, etc., and that again requires a lot of work outside of the ranking environment. Such a way of determining weights in an organized manner is not formalized within the ranking space as of yet.
  • Machine learning setups set out to explain the whole cross-section of stock returns, which means they can potentially encompass the ranking approach as a monotonic special case—and then extend it with interactions, non-linearities, regime dependence, and portfolio-aware objectives. Incorporating and experimenting with these parts of factor investing can be done with ranking, but the current environment doesn’t make this an obvious process.
  • Machine learning setups effectively set a specific target, while ranking systems on their own cannot. This target can (again: potentially) be relative 3m return to the benchmark with minimal slippage. The current ranking environment doesn’t give you an easy workflow that takes slippage costs into account. This can only be done effectively by dynamic weighting via a simulation or by artificially incorporating slippage-limiting factors in your ranking system - a machine learning environment could potentially offer a complete workflow for optimising net returns. (A rough sketch of such a target follows this list.)
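
To make that last point concrete, here is a rough sketch of what such a benchmark-relative, net-of-slippage target could look like. The data layout (weekly prices), the 13-week horizon, and the slippage figure are my own assumptions, not any platform's actual API:

```python
import pandas as pd

def make_target(prices: pd.DataFrame, benchmark: pd.Series,
                slippage_bps: float = 25.0) -> pd.DataFrame:
    """3-month forward return relative to the benchmark, minus a
    round-trip slippage haircut. 13 weekly bars ~ 3 months."""
    fwd = prices.shift(-13) / prices - 1.0
    bench_fwd = benchmark.shift(-13) / benchmark - 1.0
    return fwd.sub(bench_fwd, axis=0) - 2 * slippage_bps / 10_000

# Toy usage: 40 weeks of prices for two stocks plus a benchmark.
idx = pd.date_range("2020-01-03", periods=40, freq="W-FRI")
prices = pd.DataFrame({"AAA": range(100, 140), "BBB": range(50, 90)},
                      index=idx, dtype=float)
benchmark = pd.Series(range(1000, 1040), index=idx, dtype=float)
print(make_target(prices, benchmark).head())
```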

All in all: the idea of having streamlined workflows that lead to better systems is in itself great. I just don’t think the current environments - whether ranking or machine learning - offer what is needed (for most users) to go from creating factors to prioritising them to incorporating slippage and creating a full-fledged system out of it.

3 Likes

I like P123’s ML and P123 classic tools, myself. I guess that does include the downloads and some use of my personal computer. But with modern computers and ChatGPT, use of the downloads is accessible to most now.

I don’t feel the need to rate each of P123’s tools, as I use pretty much all of them. It would be hypocritical of me personally to be critical of any of them, or to make a lot of comparisons, when I find pretty much all of P123’s methods useful.

And actually, not just useful individually, but complementary to each other–one P123 feature complementing or building upon others.

Thank you @marco for opening up the platform and making the out-of-sample returns below possible. I retire next month and I sleep like a baby, due in large part to what P123 has made possible.

-Jim

2 Likes

I agree with @yuvaltaylor’s point. One thing that frustrates me with machine learning is how fragile it can feel. Sometimes you think you’ve found the "holy grail" with a setup, but a small change in hyperparameters can suddenly blow up performance. Unlike ranking systems, ML models feel like a black box, making it harder to develop intuition.

I spent two months testing different scenarios, universe by universe, and did get solid results on some, but the short amount of historical data available to simulate (just 5 years plus out-of-sample) makes me cautious. Thanks to @judgetrade in this aspect.

From my point of view, forward retraining should in principle give consistency (e.g. if a LightGBM hyperparameter combination works on one training window, it should generalize to others), but if it doesn’t, that’s a red flag for robustness. Here we should pay attention to market cyclicality as well, which complicates things further.
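
Something like the following is what I mean by that consistency check (a sketch only: the window sizes and hyperparameters are arbitrary, and the synthetic data stands in for a date-sorted feature/target download):

```python
import numpy as np
from lightgbm import LGBMRegressor
from scipy.stats import spearmanr

def walk_forward_ic(X, y, train_len=2000, test_len=500, **params):
    """Retrain the same hyperparameter combination on successive
    windows and collect the out-of-window Spearman rank IC."""
    ics = []
    for start in range(0, len(X) - train_len - test_len, test_len):
        tr = slice(start, start + train_len)
        te = slice(start + train_len, start + train_len + test_len)
        model = LGBMRegressor(**params).fit(X[tr], y[tr])
        ics.append(spearmanr(model.predict(X[te]), y[te])[0])
    # An IC that swings wildly across windows is the red flag.
    return np.mean(ics), np.std(ics)

rng = np.random.default_rng(0)
X = rng.normal(size=(6000, 20))
y = X[:, 0] * 0.1 + rng.normal(size=6000)
print(walk_forward_ic(X, y, num_leaves=31, n_estimators=200))
```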

For now, I’ll keep testing and reading, maybe allocating capital later in a kind of diversification — but I see how hard and tedious the work is. The upside, though, is that cross-simulations at least highlight the most relevant factors, which we can still bring back into a more traditional portfolio/ranking system framework.

1 Like

One thing I want to point out that very few seem to explore: most people are focused on using ML to predict future returns, which is not that strange. But I have noticed that ML algos are very good at predicting both fundamental factors and other technical factors. Predictors of other factors can very much complement both your ML ranking and your traditional ranking systems. Future return is definitely the most difficult factor of all to predict.
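
As a sketch of what I mean (the data is synthetic and the "next-quarter margin" target is hypothetical, purely for illustration):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Target a future fundamental rather than future return, then feed the
# prediction back into a ranking as one more factor.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                      # 20 features per stock-date
margin_next_q = X[:, 0] * 0.5 + rng.normal(scale=0.2, size=5000)

model = ExtraTreesRegressor(n_estimators=300, min_samples_leaf=50,
                            random_state=0)
model.fit(X[:4000], margin_next_q[:4000])
predicted_margin = model.predict(X[4000:])           # usable as a ranking factor
```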

The biggest advantage I see with ML is that it can pick stocks a traditional ranking system would never be able to pick up, so it’s a very good way to diversify the portfolio.

4 Likes

Based on some of the commentary in this post, I think it is important to point out that a traditional ranking system is akin to a fixed recipe, while a GBM model, for example, is more of a recipe book with a different recipe for each season. If you are looking to compare it with the traditional framework, it is more like comparing a single ranking system to a book of them, except the weight of each system in the book changes based on the opportunity set. It does not find an optimal fixed set of weights for each factor, but identifies patterns in the data and acts on them whether the pattern is “valid” theoretically or not.
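
A toy illustration of that point, on entirely synthetic data: when a factor's payoff flips sign with a regime variable, a fixed linear weighting (the analogue of a fixed ranking weight) averages the two regimes away, while a GBM picks up the interaction:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
factor = rng.normal(size=10_000)
regime = rng.integers(0, 2, size=10_000)          # 0 or 1, e.g. two "seasons"
# The factor pays off in one regime and reverses in the other.
ret = np.where(regime == 1, -factor, factor) + rng.normal(scale=0.5, size=10_000)

X = np.column_stack([factor, regime])
print(LinearRegression().fit(X, ret).score(X, ret))           # ~0: no fixed weight works
print(GradientBoostingRegressor().fit(X, ret).score(X, ret))  # much higher: learns the interaction
```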

2 Likes

When it comes to workflow, all of your points are excellent. But is there anything about these machine learning algorithms that make them superior to the very simple algorithm of multifactor ranking? In other words, assume you could overcome the various workflow obstacles to developing ranking systems and developing machine learning models, and that you came up with an excellent ranking system and an excellent XGBoost or Extra Trees or LightGBM model. Is there a reason for you to choose the machine learning model over the ranking system? The only one I know of is the one I quoted above. That's a very powerful one, if it's real. Is it? Is there literature that shows that these models are better at interactions, non-linearities, regime dependence, and portfolio-aware objectives than ranking systems? If so, are there trade-offs in terms of the inherent capabilities of these algorithms (i.e. are there things that a ranking system can do better than these machine learning algorithms)?

Is this a good or a bad thing? Shouldn’t certain “patterns” be off-limits? About seven years ago, I wrote,

Here’s a very simple two-rule system. Pick the stocks from the S&P 500 with a projected PE for the current fiscal year between 10.5 and 12 and with the number of institutional shareholders between 450 and 470. Then hold for five years. Sounds crazy, right? But if you did this five years ago, your four stocks would be DXC Technology (DXC), Huntington Bancshares (HBAN), Newfield Exploration (NFX), and Constellation Brands (STZ), and your total return would be 378%.

If your machine-learning algorithm operates that way, you're in major trouble.

1 Like

There are more types of models in my spectrum:

  • ranking - you can still make millions € using it
  • fully interpretable statistical models - give more flexibility: interpretable linear models with the possibility of non-linear transformations
  • semi-interpretable machine learning models - decision trees, RF, ET with visible tree structures (rules); ask Gemini to analyse a forest of, e.g., 300 trees to identify key factors/themes or even explain each tree's interactions (see the sketch after this list)
  • black-box (like P123) machine-learning models - interpretability is almost nil
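
Here is the kind of thing I mean by semi-interpretable (a generic scikit-learn sketch, not P123's implementation; the feature names are placeholders):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_text

# Dump each tree's rules as text so they can be read directly, or
# pasted into an LLM for a thematic summary of the forest.
X, y = make_regression(n_samples=1000, n_features=5, random_state=0)
names = [f"factor_{i}" for i in range(5)]
forest = RandomForestRegressor(n_estimators=300, max_depth=3,
                               random_state=0).fit(X, y)
print(export_text(forest.estimators_[0], feature_names=names))  # one of 300 trees
```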

The trade-off curve is as follows:

  1. Simplicity → Interpretability → Robustness → Less alpha (but durable)
  2. Complexity → Less interpretability → More potential alpha (but fragile)
2 Likes

It’s called cross-validation. It is not a backtest. You are more likely to find a backtest over at P123 classic.

The power of cross-validation to reduce overfitting, especially nested cross-validation, is the most important, and most basic, machine learning concept there is, in my opinion.
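
For those unfamiliar with it, a minimal scikit-learn sketch of nested cross-validation: the inner loop picks hyperparameters and the outer loop scores that whole selection procedure, so the tuning itself cannot overfit the evaluation. For financial panel data you would likely swap KFold for a time-aware, embargoed splitter:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_regression(n_samples=2000, n_features=10, random_state=0)

# Inner loop: hyperparameter selection.
inner = GridSearchCV(ExtraTreesRegressor(random_state=0),
                     param_grid={"min_samples_leaf": [20, 50, 100]},
                     cv=KFold(n_splits=3))
# Outer loop: scores the selection procedure itself.
outer_scores = cross_val_score(inner, X, y, cv=KFold(n_splits=5))
print(outer_scores.mean(), outer_scores.std())
```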

I think there are other areas where there is a clear lack of understanding of machine learning in this thread.

But that is okay. People are free to use their own strengths at P123, right?

I am not sure of the purpose of the original post here. My hope would be that people could pursue their own methods and that such a debate would not serve to limit the platform.

We already have P123 classic and no one wants to change that. In fact, I have been in favor of adding some tools to it, like ways to make cross-validation of P123 classic easier, for those who want to use cross-validation.

Random() with a seed to replace Mod() is something that could be borrowed from machine learning. It would be a clear advance for P123 classic. I have suggested it many times as I would like to expand and improve P123 classic.
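
To illustrate what a seeded Random() would buy over Mod() (a hypothetical Python sketch, not P123 syntax): hashing ticker plus seed gives a split that is reproducible across runs but not tied to ID ordering:

```python
import hashlib

def fold_of(ticker: str, seed: int, n_folds: int = 2) -> int:
    """Deterministic pseudo-random fold assignment per ticker."""
    digest = hashlib.md5(f"{seed}:{ticker}".encode()).hexdigest()
    return int(digest, 16) % n_folds

tickers = ["AAPL", "MSFT", "XOM", "JNJ", "NVDA"]
print({t: fold_of(t, seed=42) for t in tickers})  # same split every run
```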

Marco seems to be committed to opening up the platform and letting people do what they wish without having to reach agreement in the forum or be limited to methods selected by a committee.

Marco, thank you for what you have done and are planning in that regard. Thank you for allowing people to use their own methods to a large degree already.

For marketing purposes, P123 might consider having a staff member who posts favorably about machine learning, or hiring one. It would be good if that person used cross-validation and could help others understand it. Maybe give that person the power to implement important feature requests or improvements that come up in conversations in the forum.

Marco, your posts are nice but infrequent. The forum COULD be a marketing tool for P123.

I think the real potential for growth lies with machine learners. We ARE seeing machine learners come to the platform.

But the growth of machine learners on the platform is more fragile than I thought.

2 Likes

I love both worlds!

Key to me: use the knowledge and instincts built in the traditional world for AI Factor.

Proven Factors:

  • Traditional ranking system = bedrock. Best base for AI Factor models (esp. small & micro-cap, core rankings, and other out-of-the-box ranking systems + test your own favorite Ranking Systems).

Simplicity:

  • Portfolio pipeline or it did not happen! A predictor earns its place inside a ranking that lives inside a portfolio - so we see turnover, slippage, and capacity. That is how I understand it best!

    My Workflow: train on 2003–2020 (or 2019, depending on the target) with conservative hyperparameters (LightGBM, ExtraTrees, no grid search!); look for low hyperparameter sensitivity (stable Spearman + clean lift curves across all (!) HP sweeps; see the sketch after this list); run pseudo-OOS 2020–2024; if you like, add a light buy rule to get the tilts you want (for example where data might have gotten stronger in the last 5 years: Estimates, Actuals via FRANK); then go live and learn and maybe put money in it :wink:

  • Keep design simple. I prefer a basic holdout (no K-fold) so the validation predictor is comparable to the final predictor + we have long and uninterrupted Training (not 100% sure, but some academics state it). Start from proven ranking systems, convert to features, let AI Factor adapt dynamically (Z-scored) and let ML capture non-linearities.

The idea here is to have a base and to be able to concentrate on the features / portfolio strategy implementation:

    • Small/mid-caps: ZScore + Date (mostly with Skip + Date on Feature Level)
      Trim 7.5% | Outlier Limit: 5 | Target 3MRel, 3MTotal

    • Mid-caps: ZScore + Date or Dataset (mostly with Skip + Date on Feature Level)
      Trim 7.5% | Outlier Limit: 5 | Target 6 – 12 Months

    • S&P 500: ZScore + Date or Dataset (with Skip + Dataset or Skip + Date on Feature Level)
      Trim 7.5% | Outlier Limit: 5 | Target 9 - 12 Months

  • AI Factor → improved my traditional models. It sharpened my universe design, my understanding of regional factors (Canada/EU/US), and my use of Strategy Books with traditional small-edge ranking systems (mimicking trees via traditional ranking systems in strategy books).

  • Insight: Markets pay for rate of change: beaten-down names with improving fundamentals often outpace “great staying great” → AI Factor systems often combine mean reversion with momentum at the same rebalance, good for diversification. Understood: not everybody’s cup of tea, but I am o.k. with it.
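
Here is roughly what the hyperparameter-sensitivity check from my workflow looks like in code. A sketch only: the grid values are illustrative, not recommendations, and the synthetic data stands in for a real features/target download:

```python
import itertools
import numpy as np
from lightgbm import LGBMRegressor
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 30))
y = X[:, :5].sum(axis=1) * 0.1 + rng.normal(size=20_000)
X_tr, y_tr, X_ho, y_ho = X[:15_000], y[:15_000], X[15_000:], y[15_000:]

# Small, conservative sweep; keep the setup only if the holdout
# Spearman is positive and stable across every combination.
ics = []
for leaves, trees, lr in itertools.product([31, 63], [100, 300], [0.02, 0.05]):
    m = LGBMRegressor(num_leaves=leaves, n_estimators=trees,
                      learning_rate=lr).fit(X_tr, y_tr)
    ics.append(spearmanr(m.predict(X_ho), y_ho)[0])
print(f"IC range across sweep: {min(ics):.3f} to {max(ics):.3f}")
```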

Anecdotally:

  • Feedback I’ve received from ML/finance practitioners: “strongest ML/finance implementation / Infrastructure I’ve seen so far.”

  • Early career ML engineers at top AI labs (for example Google) build systems via AI Factor in weeks that would’ve taken me years to build, and they would be bored to death building traditional ranking systems :wink:

  • Best teams: experienced quants with long-term instincts from the traditional world + fast-iterating ML engineers – killer combo!

  • Allocators are starting with small tickets and want to scale on proof (at least that is what they are saying, lol) - an opportunity for designers; I am seeing contracts under negotiation.

So, AI Factor is a sharp product, best use is – if you like – to leverage your knowledge from the traditional world!

4 Likes

I would like judgetrade to be able to continue to do exactly that, and to implement any additional creative ideas he has. With minimal friction. With a small profit to P123 for making that possible. I don’t care about the rest!

Spurious correlations are definitely a bad thing. The way I see many using data mining is probably prone to a lot of issues. One has to be very careful about the features that go in and the stock output. As for me, I am discretionary, so I see it as a potentially valuable list of ideas, but I am too early in the journey to know whether it can do that for me just yet. I am having a lot of fun learning more about ML and exploring the functionality for now. One thing to keep in mind is that the algorithms will likely get better over time, and part of why I am learning is that I do not want to get left behind 5 or 10 years from now when and if they do.

As a first test of the functionality, I might end up using it as a screener to filter out stocks likely to underperform in the short term: a VLOOKUP in Excel to remove, say, the bottom 25% from the ranking output of my traditional rank. Ensemble approaches are where ML really shines currently anyway. The optimal “AI” system is likely to be a process using a series of different ones.
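
For what it's worth, that VLOOKUP step is a few lines of pandas (toy data here; in practice both frames would come from downloaded files, and the column names are hypothetical):

```python
import pandas as pd

ranks = pd.DataFrame({"Ticker": ["AAA", "BBB", "CCC", "DDD"],
                      "Rank": [99, 97, 95, 93]})            # traditional rank output
preds = pd.DataFrame({"Ticker": ["AAA", "BBB", "CCC", "DDD"],
                      "Pred": [0.03, -0.05, 0.01, -0.02]})  # ML short-term prediction

# Join the prediction onto the rank output and drop the bottom 25%.
merged = ranks.merge(preds, on="Ticker", how="left")
survivors = merged[merged["Pred"] > merged["Pred"].quantile(0.25)]
print(survivors.sort_values("Rank", ascending=False))
```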

1 Like

If you are looking to avoid spurious correlations, Algoman’s algorithm can do that while looking at correlations over a range of market conditions, not just a point in time, using Pearson’s correlation, Spearman’s rank correlation, and even non-parametric hierarchical clustering. This last, at least, probably crosses into the area of “machine learning” for some, while simply being a non-parametric way to look at correlations for others. If the label “machine learning” does not bother you, then you should check it out. It is really quite impressive and sophisticated.
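
To be clear, this is not Algoman's program, just a generic sketch of the same ingredients: Pearson and Spearman correlation matrices plus hierarchical clustering on a correlation-distance matrix, assuming a DataFrame of factor values (rows = stock-dates, columns = factors):

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

features = pd.DataFrame(np.random.default_rng(0).normal(size=(1000, 8)),
                        columns=[f"f{i}" for i in range(8)])
pearson = features.corr(method="pearson")
spearman = features.corr(method="spearman")

# Cluster features on 1 - |rank correlation| so highly correlated
# factors land in the same cluster.
dist = squareform(1 - spearman.abs().values, checks=False)
clusters = fcluster(linkage(dist, method="average"), t=0.5,
                    criterion="distance")
print(dict(zip(features.columns, clusters)))
```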

And importantly, Marco is likely to make this even more accessible with Voila (essentially opening up the platform to creative ideas from members). You can just use it (without worrying about the label) if you want: Python program to find correlations and multicollinearity

1 Like

Yes, but that is the thing: it requires all kinds of separate tools. I am against a kitchen-sink approach to see what sticks (both in traditional and in ML), as it is likely to find many things that might not be robust. I am really not worried either way, because I don’t buy stocks purely because a model ranks them highly, but I know some do and that is fine. For me that is just the bare minimum to begin my research. I will check those tools out for sure! Anything that helps make the quality of my generated recommendations list better (or even more unique and less correlated) is a win.

2 Likes

To expand on this, and to address one of the points against ML: the optimal solution is likely for everyone to have at least two AI predictors, one for downside and one for upside, trained differently. You then remove the likely-downside names from your upside list (or vice versa for short selling). Otherwise, as Yuval pointed out, you are likely compromising.
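
In code, the filtering step might look something like this (a sketch with toy scores; I'm assuming the downside model is trained so that a high score means high predicted downside risk):

```python
import pandas as pd

upside = pd.Series({"AAA": 0.9, "BBB": 0.8, "CCC": 0.7, "DDD": 0.6, "EEE": 0.5})
downside = pd.Series({"AAA": 0.1, "BBB": 0.95, "CCC": 0.2, "DDD": 0.3, "EEE": 0.4})

# Drop anything the downside model puts in its riskiest quintile,
# then take the top names by upside score.
risky = downside[downside >= downside.quantile(0.80)].index
longs = upside.drop(risky).nlargest(3)
print(longs)  # BBB is excluded despite its high upside score
```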

1 Like

I don’t mean to minimize the problems of correlation of features. I believe I was one of the first to discuss the problem of multicollinearity in the forum (along with Marc Gerstein and pvdb in the following link), at least 10 years ago: Mis-specification versus multicollinearity

Pvdb’s ideas on OLS were advanced machine learning concepts at the time.

The fact that this is being discussed again now is, for the most part, due to the recent influx of members familiar with machine learning, like yourself.

It remains a very important and difficult issue, I think. I can document that I have been struggling with it for at least a decade!!! I did not mean to imply there was suddenly one simple solution to the problem, because there is not one.

But I am thankful for the thoughtful discussions and sharing of ideas in the forum now.

1 Like

While both are potential issues, my point was specifically about spurious correlation, not just correlation between variables.

It refers to a statistical association between two variables that appears significant but is not causal, often due to coincidence, a lurking third variable, etc. It highlights the “correlation does not imply causation” principle. Shark attacks and ice cream consumption are correlated, but the lurking third variable is that it is summer and more people swim. Thanks for pointing out the issues with multicollinearity, though, as they are a key topic. You probably already know this, but I’m pointing it out for future readers of these posts.
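
A quick numeric illustration of why this bites factor research: test enough pure-noise "factors" against a pure-noise "return" series and the best of them will look convincingly predictive by chance alone:

```python
import numpy as np

rng = np.random.default_rng(0)
noise_factors = rng.normal(size=(1000, 252))   # 1,000 noise factors, 252 days
returns = rng.normal(size=252)                 # noise returns

corrs = np.array([np.corrcoef(f, returns)[0, 1] for f in noise_factors])
print(f"best |corr| among noise factors: {np.abs(corrs).max():.2f}")  # ~0.2
```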

I’m just saying that finding a “spurious correlation” could, for some, start with finding high correlations to look at over a period of time. What people do with that is up to them, and you provide a nice partial list of things to look at.

I would tend to find those correlations over a range of dates myself, using Python or Excel (where possible). But if I have exceeded the 1,048,576-row limit of Excel, I have to look elsewhere.

And I don’t mind having VIF and/or hierarchical clustering metrics when I do that. But I am not against people writing their own programs to leave those features out. In any case, Excel is not really an option if I want to look at over 1,048,576 rows of data.

I exceed that row limit after about 5 years with the Easy To Trade universe.
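
For anyone who wants the VIF check in Python rather than Excel, here is a small statsmodels sketch (the feature names and data are made up; VIF above roughly 5-10 is the usual rough threshold for problematic multicollinearity):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
features = pd.DataFrame({"value": rng.normal(size=500)})
features["earnings_yield"] = features["value"] * 0.9 + rng.normal(scale=0.3, size=500)
features["momentum"] = rng.normal(size=500)

X = features.assign(const=1.0).values  # VIF needs an intercept column
for i, name in enumerate(features.columns):
    print(name, variance_inflation_factor(X, i))
```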

1 Like

Makes sense. It could definitely come in handy for that too. Looking forward to trying what some of the community have created in that regard.

1 Like