Machine learning vs. robust backtesting strategies

I've been trying to understand the potential of machine learning on the p123 platform, but I'm not sure whether machine learning, regardless of the method used, will be better at finding future winning stocks than a method based on robust backtesting strategies.

My backtesting method is as follows:

  • I identify 200-300 factors (nodes) that I have the most faith in and that seem financially sound.
  • I then create 1500-2000 different ranking systems, where each node gets a different weight in each system. In most cases a node's weight is 0, while around 70 randomly chosen nodes are given a weight between 1% and 9% in each system.
  • These 2000 ranking systems are run in 2000 simulations using the optimizer. The settings for each simulation are as close as possible to what I do "live", but with at least 25 stocks in the portfolio, and limited to the last 10 years to allow out-of-sample testing from 2001-2010.
  • The top 15 simulated test ranking systems are selected, and I take the average of the weights of each node in these 15 systems.
  • Nodes with a weight below 0.5% are removed.

This gives a final ranking system that weights roughly 70 nodes, based on the average of the top 15 systems.
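A minimal sketch of the weighting-and-averaging scheme above. Note that the simulated "scores" here are random stand-ins for the optimizer's backtest results, and all sizes are taken from the description above; this is an illustration, not P123 output:

```python
import numpy as np

rng = np.random.default_rng(0)
n_factors, n_systems, n_active = 200, 2000, 70

# Each system: ~70 randomly chosen nodes get a weight, the rest are 0.
weights = np.zeros((n_systems, n_factors))
for i in range(n_systems):
    active = rng.choice(n_factors, size=n_active, replace=False)
    w = rng.uniform(1, 9, size=n_active)      # raw weights between 1 and 9
    weights[i, active] = w / w.sum() * 100    # normalize each system to 100%

# Stand-in for the simulated return of each system (a real run would
# come from the optimizer's 2000 simulations).
scores = rng.normal(size=n_systems)

# Average the node weights of the 15 best-scoring systems ...
top15 = weights[np.argsort(scores)[-15:]]
final = top15.mean(axis=0)

# ... and remove nodes whose average weight is below 0.5%.
final[final < 0.5] = 0.0
```

The surviving nonzero entries of `final` form the averaged ranking system.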

I then test the out-of-sample period 2000-2010, or use ID= to divide the universe into 5 equal parts for testing.

The question now is: What is it about machine learning methods that suggests that a ranking system that comes from a machine learning model is more robust or better at predicting future price movements?


That is a difficult question to answer. But all of what you describe is good, IMHO. I would only say that much (all, really) is borrowed from techniques that existed long before you or I joined P123. Using multiple universes as well as mod() was first posted in P123's forum after a member heard about machine learning techniques in an O'Shaughnessy podcast (and the poster called it what it is: model averaging). Nothing new about it.

Also, the P123 "optimizer" was not invented out of thin air. Machine learning uses "optimization" too, often with a gradient descent optimizer, but that is just automated optimization and not different, in its results, from what you do with P123's optimizer.
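To illustrate the point, here is a minimal gradient-descent sketch on synthetic data: it arrives at factor weights automatically, much as repeated passes with an optimizer would. The data, noise level, and "true" weights are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # factor scores for 500 stocks
true_w = np.array([0.5, 0.3, 0.1, 0.1, 0.0])       # hypothetical "true" weights
y = X @ true_w + rng.normal(scale=0.1, size=500)   # synthetic future returns

# Gradient descent on the mean squared error between the weighted
# factor score and the target return.
w = np.zeros(5)
lr = 0.01
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE
    w -= lr * grad
```

After enough iterations, `w` converges to roughly the true weights; that automated convergence is the "optimization" being compared to manual weight tuning.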

My point is that the good ideas you cite were not invented in this forum. To the extent that the good ideas we can use are finite and already described in the forum, we don't need to add anything else, I guess.

Well, actually even then I would not mind some sort of AUTOMATED optimization of what you describe. Personally, I am not going to do all of that manually myself. And frankly, I do not have to.

Marco seems to have taken the time to study and understand this topic well. Honestly, while he cannot provide everything at once, I think what he has described has been spot-on. And it seems we will have a good deal of flexibility in methods, with access to the hyperparameters in many of the models. And there will be a lot of models to choose from (including Support Vector Machines, which are very comprehensive).

TL;DR: Great summary of some good (manual) machine learning techniques but I am not sure that it is exhaustive.



What I like about what I have seen so far is that the AI factors work well and can be combined with traditional models. For instance,

Take your 200 factors. P123 will do a quick test showing you the predictive ability of your factors for whatever result you are looking for (it seems that forecasting 3-month relative return works well, but you can forecast dividend growth or whatever if that's what you are after). If that's all you want to do, just take the top predictive factors and do it manually.

Or you can allow P123 to run various tests where it changes the OOS period for each test (k-fold). It will run 4 times, and each time the segment it optimizes on and the segment held out as OOS will be different. There are many other cross-validation techniques to use as well.
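The 4-fold scheme described above can be sketched as follows. The 20-year range is hypothetical, chosen just to make the segments concrete:

```python
import numpy as np

# Four contiguous segments of years; each round holds one segment out
# as OOS and optimizes on the remaining three.
years = np.arange(2004, 2024)          # hypothetical 20-year history
segments = np.array_split(years, 4)

folds = []
for i, oos in enumerate(segments):
    train = np.concatenate([s for j, s in enumerate(segments) if j != i])
    folds.append((train, oos))
```

Each of the four `(train, oos)` pairs covers the full history with no overlap between the two parts.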

You might decide to just use the AI factor for value. Maybe you use 40 different value ratios and let it do its thing. Then you take this AI value factor and drop it into a ranking system where you put in your own static momentum system that you like. So you can combine AI with your previous methods.

It isn't going to be one method or the other...but you can combine both. The AI factor (which is like a complete ranking system) can be referenced anywhere now.


Your technique, while admirable, has issues. First, you are only looking at the performance of the 25 (or however many) top stocks, not the cross-section of the entire universe (i.e., all buckets), which is possible with AI, so you are not using most of the information available. Second, your selection criteria are based on factors that performed best up to 20 years ago, not in the last ten years. The markets have changed dramatically over time.

In any case, use whatever works for you.


I'm no expert, but I'm looking at it somewhat differently. Ranking systems are one way to buy and sell stocks. AI/ML is another, different way to buy and sell stocks. "A ranking system that comes from a machine learning model" holds no particular attraction for me: I really can't imagine it being a game-changer, and I'm very happy with the ranking systems I've created. But a machine learning model that does not use ranking sounds quite appealing. And, as far as I can tell, the various models out there that have been bandied about do not use ranking at their core.


Do you use ML yourself to choose your stocks? Are there any other platforms available for private investors that allow access to using ML in stock strategies?

This is probably a stupid question. I'm still struggling to understand what ML produces: what is the final product you end up with that chooses stocks for you?

I understand that it's not a ranking system, not a simulator, and not a screen; but the trained model that you end up with, what is it? It must use some form of technical or fundamental information about the companies that it considers in order to predict the future price movement of a stock. Regardless of how you break the data up into training and test periods, or the method used, you end up with a final product that then recommends stocks going forward. I imagine (but could be wrong) that it must be a form of ranking system that weights the different criteria it has found by training on the companies' data. Anyway, I will probably understand this better when P123 releases the ML solution.

What would that be, or look like?

Have you tried the P123 ML platform yet, and if so, what is your impression?

Edit: I attempted to detail, in my opinion, some of what Marco is adding to P123 classic. It comes down to efficient and effective cross-validation, as well as regularization and better control of interactions (e.g., the depth hyperparameter in XGBoost), and automation of whatever method is chosen (perhaps even including some aspects of P123 classic in the future). I hope @marco provides some documentation on cross-validation at some point. I think I will let P123 do that.

BTW, I believe P123 did not originally intend to have any cross-validation with AI/ML but now believes it is important, delaying the release date significantly and at a high cost. I welcome any corrections on this, but it is also true that P123 does not do a lot of self-promotion of what it has accomplished. Marco seems to have a pretty good command of what was done, but we have never talked to the AI expert in the forum, and much remains shrouded in mystery.

My only point about the changing ideas regarding the importance of cross-validation, and the best ways to do it at P123, is that we are all evolving and learning. I hope I am not done learning and that I continue to find easier or better ways to do things.

I have developed an appreciation of what P123 classic does as part of my learning experience. I now think P123 classic has some resemblance to ordinal multivariate regression, particularly in how it handles ordered data and ranks stocks (checked for accuracy by ChatGPT). Even though it is possible Marco and Marc Gerstein invented the P123 classic method entirely on their own, there are a finite number of ways to skin a cat, especially if you are constrained by the need to present something logical and understandable as a method. I use P123 every day regardless of its genesis or similarity (or lack of similarity) to any pre-existing machine learning methods.

The key to making P123 classic more useful to me was to find an optimization method that was effective and fast enough to allow both k-fold validation and walk-forward validation (using Python). I have no reason to think mine is the only useful method of optimization or the best (although I admit to having theoretical reasons for doing it the way I did). I have no criticism for Wycliffe's method of optimization or anyone else's. I, like Marco, think it is nice to be able to cross-validate whatever method of optimization is used.
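A walk-forward split of the kind mentioned above can be sketched in a few lines. The year range and the minimum training window are illustrative, not a description of the poster's actual setup:

```python
import numpy as np

years = np.arange(2004, 2024)   # hypothetical 20-year history

# Walk-forward: optimize on an expanding window, validate on the next
# year only, then roll the window forward one year at a time.
splits = []
min_train = 10
for cut in range(min_train, len(years)):
    splits.append((years[:cut], years[cut:cut + 1]))
```

Unlike plain k-fold, every validation year here comes strictly after its training window, which mirrors how a live strategy actually encounters data.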

I use P123 classic, but I self-identify as someone who uses machine learning from beginning to end, at every step, even if I use a spreadsheet now and again. Some of this is just a matter of definitions.



I'm pretty sure that none of the AI models that have been written about in the literature and that the quant hedge funds employ use ranking. If I'm wrong, maybe someone who knows more about this than I do can set me straight. So if you read some papers about AI models in stockpicking, maybe you'll get a sense of how they work. It's opaque to me, but I haven't investigated it much.

I gave P123's AI a shot a few weeks ago but I had trouble with it. I'm sure I was doing something wrong or something I wanted hadn't been developed yet. I'm looking forward to trying it again very soon.

Indeed, but definitions are important.

I like to classify decisions into three categories: discretionary, algorithmic, and machine learning.

Examples of discretionary decisions are buying a stock because you read something good about it, adding a factor because it makes sense to you, switching something about a system without backtesting it. I make many discretionary decisions, but none of them involve actually buying and selling stocks.

Algorithmic decisions are those that are arrived at by consulting a spreadsheet based on formulas or a website like Portfolio123 that is also based on formulas. "Data mining" is a subset of algorithmic decision making, and "quant" is a synonym. Algorithmic decision-making is what Portfolio123 was originally built for.

Machine-learning decisions are those made by machines that have trained themselves (with, of course, some human guidance) by taking data and manipulating it to arrive at better and better results (by "better" I mean more predictive). Examples outside the financial world include chess bots, Netflix recommendations, and LLMs.

When the term "machine learning" was originally coined in 1959 by an IBM employee, it was synonymous with "self-teaching computers." I think we should stick with that original sense for clarity's sake. The Wikipedia article on Machine Learning is quite good at making sense of all this.


So to be clear what I do in Python is machine learning pure and simple. If I were to name the Python program I use a rational person would say: “Yep. That’s machine learning alright.” Maybe you will have to trust me on that.

The other thing I said is that, instead of using a spreadsheet to randomly add or subtract from factor weights, there is at least one fully automated way to arrive at those weights, and that I like to cross-validate my methods. This is not the only method I have cross-validated. I also said Marco finds value in cross-validation.

But I think that it is a simple fact that what I do is machine learning with cross-validation being an important part of what I do. You are welcome to use your own definitions for what I do if you want. I really don’t mind.

Stepping back a little, this is a compliment to P123 classic if you are not committed to one single method of optimizing the rank weights.


When you have done that test, what is your requirement for accepting the out-of-sample result and not going back to start over?

Most of the AI models predict something like the future return. Once the return is predicted for each stock in the universe, the predictions are sorted and the top N stocks are bought. It is like a ranking system, but instead of ranking factors directly, it ranks the predictions that come from the factors.
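A minimal sketch of that last step, with random numbers standing in for a model's predictions and made-up ticker names:

```python
import numpy as np

rng = np.random.default_rng(0)
tickers = np.array([f"STK{i:03d}" for i in range(100)])   # hypothetical universe
predicted_return = rng.normal(size=100)                   # stand-in for model output

# Sort the predictions, highest first, and buy the top N.
N = 10
order = np.argsort(predicted_return)[::-1]
buy_list = tickers[order[:N]]
```

The model never ranks factors; the ranking happens on its outputs, which is the distinction drawn above.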



So I THINK I know the answer to this, but is this how P123 is going to do its AI/ML? Also, I think the method Azouz articulates has value, but P123 classic does work, and works with multiple different optimization strategies, I believe. I don't think I am expressing any preference for any particular optimization strategy for P123 classic in this post, and not above either (I only said I had some theoretical reasons for the method I chose, without making any comparisons).

Do you believe the AI/ML adds value to P123 classic, and is it worth paying extra for? I have argued above that the cross-validation methods that will be used in the AI/ML add considerable value. I believe cross-validation is important, and you make it simple to do. Do you agree?

In what other ways will the AI/ML add value?

This is not a theoretical question for me. I will be making a decision about whether to pay $1,200 per year to rebalance, and I may potentially need to upgrade to an Ultimate membership to test and train a system (not sure about that). Yuval is not convincing me to do that, and he may have a point about P123 classic being able to do the job for me personally, in part because, with considerable effort using a streamlined and automated optimization strategy, I can do cross-validation with P123 classic. Not theoretical for me at all, and not meant to be contrary to Yuval either. I hope I may have some broader agreement with Yuval on the value of P123 classic without having to use his exact optimization methods. I don't care about definitions and stipulate to whatever Yuval says on that. Anything I said about my particular P123 classic strategy having any similarity to machine learning was entirely positive in my mind at the time.

Wycliffe is basically asking the same question at the start of the thread, I think.



In addition to this, ML models can be trained on various targets (labels).

Plus, predicted targets can be plugged into the final model. It's called stacking. We might have three targets: (a) predicted return, (b) future EPSQ / current EPSQ, or (c) the probability that the stock will be a new addition to the S&P 500 index within 1 year. Then predicted (a) is a function of [predicted (b), predicted (c), other factors]. Of course, we would need to create these labels and be careful about look-ahead bias.

This sort of approach is used e.g., by DraftKings for NFL outcomes prediction.
[Modeling Football: Combining ML models and Monte Carlo simulations | by Ian Dorward | DraftKings Engineering | Medium]
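A toy sketch of the stacking idea, using simple least-squares models and entirely synthetic labels (the coefficients and noise are invented). In a real pipeline the level-0 predictions should be out-of-fold to avoid the look-ahead/leakage problem noted above:

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit; returns coefficients including an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))                              # raw factors
eps_growth = X[:, 0] + rng.normal(scale=0.1, size=400)     # label (b)
index_prob = X[:, 1] + rng.normal(scale=0.1, size=400)     # label (c)
future_ret = (0.6 * eps_growth + 0.4 * index_prob
              + rng.normal(scale=0.1, size=400))           # label (a)

# Level 0: predict the intermediate labels from the raw factors.
pred_b = predict_linear(fit_linear(X, eps_growth), X)
pred_c = predict_linear(fit_linear(X, index_prob), X)

# Level 1: predicted (a) as a function of predicted (b), predicted (c),
# and the remaining factors.
Z = np.column_stack([pred_b, pred_c, X[:, 2:]])
pred_a = predict_linear(fit_linear(Z, future_ret), Z)
```

Any regressor can stand in for `fit_linear` at either level; the structure, predictions feeding predictions, is what makes it stacking.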

P123's ML approach (currently relatively simple) should yield live results similar to P123 classic but, in addition, provide uncorrelated strategies.


Correct, I believe.

Extra Trees Regression gives similar results to my present P123 classic model with cross-validation, and my P123 classic model has some out-of-sample data as a funded port. But the stocks are different for the two models, so I could either model-average by adding the ranks of the stocks and/or just put both strategies into a book. Stacking is also a good idea for the future. I am not sure exactly how uncorrelated my strategies are, but using a book of my strategies increases the number of assets, and often stocks held by both models are bought and sold at different times. This clearly increases the liquidity, and the strategies are not 100% correlated (to an unknown degree for my strategies). There is also less idiosyncratic risk if the holding of a single stock is smaller with two strategies rather than one.
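A minimal sketch of model-averaging by adding ranks, with random scores standing in for the two models' outputs (universe size and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
score_classic = rng.normal(size=50)   # scores from a P123-classic system
score_ml = rng.normal(size=50)        # scores from an ML model

def to_rank(s):
    """Map scores to 0-100 percentile ranks (highest score -> 100)."""
    return np.argsort(np.argsort(s)) / (len(s) - 1) * 100

# Model averaging: average the two rank vectors and pick the top stocks.
combined = (to_rank(score_classic) + to_rank(score_ml)) / 2
top10 = np.argsort(combined)[::-1][:10]
```

Ranking before averaging puts both models on the same scale, so neither model's raw score magnitude dominates the blend.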

I will probably run a book of a P123 classic strategy and an Extra Trees Regression strategy if the cost does not outweigh the benefits. While I might actually run that on a spreadsheet, it is true that I might not, considering my present understanding of the costs. But I don't mean to imply that I know for certain whether it would be worth it for me personally. Maybe I will find some new models or get new information. I can't predict some things about the AI/ML offering or all of the models I might try in the future.

Anyway, my results so far agree with what you said above.

I might add as an aside that P123's Extra Trees Regression will be a heck of a lot easier than what I did to be able to cross-validate a P123 classic strategy. But I have already invested the time and cannot get it back. If I were new to P123, I would go with the Extra Trees Regressor model (using P123's cross-validation methods) and be done after a weekend, even with any extra costs involved. I support what Marco has done to make cross-validation easy (which is not to suggest that is the only benefit of the AI/ML offering).


My ML models have a correlation of ~0.6 with my p123 classic models using similar factors and applying the same direction constraint for ratios (monotonic constraints).

P123 could consider adding HistGradientBoostingRegressor. It also has the nice feature of monotonic constraints.

From the scikit-learn docs:

  • This estimator is much faster than GradientBoostingRegressor for big datasets (n_samples >= 10 000)
  • HGBT almost always offers a more favorable speed-accuracy trade-off than RF

@pitmaster is absolutely spot-on with this. I posted some cross-validated results using HistGradientBoostingRegressor previously in the forum: Monotonic Constraints. XGBoost has some similar hyperparameters, e.g., tree_method="gpu_hist", as well as the option of monotonic constraints. But like Pitmaster, I find sklearn's version easier to use.

If the out-of-sample return (2001-2014) is less than 2/3 of the return achieved over the period it was optimized on (2014-2024), I will drop the ranking system and start over.

Could you elaborate on how you have developed your ML models?

  • Have you used data from the "p123" platform?
  • And also, what exactly do you mean by "my 'p123 classic'"?