Yuval, I think you’re making a strong point here.
Note: you would apply the exact same process when making predictions, not just during cross-validation, which is what I describe below. Yuval’s point is fundamentally about prediction rather than cross-validation, but the process is identical in both cases.
You’re absolutely right that normalizing an entire dataset at once can introduce look-ahead bias. But that’s not the only option—and it’s a well-known issue in machine learning with well-established solutions.
The correct approach is to normalize the training data using its own mean and standard deviation, and then apply that same normalization to the test data. This preserves out-of-sample integrity and avoids information leakage. So yes, “dataset” normalization applied naively is problematic, but it is neither a new nor an insurmountable problem; it just requires careful handling.
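To make the process concrete, here is a minimal numpy sketch (the arrays are placeholder data, not anything from P123). It does by hand what sklearn’s StandardScaler, shown at the end of this post, does for you:

import numpy as np

# Placeholder data for illustration only
X_train = np.random.randn(1000, 5)
X_test = np.random.randn(300, 5)

# Compute the normalization parameters from the training data only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# Apply those same training-set parameters to both sets; no test-set statistics are used
X_train_z = (X_train - mu) / sigma
X_test_z = (X_test - mu) / sigma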
Your comment that “this is also a reason not to use the dataset option in AI models” sums it up nicely. It can be a real issue, and while some argue the practical effects may be small, normalizing over the entire dataset is not the theoretically correct way to proceed.
As for Algoman’s idea: I’d be a bit surprised if he isn’t already managing this concern in his P123 downloads and in his own training work. But regardless, it’s something P123 could handle in the theoretically correct manner.
Algoman, if I understand correctly, your suggestion could let us do a proper regression-style analysis within P123 Classic, rather than the rank-based (ordinal) analysis we have now.
The appeal is obvious. Today, a stock’s rank tells us its value relative to other stocks on that date. But in a frothy market where nearly all stocks are overvalued, ranking alone can obscure the fact that there are no true bargains. Z-scoring over the training set (the training set, not the entire dataset), and then applying that transformation to the test data, would let us detect this and maintain consistent factor scaling over time.
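To illustrate why this matters, here is a small, hedged sketch with made-up numbers (the column names and values are hypothetical). Per-date ranks always span the same range, so they look identical in a cheap year and a frothy year, while z-scores fitted on the training period preserve absolute levels:

import pandas as pd

# Hypothetical panel: one row per (date, ticker) with a valuation factor
df = pd.DataFrame({
    'date': ['2020-01-31'] * 3 + ['2021-01-31'] * 3,
    'ticker': ['A', 'B', 'C'] * 2,
    'pe': [10, 15, 20, 40, 45, 50],  # everything gets expensive in year two
})

# Per-date ranks: both years produce the same 1/3, 2/3, 1 pattern
df['rank'] = df.groupby('date')['pe'].rank(pct=True)

# Z-scores fitted on the training period only (here, the 2020 cross-section)
train = df[df['date'] == '2020-01-31']
mu, sigma = train['pe'].mean(), train['pe'].std()
df['z'] = (df['pe'] - mu) / sigma

print(df)  # ranks repeat, but the 2021 z-scores (5, 6, 7) make the froth explicit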
In short: I think you’re on to something powerful here. I agree that bringing Z-scores into P123 Classic (with the option to normalize by dataset or by date) could add a valuable new dimension, and it is possible to do it correctly.
BTW, I think Algoman already knows this, but for others reading along, here is the sklearn code using StandardScaler (which normalizes via z-scores) to handle Yuval’s concern:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

# Placeholder feature matrix for illustration; in practice this would be your factor data
X = np.random.randn(1000, 5)

# Split the data first; shuffle=False preserves time order, which matters for financial data
X_train, X_test = train_test_split(X, test_size=0.3, shuffle=False)

# Fit the scaler on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Transform the test data using the training parameters (no information leakage)
X_test_scaled = scaler.transform(X_test)
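And, as noted at the top of this post, prediction works the same way: when new data arrives, you reuse the already-fitted scaler rather than refitting it. A minimal continuation of the code above (X_live is a hypothetical array of new, unseen factor data):

# New, unseen data at prediction time (hypothetical placeholder)
X_live = np.random.randn(50, 5)

# Reuse the training-set parameters; never refit on live data
X_live_scaled = scaler.transform(X_live)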