Question for Yuval

First of all, thanks for all the insightful content you provide on the site, and congratulations on your success in 2020. I myself had a flat 2020, having moved to low-volatility, midcap-focused models. These went nowhere in 2020, and I was hard-pressed to find anything that performed worse (using XMLV as a comparable ETF benchmark). So, overall, pretty disappointing. I have since moved to using only ETFs, as I feel there is little to no alpha to be had in the large-cap, mid-cap, or even ETF-sized small-cap space that can’t be captured by a simple factor ETF.

I did have a microcap model at one point, which I started in mid-2018 and which immediately fell flat along with most other microcap strategies. In mid-2019 I decided that microcaps, while they had the potential for huge outperformance, had even more potential for huge volatility and manipulation.

I continue to be amazed at the giant dispersion of returns in the microcap space. For example, your microcap model had huge returns last year, while many others had negative returns. This, to me, just demonstrates the very low margin for error in this area.

I have followed your models for a while and am familiar with the number of factors (some innovative, some unorthodox) you use. Can you address the questions below?

1.) Do you use nodes in your ranking (i.e., all your value factors in one node, all your quality factors in another, etc., similar to the P123 Combination model)?

2.) I understand that a lot of your factors are custom rather than the standard P/E, etc., but how can they be SO much more effective than the standard ones? I would think they would at least be in the same ballpark, with custom factors having a slight edge. For example, the P123 Combination model (which seems reasonable, solid, and well thought out) produces less-than-stellar risk-adjusted returns on the microcap universe.

3.) With so many factors, if your model falters, how will you be able to track down why it’s faltering? I have always followed the idea that fewer factors are better, so you can clearly see which ones may be contributing to underperformance. That is, things should be as simple as possible, but no simpler.


I think you mean composite nodes. The answer is no.

I don’t know the answer to this, but I would hazard a guess that the standard factors have simply been arbitraged away.

Through constant backtesting, including the last few months and years in the backtests. There are a large number of factors that I no longer use because I found they didn’t contribute to my backtest results.

Regarding the idea that fewer factors are better, please check out the following article:

Appreciate the feedback and the link. I had actually read that article, but I will refresh my memory. So basically, your ranking is one large list of factors, each with a very small percentage weight. I wonder how eliminating the conditional nodes would change things in my ranking systems. Thanks again.

Just to be clear: I use conditional nodes quite often, but seldom use composite nodes.

Yuval, I’m curious: do you find that the ranking is different when you don’t use composite nodes, or is it more that factors are easier to manage without composites?

Theoretically, if the weights are equivalent with or without composite nodes, the overall stock rank should be the same. For example, with no composites, a factor’s weight is 5%. With a composite, if the composite node’s weight is 25% and the factor’s weight within the composite is 20%, the factor still gets an overall weight of 5%.
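Under the assumption in this question, that a node's score is a plain weighted average of its factors (no re-ranking inside the node), the factor's effective weight is simply the product of the two weights:

```latex
w_{\text{eff}} = w_{\text{node}} \times w_{\text{within node}} = 0.25 \times 0.20 = 0.05 = 5\%
```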

If there’s another nuance to it, I’d be curious to know.


Let’s say you have four factors and ten stocks, and let’s say the ranks are as follows:

Stock  Factor 1  Factor 2  Factor 3  Factor 4
A      30        60        70        80
B      60        80        10        0
C      0         10        20        90
D      50        70        0         40
E      40        50        90        70
F      90        30        80        30
G      70        0         60        10
H      20        90        40        60
I      80        20        30        50
J      10        40        50        20

With no composite nodes, E ranks at the top (average rank 62.5), followed by A (60), then F (57.5).

But if the first two factors form one composite node and the last two form another, then you have the following composite ranks:

Stock  Composite (1 & 2)  Composite (3 & 4)
A      40                 80
B      90                 0
C      0                  70
D      80                 10
E      40                 90
F      80                 70
G      20                 30
H      60                 50
I      50                 40
J      10                 30

Now F ranks on top (average 75) and E is second (65), with A taking third place (60).
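The two orderings can be reproduced with a short Python sketch. It is a simplification, not P123's actual ranking engine: it assumes ranks run from 0 to 90 in steps of 100/n, with tied stocks sharing the higher rank (a convention that happens to reproduce the composite table above), and that every factor and node is equally weighted. The function names are hypothetical.

```python
# Factor ranks of the ten stocks, copied from the first table above.
RANKS = {
    "A": [30, 60, 70, 80],
    "B": [60, 80, 10, 0],
    "C": [0, 10, 20, 90],
    "D": [50, 70, 0, 40],
    "E": [40, 50, 90, 70],
    "F": [90, 30, 80, 30],
    "G": [70, 0, 60, 10],
    "H": [20, 90, 40, 60],
    "I": [80, 20, 30, 50],
    "J": [10, 40, 50, 20],
}


def rerank(scores):
    """Convert raw scores to ranks in steps of 100/n; tied stocks share the higher rank."""
    n = len(scores)
    values = list(scores.values())
    return {
        stock: (sum(v <= s for v in values) - 1) * 100 // n
        for stock, s in scores.items()
    }


def flat_order(ranks):
    """Equal-weight all factors directly and sort stocks from best to worst."""
    avg = {s: sum(r) / len(r) for s, r in ranks.items()}
    return sorted(avg, key=avg.get, reverse=True)


def composite_order(ranks, groups):
    """Average factors inside each group, re-rank each group, then average the group ranks."""
    node_ranks = [
        rerank({s: sum(r[i] for i in g) / len(g) for s, r in ranks.items()})
        for g in groups
    ]
    avg = {s: sum(nr[s] for nr in node_ranks) / len(node_ranks) for s in ranks}
    return sorted(avg, key=avg.get, reverse=True)


print(flat_order(RANKS)[:3])                         # ['E', 'A', 'F']
print(composite_order(RANKS, [(0, 1), (2, 3)])[:3])  # ['F', 'E', 'A']
```

The re-ranking step inside `composite_order` is what changes the result. Passing other pairings, such as `[(0, 2), (1, 3)]`, lets you compare the alternative groupings discussed next.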

Now take another look at the first table of ranks. F does very poorly on factors 2 and 4, so it’s appropriate that F gets third place. With composite nodes, that distinction vanishes, and F gets first place. Similarly, all four of E’s scores are pretty high, or at least average, so it’s appropriate that E gets first place. But with composite nodes, E comes in second.

If we were to shuffle around the factors so that factors 1 & 3 were combined and 2 & 4 were combined, we might come up with yet different overall ranks. Ditto if we combine 1 & 4 and 2 & 3. The only really fair way to judge all the candidates is to keep all the ranks separate.

The first two pages of the ranking system tutorial discuss this odd blending of weights.

But it’s not clear why P123 does this instead of using a simple weighted average.
Is there some advantage to calculating it this way?


Thank you Yuval. That is very counterintuitive and you have given the best explanation I have seen of this.

Perhaps this is a slightly different topic, but not really, given the real effect composite nodes have on a ranking system (as illustrated here). One can do factor analysis on P123 factors. It will end up putting similar (or correlated) factors together into nodes: sentiment factors in one node, value in another, etc. It also gives weights to the factors and the nodes and removes some factors that are not contributing much.

Is what factor analysis recommends (or principal component analysis, which is similar) better than separating out the factors? Theoretically, it could resolve some issues with multicollinearity, which is when some of the factors are highly correlated, as I am sure happens when people use a lot of factors here at P123. As a bonus, it removes factors that are not contributing and that might therefore be causing problems with overfitting.
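Full factor analysis or PCA is beyond a forum post, but a simpler stand-in (this is not factor analysis itself) is to flag pairs of factors whose ranks are highly correlated across stocks; such pairs are candidates for the same composite node. Pearson correlation computed on untied ranks equals Spearman's rank correlation. The factor names, the five-stock data, and the 0.7 threshold below are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(factors, threshold=0.7):
    """Return (factor, factor, correlation) for every pair with |correlation| >= threshold."""
    pairs = []
    for f, g in combinations(factors, 2):
        r = pearson(factors[f], factors[g])
        if abs(r) >= threshold:
            pairs.append((f, g, round(r, 2)))
    return pairs


# Hypothetical ranks of five stocks on four factors (illustrative numbers only).
FACTORS = {
    "value_1": [1, 2, 3, 4, 5],
    "value_2": [1, 2, 3, 5, 4],
    "quality_1": [2, 5, 3, 1, 4],
    "quality_2": [1, 5, 3, 2, 4],
}

print(correlated_pairs(FACTORS))
```

Here the two "value" factors and the two "quality" factors each pair up, while cross-group correlations stay small, which is the kind of grouping a composite node would formalize.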

Does putting factors into composite nodes actually help with multicollinearity here at P123? And if so, does that end up being important or even useful?

I don’t know, of course. But there are theoretical reasons for using (or not using) composite nodes, or for deciding that it may not make much difference to your system.


I am trying to wrap my head around this. I understand that a composite node is calculated differently from a conditional node. Just to be clear, for example: is a conditional node with four factors calculated the same way as using stock formula or stock factor nodes for four separate factors?

I am not trying to hijack this thread but I definitely sympathize and agree. And it is an unbelievably important topic, here and elsewhere.

I wish Yuval or Marc could give us another truly great explanation and make this intuitive for all of us. Honestly, if they could, I think we would all be making a lot of money without breaking a sweat. We would get it: we would understand how all the factors interact and see the patterns without needing P123 or anything else. Of course, if we were that good, we would be recruited to work on some quantum-computer project somewhere and be predicting everything flawlessly.

As far as making this intuitive, the mathematicians will not be of much help. They look at this as an n-dimensional space. Remember those matrices from high school algebra? They were supposed to help us with this. It is the best they have for making this intuitive or at least tractable.

Einstein had just a brief, intuitive glimpse of four dimensions and changed the world.

In a sense, they are clearly right. Theirs is one way to look at it. And Marco can, and does, put those matrices into a computer with a little matrix multiplication, I would guess. Maybe he uses a lot of loops at the cost of some computer time. Either way, it is one of the great things P123 does for us.

P123 is solving this for us, and we often never notice. Marco is taking care of a lot of difficult math and making us feel like we are the smart ones. He does it that well and that seamlessly.

No doubt the mathematicians are right when they say: “No one can imagine more than 3 dimensions in their heads.” And even they get into HUGE PROBLEMS when they try (Einstein being a rare exception).

For example, they spent decades worrying about local minima (a potential concern for the boosting Steve Auger uses), which GENERALLY ARE RARE IN HIGH-DIMENSIONAL SPACES. Most critical points there are “saddle points” instead. Decades were wasted by the mathematicians because their intuitive understanding of this is no better than ours.

Anyway, the mathematicians will say, after a little matrix multiplication, that Yuval got it exactly right.

Every tool has advantages and limitations. The key to success is to understand the limitations and use each tool accordingly. And Jim, I think in one dimension (UP), so the three-dimensional minima you describe are not a problem for me.

Just to be clear this is a good thing for boosting. I was not trying to be critical of any method.

Also, the only difference between P123 classic and boosting is that P123 classic assumes a “flat” hyperplane, while boosting allows the surface to be flat or somewhat curved: a manifold. Clearly the same factors can be used for either one.

Oh yeah, and pick your poison: manual optimization to find the hyperplane, or letting a computer find the manifold (but learn Python first).

Steve has picked a good tool IMHO. But if your data fits a hyperplane pretty well, then P123 classic is probably your best tool. Again, just in my opinion. If someone has other reasons to like P123 classic (or boosting), my only recommendation would be to keep using it.

The only point of my posts is that P123 classic is already doing a lot of pretty advanced stuff. Often without the user fully appreciating it.

Yes, that’s the whole point of composite nodes. If you used a simple weighted average, then putting something in a composite node would make no difference at all.
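Yuval's point can be checked with a small sketch: if a composite node's score were just a weighted average of its factors, with no re-ranking, the nested tree would collapse into a flat list whose weights are the products of node weight and within-node weight, so grouping would change nothing. The sketch reuses the four-factor ranks from the ten-stock example in this thread; the `GROUPS` structure is a hypothetical representation, not P123's format.

```python
import math

# Factor ranks of the ten stocks from the ten-stock example.
RANKS = {
    "A": [30, 60, 70, 80], "B": [60, 80, 10, 0], "C": [0, 10, 20, 90],
    "D": [50, 70, 0, 40], "E": [40, 50, 90, 70], "F": [90, 30, 80, 30],
    "G": [70, 0, 60, 10], "H": [20, 90, 40, 60], "I": [80, 20, 30, 50],
    "J": [10, 40, 50, 20],
}

# Each node: (node weight, [(factor index, weight within node), ...]).
GROUPS = [
    (0.5, [(0, 0.5), (1, 0.5)]),  # factors 1 and 2
    (0.5, [(2, 0.5), (3, 0.5)]),  # factors 3 and 4
]


def nested_score(row, groups):
    """Weighted average inside each node, then across nodes, with NO re-ranking."""
    return sum(node_w * sum(w * row[i] for i, w in members) for node_w, members in groups)


def effective_weights(groups, n_factors):
    """Flatten the tree: each factor's weight is node weight times weight within node."""
    weights = [0.0] * n_factors
    for node_w, members in groups:
        for i, w in members:
            weights[i] += node_w * w
    return weights


flat_w = effective_weights(GROUPS, 4)
nested = {s: nested_score(r, GROUPS) for s, r in RANKS.items()}
flat = {s: sum(w * x for w, x in zip(flat_w, r)) for s, r in RANKS.items()}

print(flat_w)                                                 # [0.25, 0.25, 0.25, 0.25]
print(all(math.isclose(nested[s], flat[s]) for s in RANKS))   # True
print(sorted(nested, key=nested.get, reverse=True)[:3])       # ['E', 'A', 'F']
```

Without re-ranking, the grouped ordering is the same as the flat ordering (E, A, F), which is why a plain weighted average would make composite nodes pointless.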

There is an argument for using composite nodes. It goes as follows. You group like factors together. Then you can get a company that is strong in general in each of the factor groups. If you don’t group them together, a company can be strong in various discrete factors but be weak in the group as a whole.

There is an even stronger argument for using composite nodes, and it goes as follows. Let’s take ROE. Using DuPont analysis, you can break down ROE into three ratios: income to sales (net profit margin), sales to assets (asset turnover), and assets to equity. Now you can rank all those separately in a composite node and you get a quite different ranking (and arguably a more meaningful one) than if you simply ranked companies by their ROE. But if you didn’t use a composite node, you would lose sight of ROE altogether as those three factors would just get mixed in with a bunch of unrelated ones.
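For reference, this is the three-ratio DuPont identity behind the example. The sales and assets terms cancel, so the product of the three ratios is exactly ROE; ranking the three ratios separately inside a composite node instead rewards each component on its own, which is why the resulting order can differ from a plain ROE rank.

```latex
\text{ROE} = \frac{\text{Net income}}{\text{Equity}}
= \underbrace{\frac{\text{Net income}}{\text{Sales}}}_{\text{net profit margin}}
\times \underbrace{\frac{\text{Sales}}{\text{Assets}}}_{\text{asset turnover}}
\times \underbrace{\frac{\text{Assets}}{\text{Equity}}}_{\text{equity multiplier}}
```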

Mike, have you read this?

Jim - you have been posting about XGBoost for the last two years. Why don’t you stop while you are still ahead? 🙂


It’s not a competition among methods, and a rational person could end up using more than one method.

In fact, I am now using a method not discussed in this thread at all. Actually, a couple of them.

P123 classic is pretty amazing really. And this is a thread about P123 classic and composite nodes. Perhaps I should not have mentioned boosting. I did because I think it is the same topic. They are both just ways of mapping out a flat or not-so-flat manifold in hyperspace.

Composite nodes are an interesting topic.

For the record I have used factor analysis as the basis for determining what factors to put into composite nodes and to determine the weights of the nodes and factors.

So I posted one perspective on how composite nodes can be used.

It worked and made me money. I’m not going to pretend that didn’t happen just because I am a big fan of boosting too.

My apologies to anyone if I promoted one method too much in this thread about composite nodes or tried to discourage anyone from using something that is working for them or that they want to investigate further.



Jim - you are a very smart guy and most of your posts go over most people’s heads, certainly mine at least. But the message you are conveying (in my interpretation) is that Marco has wasted time and resources updating the API and dataminer because P123 proper is already superior. Please choose your wording carefully.

So what I would like to say here is that XGBoost is a means to an end, it is not the end. I have ideas of replacing it with my own home-brew ML algorithm that writes back into P123 a Ranking System. The ML algorithm will embrace some of the concepts in XGBoost but will not be decision tree based, and will be easily mapped into an RS. The algo could be along the lines of what I already do with the ranking system optimizer, which is at the heart of Inspector Sector’s Cloud Computing.

The API opens new doors for a vast array of applications, not just XGBoost. Thank you Marco/P123 for making improvements to the API and dataminer. I am sure that Jim has great ideas for P123 and I am just as sure that those ideas can be magnified externally using Python.


My apologies for not being clear about my opinions on boosting in this thread. I just get tired of people (including me, when I do it) saying their way is the only way to do it, especially when the thread is just about composite nodes. I was trying to keep my opinions and biases out of this thread, and that may have been perceived as a shift in my opinion. What I should have done is not mention boosting at all in this thread. My apologies to everyone for not doing so.

So here is my personal opinion about boosting which I do not think will contradict anything I have said before. My apologies if this is slightly nuanced.

  1. Boosting is a non-linear method.

  2. P123 classic is a linear method in that it uses constants for the weights of factors and nodes.

  3. Boosting being a non-linear method will handle non-linear data better than P123 classic as a general rule.

  4. Most (but not all) financial data is non-linear ESPECIALLY WHEN YOU START CONVERTING THE INPUTS OR PREDICTORS TO RANKS.

Therefore, boosting should be the better method for most financial data, especially when you are using ranks as inputs. And in backtesting I have found this to be the case so far.
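A rough formalization of items 2 and 3, as a sketch rather than P123's exact scoring (which also re-ranks composite nodes): in a single flat node, stock j's score is a weighted sum of its factor ranks with constant weights, while a boosted-tree model such as XGBoost essentially sums the outputs of K regression trees, each of which can respond non-linearly to the ranks.

```latex
S_j = \sum_{i=1}^{m} w_i \, r_{ij} \quad (\text{P123 classic, flat node})
\qquad
\hat{y}_j = \sum_{k=1}^{K} f_k\left(r_{1j}, \dots, r_{mj}\right) \quad (\text{boosting, each } f_k \text{ a tree})
```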

Steve, as you know I am the one who introduced you to XGBoost while we were working on some things with TensorFlow. So obviously I like boosting (or I would not have recommended it). This has not changed.

Perhaps we could move any further discussion about boosting to another thread. Composite nodes is an important topic and it deserves its own thread. Yuval has some interesting, important and useful ideas on this topic and I would like to give him (and others) room to express them here.


Now that is the Jim I know and love!

Attached is the ranking system for Inspector Sector Cloud Computing. It is actually six ranking systems in parallel, each heavily optimized on its own. Within the individual nodes, you can see that a conditional node is used for bull and bear markets. The condition that determines bull versus bear is whether the SKYY ETF is above or below a moving average.
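The bull/bear switch can be sketched in Python to show the logic. This is not P123's conditional-node syntax, and the post does not give the moving-average length or the node weights, so the ETF closes, the 5-day window, the factor names, and both weight sets below are hypothetical.

```python
def simple_moving_average(closes, window):
    """Average of the last `window` closing prices."""
    return sum(closes[-window:]) / window


def is_bull_market(closes, window):
    """Hypothetical regime test: latest close above its simple moving average."""
    if len(closes) < window:
        raise ValueError("need at least `window` closing prices")
    return closes[-1] > simple_moving_average(closes, window)


# Hypothetical factor weights for each branch of the conditional node.
BULL_WEIGHTS = {"momentum": 0.6, "value": 0.4}
BEAR_WEIGHTS = {"momentum": 0.2, "value": 0.8}


def conditional_node_score(factor_ranks, etf_closes, window=5):
    """Pick the bull or bear weight set, then take a weighted sum of the stock's factor ranks."""
    weights = BULL_WEIGHTS if is_bull_market(etf_closes, window) else BEAR_WEIGHTS
    return sum(w * factor_ranks[name] for name, w in weights.items())


etf_closes = [100, 101, 102, 103, 110]  # hypothetical ETF closes, oldest first
stock_ranks = {"momentum": 80, "value": 40}
print(conditional_node_score(stock_ranks, etf_closes))        # bull branch
print(conditional_node_score(stock_ranks, etf_closes[::-1]))  # bear branch
```

Keeping the regime test in its own small function makes it easy to swap in a different moving-average length without touching the scoring logic.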

Conditional nodes can work very well for this sort of application. I recommend making the condition time period based, not factor based. The latter becomes very confusing.

When you say “time period based”, what do you mean exactly?