In column D, I have three months of future relative returns compared to SPY (FutureRel%Chg_D(65, GetSeries("SPY"))).
What if I:
1. Keep only the 5,000 best performers (by column D) out of the total of over 100,000 rows.
2. Add a new row 1 that, for each factor (feature), counts how many of its normalized values equal 1. (I may have misunderstood this, but 1 is the highest score each factor can achieve.)
3. Build a ranking system from, say, the 40 nodes with the highest counts of 1s among those 5,000 best performers.
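The counting-and-ranking step above can be sketched in a few lines of pandas. This is only an illustration of the idea, assuming the data has been exported to a DataFrame with one row per stock, a hypothetical column name fut_rel_chg for column D, and every other column holding a feature score already normalized to [0, 1]:

```python
import pandas as pd

def top_feature_counts(df, return_col="fut_rel_chg", n_best=5000, n_features=40):
    """Count how often each normalized feature scores 1.0 among the best performers.

    Assumes every column except `return_col` holds a feature normalized to [0, 1]
    (a hypothetical layout; adapt the column names to your export).
    """
    best = df.nlargest(n_best, return_col)                # keep the top rows by future return
    feature_cols = [c for c in df.columns if c != return_col]
    ones_per_feature = (best[feature_cols] == 1.0).sum()  # the "row 1" tally per feature
    return ones_per_feature.nlargest(n_features)          # the top nodes by count of 1s
```

One design caveat: counting only exact 1s rewards a feature for producing a single extreme winner; summing the normalized scores themselves (best[feature_cols].sum()) would credit features that are merely high, not just maximal, among the winners.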
Wouldn't that provide a very good indication of which of my 300 features (nodes) perform best in relation to future three-month returns?
The idea here is that the features with the largest number of 1s among the 5,000 best performers indicate which characteristics the winning stocks tend to share.
But would a large enough sample, perhaps even more than 5,000 stocks, help keep the model from being fitted to the "winning lottery tickets"?
One question: when you normalize numbers, am I correct in understanding that 1 is the highest value? So if a stock factor (node) measures insider buying, the stock with the largest amount of insider buying in the sample set will get a normalized value of 1?
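That is how min-max scaling works, at least. As a sketch (and only an assumption about what your tool does; some platforms use percentile ranks instead, where 1 means "highest rank in the universe" rather than a linearly scaled value):

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]: the maximum maps to 1, the minimum to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # avoid division by zero on a flat series
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical insider-buying amounts: the largest buyer gets exactly 1.0.
scores = min_max_normalize([10, 50, 200])   # -> [0.0, ~0.21, 1.0]
```

Note that under min-max scaling, exactly one stock per factor gets a 1 (barring ties), whereas under percentile ranking a whole top bucket can score 1, which changes what "counting the 1s" measures.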
As I recall, in the paper I'm talking about, the machine learning model trained on the 1/5th of stocks with the highest future returns achieved OK returns, but they were extremely volatile, and its return-to-risk ratio was not even as good as that of the model trained on the 1/5th of stocks with the second-highest future returns.