@Jrinne
Jim,
I totally agree with everything you wrote.
I’m one step earlier, at the point of selecting which factor (or factors) I should use.
After that, the second step is the mechanics of how the factors create profit, which is what you described. That is my understanding as well.
So let’s go back to the first step, which factors to use.
I can do it manually by looking at the buckets to see whether a factor shows good monotonicity across 10 buckets over the last 20 years, then select the best 30 factors, invest, and not change anything.
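To make the manual bucket check concrete, here is a minimal sketch of how monotonicity could be scored for one cross-section. The function name `bucket_monotonicity` and its inputs are my own assumptions for illustration (this is not what Alphalens does internally): it sorts stocks into equal-count buckets by factor value and reports the Spearman rank correlation between bucket number and mean forward return.

```python
import numpy as np

def bucket_monotonicity(factor, fwd_return, n_buckets=10):
    """Hypothetical helper: mean forward return per factor bucket plus a
    Spearman-style monotonicity score in [-1, 1].

    Assumes one cross-section (one date), no NaNs, and at least
    n_buckets stocks so that no bucket is empty.
    """
    factor = np.asarray(factor, dtype=float)
    fwd_return = np.asarray(fwd_return, dtype=float)
    n = len(factor)

    # Equal-count buckets by factor rank: lowest values -> bucket 0.
    order = np.argsort(factor, kind="stable")
    buckets = np.empty(n, dtype=int)
    buckets[order] = np.arange(n) * n_buckets // n

    means = np.array([fwd_return[buckets == b].mean() for b in range(n_buckets)])

    # Spearman rho = Pearson correlation of ranks. The bucket index is
    # already a rank; ties among bucket means are not averaged here.
    mean_ranks = np.argsort(np.argsort(means))
    rho = np.corrcoef(np.arange(n_buckets), mean_ranks)[0, 1]
    return means, rho
```

A score near +1 means returns rise steadily from the lowest to the highest bucket. Averaging the score over many dates would give one number per factor to rank the candidates by.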
But if I want to ride the best 10-20 factors per quarter, the approach would be different.
I have been screening a lot of factors with Alphalens.
It also gives a good plot of single-factor performance over 20 years.
Interestingly, the performance goes consistently up and down, a bit like a drunk sine curve.
But not very rapidly: the frequency is very low, so each swing takes a long time.
If one could follow the ups and downs and select a factor only when it is near its peak performance, it would be a great strategy.
Now the question is how to jump into the right factors at the right time?
If I assume a factor stays near its optimum for a period of 1-2 years, it should be selectable by a random forest (RF).
For that to work, the decision trees need enough samples to pick the correct factors out of the noise.
Now, the point I was making before: I can only base the performance of a factor on very few newly arriving samples (4 per year).
Is there a chance that this could work at all, or are there too few new data points, so that I’m just training on too small a sample and predicting noise?
An analogy: if I want to measure a very turbulent flow, I can compute from the estimated rms how many samples I need to reach a given confidence level. If I want to measure the mean velocity with x% accuracy, I need at least y samples if the flow has an rms of z.
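For reference, the flow calculation I have in mind can be sketched as below. It assumes independent samples, a roughly normal sample mean, and that "rms" means the standard deviation of the fluctuations around the mean. The function names are just for illustration. The rule is n >= (z_conf * rms / (x * |mean|))^2, where z_conf is the normal quantile for the chosen confidence level.

```python
import math
from statistics import NormalDist

def samples_needed(rms, mean, rel_accuracy, confidence=0.95):
    """Samples needed so the confidence interval of the mean is within
    +/- rel_accuracy * |mean|. Assumes independent samples."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided quantile
    return math.ceil((z * rms / (rel_accuracy * abs(mean))) ** 2)

def half_width(rms, n_samples, confidence=0.95):
    """Inverse question: the +/- uncertainty of the mean after n samples."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * rms / math.sqrt(n_samples)

# Flow example: rms is 20% of the mean, target 5% accuracy at 95% confidence.
print(samples_needed(rms=0.2, mean=1.0, rel_accuracy=0.05))  # -> 62
```

The same `half_width` could be applied to a factor's quarterly returns: with only 4 to 8 samples the interval shrinks only by a factor of 2 to 2.8 relative to a single observation, and autocorrelated samples would make it worse, since the independence assumption no longer holds.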
Something like this should apply to an RF as well.
It would be good to know how one could estimate that.
Best
Carsten