Random forests are the simplest things coders could ever look at

Jrinne · August 16, 2023, 3:19pm

Sam,

Thank you for your interest.

Just as an aside, ElasticNet will give you coefficients that could possibly be entered directly as weights for the factors (or features) in P123’s ranking system. There is a potential ease-of-use benefit there.
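To make that aside concrete, here is a minimal sketch of turning ElasticNet coefficients into percentage weights. Everything specific in it is an assumption for illustration: the factor names, the synthetic data, the `alpha` and `l1_ratio` values, and the idea of normalizing absolute coefficients to sum to 100. How the weights are actually entered into a P123 ranking system is not shown here.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Hypothetical factor names and synthetic data, for illustration only.
rng = np.random.default_rng(0)
factor_names = ["value", "momentum", "quality", "noise"]
X = rng.normal(size=(500, 4))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Standardize so the coefficients are comparable across factors.
X_std = StandardScaler().fit_transform(X)

# alpha and l1_ratio are placeholder values, not tuned.
model = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X_std, y)

# Normalize absolute coefficients to percentages that sum to 100.
# A negative coefficient would mean "lower is better" for that factor.
abs_coef = np.abs(model.coef_)
weights = 100 * abs_coef / abs_coef.sum()

for name, coef, weight in zip(factor_names, model.coef_, weights):
    print(f"{name:10s} coef={coef:+.3f} weight={weight:5.1f}%")
```

Factors that ElasticNet shrinks to zero get a weight of zero, which amounts to dropping them from the ranking.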

But that was not your question. I have limited data, and the data I have been using has a potential look-ahead bias, as Walter pointed out to me just this morning in this post: Data download the day of rebalance for machine learning - #6 by WalterW

I have done random forests before, but in retrospect my factor choice was poor then, and I think you are better off if we ignore those results even if I remembered them correctly.

That being said, the most direct short answer to your question is that, with the limited and flawed data that I have, random forests are doing better.

I did a grid-search cross-validation with these parameters:

param_grid = {
    'max_features': [0.3, 'sqrt', 'log2'],
    'min_samples_leaf': [3000, 10000, 20000]
}

I got this output: Best Parameters: {'max_features': 0.3, 'min_samples_leaf': 3000}. I had tried smaller 'min_samples_leaf' values in previous grid searches.

So one piece of advice I have is to use some big 'min_samples_leaf' numbers in your grid search.
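For anyone who wants to try a grid search like this one, here is a rough sketch using scikit-learn's GridSearchCV. It is not my actual script. The synthetic data, the use of RandomForestRegressor, `n_estimators=30`, `cv=3`, and the scoring metric are all assumptions, and the 'min_samples_leaf' values are scaled down to fit a 3,000-row toy dataset. On a real dataset with many more rows you would put large values such as 3000, 10000, and 20000 in the grid instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 3,000 rows, 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=3000)

# Same max_features options as in the post; min_samples_leaf is scaled
# down because this toy dataset is small.
param_grid = {
    "max_features": [0.3, "sqrt", "log2"],
    "min_samples_leaf": [30, 100, 200],
}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    param_grid,
    cv=3,                               # assumed fold count
    scoring="neg_mean_squared_error",   # assumed metric
)
search.fit(X, y)

print("Best Parameters:", search.best_params_)
```

If the best value lands at the edge of the grid, as 3000 did for me, it is worth extending the grid in that direction on the next run.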

I hope that helps some. Just like you, I will need more data for any reliable conclusions.

Jim