Collinearity: You may still want to remove some factors with any method you use

All,

I am guilty of collinearity. I admit it. While it has no civil or criminal penalties that I am aware of, I still think it is a serious problem that I need to address.

For example, I find Yuval to be a fantastic feature engineer. Just awesome. Full stop. We are fortunate to have him helping P123 and willing to share some of his personal features in the forum. I mean that, and I use many of his features every single day.

For example, I use his unlevered free cash flow and did not remove FCF from my features. This is a bit of a guess, but I think I could show that these two features are correlated (collinear). And this is just one example.
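A quick way to check a guess like this is a correlation matrix. The sketch below uses synthetic data, not real P123 factors; the variable names `fcf`, `unlevered_fcf`, and `unrelated` are hypothetical stand-ins, and the second is built as the first plus some independent noise purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins, NOT real factor data. Names are hypothetical.
fcf = rng.normal(size=500)
unlevered_fcf = fcf + 0.3 * rng.normal(size=500)  # mostly the same signal as fcf
unrelated = rng.normal(size=500)                  # independent of both

# Columns are features, rows are observations.
features = np.column_stack([fcf, unlevered_fcf, unrelated])

# Pairwise Pearson correlations between the columns.
corr = np.corrcoef(features, rowvar=False)
print(np.round(corr, 2))
```

An off-diagonal entry close to 1 or -1 flags a pair of features that carry largely the same information. With real factors you would pass your own feature columns in place of the synthetic ones.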

@duckruck had a great idea for addressing collinearity with PCA. You should look at that, IMHO.
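I do not have the details of @duckruck's setup, so the following is only a minimal sketch of what PCA does to collinear inputs, assuming scikit-learn and two synthetic, nearly collinear columns (not real factors). The principal components it produces are uncorrelated with each other, which is why PCA is one way to sidestep collinearity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic, nearly collinear features (illustrative only).
base = rng.normal(size=500)
X = np.column_stack([base, base + 0.1 * rng.normal(size=500)])

# Standardize first so neither column dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_scaled)
components = pca.transform(X_scaled)

# Share of total variance captured by each component.
print(pca.explained_variance_ratio_)
# Correlation between the two components (numerically zero).
print(np.corrcoef(components, rowvar=False)[0, 1])
```

When two features are nearly collinear, almost all of the variance lands on the first component, so the second can often be dropped. The trade-off is that components are blends of the original factors and are harder to interpret.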

But simply removing one of the collinear features is another method for mitigating collinearity issues AND random forests are supposed to be less susceptible to collinearity. So is removing features still helpful with random forest?

Question: Despite the effectiveness of random forests at handling collinearity, could recursive feature elimination with cross-validation (RFECV, a class in scikit-learn's feature_selection module) still help by removing some collinear features from a random forest model?

In 3 words: I think so. And I have some evidence. Here is the result of RFECV (@yuvaltaylor it kept unlevered free cash flow while discarding FCF, validating your feature choice, BTW). A ranking of 1 means the feature was kept; any other number means it was removed. I will not include details of the factors, as you will want to use your own (I could not get all of the features in one screenshot, but it kept 17 of 29 features). AND THE R2 SCORE ON CROSS-VALIDATION IMPROVED SUBSTANTIALLY.

Seriously, recursive feature elimination will help all members (beyond any issues of collinearity). Oh, and again, thank you @yuvaltaylor for your feature engineering and the many other things you do.
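For anyone who wants to try this, here is a minimal sketch of RFECV wrapped around a random forest, assuming scikit-learn. The data is synthetic (three informative columns, one near-duplicate of the first, four pure-noise columns), and the settings (`n_estimators=30`, 3-fold CV, `step=1`) are illustrative assumptions, not the ones behind the result above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_samples = 300

# Synthetic stand-ins for real factors (illustrative only).
informative = rng.normal(size=(n_samples, 3))
near_duplicate = informative[:, [0]] + 0.05 * rng.normal(size=(n_samples, 1))
noise = rng.normal(size=(n_samples, 4))
X = np.hstack([informative, near_duplicate, noise])

# Target depends only on the three informative columns.
y = informative @ np.array([1.0, 0.5, -0.5]) + 0.1 * rng.normal(size=n_samples)

selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=30, random_state=0),
    step=1,                      # drop one feature per elimination round
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    scoring="r2",                # choose the feature count with the best CV R2
    min_features_to_select=1,
)
selector.fit(X, y)

print("kept mask:  ", selector.support_)    # True = feature kept
print("ranking:    ", selector.ranking_)    # 1 = kept, larger = dropped earlier
print("n features: ", selector.n_features_)
```

`ranking_` is the array of numbers described above: every kept feature has rank 1, and higher ranks mark features that were eliminated. To apply the selection, `selector.transform(X)` returns only the kept columns.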

Jim


Duckruck,

No, I have not yet. Thank you for the idea! I will give it a close look.

Jim