For ML Feature Selection, Can a Larger MSE Actually Mean a Better Model?

I'm shocked to hear sklearn defaults to R^2 for regression. I would never have guessed that.

The loss function is a crucial part of the entire process; hopefully they'll give you folks an option for that soon.

For more on the perils of R^2, I highly recommend starting at p. 180 here: The Truth About Linear Regression

R^2 does not measure goodness of fit... R^2 can be arbitrarily low when the model is completely correct... R^2 can be arbitrarily close to 1 when the model is totally wrong... R^2 is also pretty useless as a measure of predictability...

For predicting returns, I tend to use spearman_correlation(y, y_hat), which I picked up from a couple of papers. This is because I don't really care about estimating the return precisely; all I'm really interested in is the ranking. (Note that while Pearson correlation and R^2 are related, this is not the case for Spearman correlation.)
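To make the rank-vs-magnitude point concrete, here's a minimal sketch (assuming scipy is available; the "returns" are made-up toy numbers): predictions that preserve the ordering of the targets but are wildly off in scale still get a perfect Spearman score, while Pearson correlation drops noticeably.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Toy data: predictions preserve the ranking of the true returns
# but are monotone-nonlinear, so magnitudes are way off.
y = np.array([0.01, 0.02, 0.03, 0.04, 0.05])   # "true" returns (made up)
y_hat = np.exp(100 * y)                        # monotone in y, terrible in scale

rho, _ = spearmanr(y, y_hat)   # rank correlation: exactly 1, ordering is perfect
r, _ = pearsonr(y, y_hat)      # linear correlation: well below 1

print(rho, r)
```

If ranking is all you care about, this is exactly the behavior you want from your metric.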

In my experience -- and yours may vary -- optimizing for Spearman correlation has worked well. There are other ways to perform ranked ML, too.
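On the sklearn-defaults complaint above: you don't actually have to live with R^2 for model selection, because sklearn accepts custom metrics through make_scorer. A minimal sketch (the wrapper name spearman_score is my own, not a library function):

```python
from scipy.stats import spearmanr
from sklearn.metrics import make_scorer

def spearman_score(y_true, y_pred):
    """Rank correlation between true and predicted values."""
    rho, _ = spearmanr(y_true, y_pred)
    return rho

# Higher Spearman is better, so the default greater_is_better=True is right.
spearman_scorer = make_scorer(spearman_score)

# Pass it anywhere sklearn takes a `scoring` argument, e.g.:
# cross_val_score(model, X, y, scoring=spearman_scorer, cv=5)
```

This doesn't change the loss the model trains on, only how candidate models are scored and compared, but for feature selection and cross-validation that's usually the part that matters.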
