Trim and Outlier findings/experiences

I haven’t looked at this recently, but in the past when I tried to determine whether outliers were a problem, I didn’t find any benefit from trimming, Winsorizing, or using Huber loss—at least not for tree-based models.
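For anyone unfamiliar with the three treatments mentioned above, here is a minimal sketch of what each one does to a target. It uses only the Python standard library. The function names, the 1%/99% cut-offs, and the toy `returns` list are illustrative assumptions, not the settings from my tests or anyone else's.

```python
from statistics import quantiles

def percentile_bounds(values, lower_pct=1, upper_pct=99):
    """Return (low, high) cut-offs at integer percentiles of `values`."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp extreme values to the cut-offs; every row is kept."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [min(max(v, low), high) for v in values]

def trim(values, lower_pct=1, upper_pct=99):
    """Drop values outside the cut-offs; rows are removed."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if low <= v <= high]

def huber_loss(residual, delta=1.0):
    """Quadratic for small residuals, linear beyond `delta`."""
    r = abs(residual)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

# Hypothetical target: 100 ordinary values plus one extreme outlier.
returns = [float(i) for i in range(100)] + [1000.0]
print(max(winsorize(returns)), len(trim(returns)), huber_loss(3.0))
```

Winsorizing keeps the outlier's row but caps its value, trimming discards the row, and Huber loss leaves the data alone and instead penalizes large residuals linearly rather than quadratically.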

What did seem to help was increasing the minimum leaf size. Larger minimum leaf sizes seem to dilute the effect of outliers, and that may be why I’ve never seen outlier handling make a difference: the larger leaf size might already be managing the issue.
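A rough way to see the dilution effect: a regression tree predicts the mean of the targets in each leaf, so a single outlier moves a leaf's prediction by roughly its distance from the typical value divided by the number of samples in the leaf. The sketch below is a toy illustration under that assumption, not a tree implementation; `leaf_prediction_shift` and its default values are hypothetical. In scikit-learn, for example, the corresponding setting is `min_samples_leaf`.

```python
from statistics import fmean

def leaf_prediction_shift(leaf_size, typical_value=0.0, outlier_value=10.0):
    """Change in a leaf's mean prediction when one of its samples is an outlier."""
    clean = [typical_value] * leaf_size
    contaminated = [typical_value] * (leaf_size - 1) + [outlier_value]
    return fmean(contaminated) - fmean(clean)

# The same outlier matters less and less as the leaf grows.
for size in (5, 50, 500):
    print(size, leaf_prediction_shift(size))
```

With these toy numbers the shift falls in proportion to the leaf size, which fits the idea that a large enough minimum leaf size already absorbs most of what explicit outlier handling would otherwise fix.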

For linear regression it may be different. The impact of an outlier depends heavily on its leverage, that is, where it sits relative to the other data points. That makes it harder to generalize about outliers in linear models, and I am less sure about them.
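To make the leverage point concrete, here is a small sketch for a one-feature regression with an intercept, using the standard leverage formula h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The toy data and function names are assumptions chosen for illustration only.

```python
def leverage(x):
    """Leverage (hat-matrix diagonal) of each point in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

def ols_slope(x, y):
    """Ordinary least squares slope for one feature plus an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

x = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [0.0, 1.0, 2.0, 3.0, 4.0]       # true slope is 1
mid_outlier = [0.0, 1.0, 12.0, 3.0, 4.0]  # +10 at the center of x (low leverage)
edge_outlier = [0.0, 1.0, 2.0, 3.0, 14.0] # +10 at the edge of x (high leverage)

print(leverage(x))
print(ols_slope(x, clean), ols_slope(x, mid_outlier), ols_slope(x, edge_outlier))
```

The same +10 error leaves the slope unchanged when it lands at the center of the x range (it only shifts the intercept) and triples it when it lands at the edge, which is why a blanket statement about outliers in linear models is hard to make.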

I recently saw this in another post, where Whycliffes was, I believe, writing about Andreas’ settings. That is a pretty large number for Winsorizing: Ranking vs machine-learning algorithms - #41 by Whycliffes
