Early Stopping


TL;DR: I like what many are doing at P123. I have never been critical of the method, and in fact, I think it is at the core of all machine learning methods. That does not mean it would be impossible to improve the method.

P123 has a great method that a lot of people are using. I truly like it. I call it gradient descent. I think that is what it really is in machine learning terms. I mean optimizing, sometimes with the use of a spreadsheet that randomizes the weights.

Coupled with early stopping, it would be a complete machine learning method that could be placed alongside linear regressions, support vector machines, random forests and XGBoost as AI/ML offerings at P123.

It could be used for marketing to machine learners if nothing else.

But I think it likely that some of the people already using this method might find that early stopping helps with out-of-sample results. Machine learners understand "early stopping" and its benefits.

And maybe it would make people less likely to use the label of machine learning/not machine learning. Maybe people would understand that early stopping, cross-validation and regularization can be incorporated into any rational method.

So I am not a perfect poster, and my trying to explain early stopping might not be the best thing. I will say that it WILL BE USED at P123 for XGBoost and neural nets.

It is a method for reducing overfitting.

So basically, you optimize in one universe using the optimizer (call this training) as you normally would, and then you test the rank weights on another universe (call this testing). You continue to optimize on the training universe AS LONG AS THE PERFORMANCE ON THE TESTING UNIVERSE IMPROVES.

By definition, OVERFITTING AND OVERTRAINING occur when you keep fitting the training data past the point where performance on unseen data stops improving. Early stopping just puts an early stop to overfitting. Hence the name early stopping.
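The train/test loop described above can be sketched in a few lines of Python. This is a toy illustration, not P123's actual optimizer: `score_train` and `score_test` are hypothetical callables standing in for a backtest score on the training and testing universes, and the random perturbation stands in for the spreadsheet that randomizes the weights.

```python
import random

def early_stopping_optimize(score_train, score_test, n_weights,
                            max_iters=1000, patience=20, seed=0):
    """Randomized hill-climbing on rank weights with early stopping.

    score_train / score_test are hypothetical callables that return a
    backtest score for a weight vector on the training and testing
    universes respectively (stand-ins for P123's optimizer).
    """
    rng = random.Random(seed)
    weights = [1.0 / n_weights] * n_weights        # start with equal weights
    best_weights, best_test = weights[:], score_test(weights)
    stale = 0                                      # rounds with no testing-universe gain
    for _ in range(max_iters):
        # propose a small random perturbation (the "spreadsheet randomizer")
        candidate = [max(0.0, w + rng.gauss(0, 0.05)) for w in weights]
        total = sum(candidate) or 1.0
        candidate = [w / total for w in candidate]  # renormalize to sum to 1
        # keep optimizing on the TRAINING universe as usual...
        if score_train(candidate) > score_train(weights):
            weights = candidate
            test = score_test(weights)
            if test > best_test:                    # ...as long as TESTING improves
                best_test, best_weights, stale = test, weights[:], 0
            else:
                stale += 1
                if stale >= patience:               # early stop: testing has plateaued
                    break
    return best_weights, best_test
```

The `patience` parameter is the usual refinement: rather than stopping on the first dip in testing performance, you allow a fixed number of non-improving rounds before quitting, and you return the weights from the best testing round rather than the last one.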

Wikipedia does not do a bad job of this. It includes gradient descent in the discussion, which, again, is what I think many are already doing: Early stopping

With early stopping, I would use this method myself, right alongside support vector machines, and look at the results to see which performs best.

Anyway, my real point is that by supporting machine learning I am not trying to compete with what others are already doing. I do think this excellent method could be improved with early stopping.

With early stopping it might even be the best method. Maybe I would not end up needing support vector machines.

Summary: Sorry if I make it seem like a competition at times. But the Kaggle crowd would recognize this as a valid machine learning method. It might make sense from a business perspective, and I think it would not be more resource intensive than a neural net. P123 will already be incorporating early stopping into their platform with XGBoost and neural nets anyway.

I am mainly recommending that you discuss this with the person doing AI/ML and see if she thinks this might be a marketing tool, and whether it might even be useful. Maybe it will make people begin to use some previously unfamiliar methods and develop a consensus over what is generally useful at P123 (e.g., early stopping might gain wide acceptance).