Bagging at P123 as an eventual addition to AI/ML

The term “bagging” in the context of machine learning was introduced by Leo Breiman in 1994.

That is awesomely cool. @marco when you add k-fold validation to the AI/ML why don’t you just bootstrap this rather than making members use Mod() numbers which can be limiting for the reasons Wycliff’s suggest above?

People are finding great ways to do what you will probably be automating at some point (if probably not in the first release).

And as you know bootstrapping (and the P123 implementation of mod()) can be its own method of validation (without k-fold). With k-fold (and saving the training results of each fold then averaging those result) it is more a method of model averaging as Yuval has said in his writings I believe. He can correct me if he has changed his use of the term ‘model averaging.’

And actually, bootstrapping with model averaging is called bagging (Bootstrap AGGregatING) something that machine learners would recognize and probably be attracted to. It is not unusual to bootstrap 100,000 results. Something that is now difficult for me with mod().

Member’s ideas are often great ideas that can be understood in terms of AI/ML and automated for users including Wycliff’s. He would not have to call it ML if he does not want to but I suspect he would use it as he basically already does. ML can automate many of the great things uses are already doing saving members time and possibly P123 computer resources if people are constantly doing this manually. I do not mean to set priorities for P123 or not express my gratitude for making me money the way it is when saying that.

The model I use is bootstrapped 100 times. It does not change much with greater than 100 bootstrapped samples but is more than I can do with mod(). Not a feature request as I can already do it with the present downloads. Just an idea for the AI/ML at some point. Probably not a top priority even if considered by P123 staff as Wycliff’s is happy using mod() while adjusting for turnover and apparently he can find enough different numbers in mod() to satisfy him and I am happy being able to use bootstrap 100 times (or 100,000 if I wanted) with the downloads. So not a priority perhaps and I do not mean to imply that it should be.

I do like to “bag” however and I am not afraid to give the originator of this idea credit (not my idea).

Jim