Data download the day of rebalance for machine learning

Hi Jim,

I just noticed that in the DataMiner, Future%Chg(5) calculates the return from Friday to Friday.

Using 100*((Close(-6)/Close(-1))-1) works for Monday to Monday returns. However, when asked for future values that haven't yet occurred, it doesn't return NAs. I haven't looked at what it returns instead.

Walter,

Thank you VERY much. This is the code that P123 suggested for me. I felt bad about not being a good enough coder to figure it out for myself.

What you have said means my downloads for the last 2 months have serious look-ahead bias: the ranks I downloaded were from Saturday (or perhaps Monday morning) and the purchases would have been made Monday, yet the prices are from Friday. Any real purchase could not have been made before Monday.

It is so helpful to know that my data has look-ahead bias. The API credits I used were burned for nothing of use…

I think some of this is easy for some people most of the time, but I think we can all get stuck and need some help at times. And perhaps this is an example where it may not be trivial, exactly, even for an advanced coder.

I was told this should be easy and did not even require knowledge of Python.

Sometimes it is just a small feature we need, whether it fits into someone else’s established protocols or not. Features have not been quick to come from P123. And essentially none of the ones that have been implemented have been for statistical metrics or machine learning.

As far as feature requests go, even with your fix in my future downloads, it will be difficult to use that information without the download Jonpaul (and I) are requesting.

All that is needed is a simple matrix (or relatively simple, I should say, seeing how this is going) that I have been requesting for 10 years. In truth, assuming one has access to a powerful enough computer, nothing else is needed to do machine learning. The rest of what is needed can be found at Scikit-Learn.

I think maybe things are changing at P123. I hope so. You being here helps.

Thank you for your help. I am EXTREMELY GRATEFUL!

Jim

We’re in uncharted territory.

I'm working through these issues just like you are. What I'm seeing makes sense once you dig into the downloaded data.

For a given weekend, the Rank operation will return ranks identical to those available Monday morning. I did confirm that.

However, w/r/t prices, since the Monday market open hasn't happened yet (from the point of view of the lookup), the latest pricing date is Friday's. So functions like Future%Chg(5) have to use that. It was just unanticipated. Unfortunately, that function doesn't support an offset parameter; if it did, something like Future%Chg(5,-1) would work, I think.

Another issue I need to check is why looking up a future close that hasn't yet occurred doesn't return NA. I consider this issue important: I wanted to have DataMiner return three targets (1-, 4-, and 13-week future returns), and not returning NAs when the data doesn't exist complicates the issue.

Finally, my findings need to be confirmed.
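For anyone who wants to sanity-check the offset arithmetic outside of P123, here is a minimal pandas sketch (not P123 code; the price series is made up) of what these functions compute:

import pandas as pd

# Hypothetical daily closes for seven trading days (Mon 8/14 through Tue 8/22)
close = pd.Series([100.0, 101.0, 99.5, 102.0, 103.0, 102.5, 104.0],
                  index=pd.bdate_range('2023-08-14', periods=7))

# Friday-to-Friday style, anchored on the as-of bar, like Future%Chg(5)
fri_style = 100 * (close.shift(-5) / close - 1)

# Monday-to-Monday style, like 100*((Close(-6)/Close(-1))-1)
mon_style = 100 * (close.shift(-6) / close.shift(-1) - 1)

# shift(-n) yields NaN wherever the future bar does not exist yet, which is
# the behavior one would want from the P123 functions
print(mon_style)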


Which P123 could help you with, and which it will do if and when it gets serious about machine learning.

And it is not as if people who do things they do not consider to be machine learning could not use this too.

Reminder: at the end of the day it is just an array. Features, target, an index, and preferably no look-ahead bias, with updated data at the time of rebalance.
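To make the shape of that array concrete, here is a minimal sketch of how it would feed sklearn. The file and column names are hypothetical; the point is only that a features/target/index matrix is all a model needs:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv('data.csv', parse_dates=['date'])  # hypothetical download

# Drop rows whose target has not occurred yet (hence the need for NAs above)
train = df.dropna(subset=['future_return'])

X = train[['feature_1', 'feature_2']]  # whatever feature columns you downloaded
y = train['future_return']             # the target

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)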

I came to the same conclusion about future returns. I want Monday-morning-to-Monday-morning returns, so I used:
(Open(-6)-Open(-1))/Open(-1)

This is the same equation as Walter's, just in a slightly different (unsimplified) form, and with the open instead of the close.

I have not looked into what it returns for a date that does not exist, but I did check that the open prices were correct for Monday to Monday when both Mondays have happened.

I checked that in the screener and it also shows the problem. I think OHLC prices with a negative offset are broken.

This feature request encompasses so many other possible feature requests that are already covered by Sklearn.

Non-linear methods, early stopping, bootstrapping, K-fold cross-validation, recursive feature elimination, regularization, methods addressing collinearity (duckruck's PCA suggestion and others), and not just the ability to calculate metrics not provided by P123, but the ability to use them for cross-validation. No spreadsheet required.

Many of these are features that P123 might not be able to provide quickly, if it is interested in making them available on its platform at all. It remains to be seen how interested P123 is in responding to machine learning feature requests, or in taking suggestions for improving its initial AI/ML offering going forward. But addressing this one request would allow for more focused planning at P123, as no single feature would be crucial to anyone using the downloads with Sklearn. More and more people are able to use those downloads with advanced methods because of many new developments, including (but not limited to) ChatGPT's Code Interpreter and Colab.

P123, I am sure you are working on this, considering how many features can be addressed at once and the enthusiasm you have expressed for machine learning.

Thank you for continuing to work on this.

Here is some academic support for this feature request. It would be nice to rebalance a model, with updated data, that was trained using this: A Gentle Introduction to k-fold Cross-Validation

Here is what would then be available thru Sklearn: K-Folds cross-validator

Notice Sklearn's use of random_state, which would replace mod() and has more functionality. Also, shuffle=True and shuffle=False have different uses covering many previously described at P123 (e.g., shuffle=True is similar to an even/odd universe split but with more options, and shuffle=False allows for selecting different time periods).
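A small sketch of those two modes, assuming rows are sorted by date:

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 dummy rows, assumed sorted by date

# shuffle=False keeps folds contiguous, so each held-out fold is a time period
for train_idx, test_idx in KFold(n_splits=5).split(X):
    print('time-period fold:', test_idx)

# shuffle=True with a fixed random_state scatters rows across folds,
# similar in spirit to an even/odd universe split but reproducible
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    print('shuffled fold:', test_idx)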

Sklearn allows for optimization with different metrics, and you can code your own metric if Sklearn does not have one you like. And you can automate this optimization.
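For example, a sketch of coding your own metric and handing it to cross-validation (the top-decile metric here is made up for illustration):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def top_decile_return(y_true, y_pred):
    # Mean realized return of the rows the model scores in its top decile
    cutoff = np.quantile(y_pred, 0.9)
    return y_true[y_pred >= cutoff].mean()

scorer = make_scorer(top_decile_return)  # greater is better by default

X = np.random.rand(200, 5)  # stand-ins for features
y = np.random.rand(200)     # stand-in for future returns
print(cross_val_score(Ridge(), X, y, cv=5, scoring=scorer))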

All of this is available only thru uploads and downloads of csv files to and from Python right now. And Yuval has stressed the usefulness of dividing data into discrete universes himself, something I agree with. So there seems to be wide support for making this available thru spreadsheets at least. Perhaps it would not be wrong to make it available thru multiple means, including Python.

Python does make this easier and better, and adds functionality compared to a spreadsheet using mod(). And I can run it while I sleep.

Bootstrapping, subsampling, recursive feature elimination, and model averaging, as well as many other features, might also be useful to some P123 members.

Earlier in this thread I said I would speak to the developers about the requested enhancement to the API to add the ability to return daily data instead of the weekly data it currently returns. The development team would need resources allocated to this to accurately determine the scope of the project, but the expectation is that this will be a fairly large project. The developers are fully allocated to other projects for the near future including the P123 AI functionality. This enhancement is not on the schedule at this time, so I cannot provide an estimated date for completion.

This thread is a Feature Request created by Jim. As of today, it has received no votes from the user community. Votes are important in determining priority.

Dan,

Thank you for taking the time to understand the request in the first place, for looking into it, for giving it serious consideration, and for getting back to us.

Best,

Jim

Hi Jim,
You wrote in another thread today that I said 'no' to this request. I just want to clarify that the answer was not no; the answer was 'not right now'. We have a lot of projects already in progress and everybody is fully allocated to those. We will look into enhancing the API so it can retrieve daily data, but I can't tell you when.


Daily fundamentals is an old Feature Request (2013!). I moved it to the Roadmap here: Feature Request: Backtests with daily rebalance for a fee

It is something we want to do and might be able to start after we launch AI/ML, so please vote if you want it more than other things.

Thanks

So Marco.

I just need it today (literally just for 8/23/23). Just today. Today. I have no use for daily historical data.

A DataMiner download, or a download from my port, ranking system, or anywhere, for today. Just each factor and an index of the tickers.

Like what we get with the screener (which has a 500-row limit) for today, and can use in our classic ports that use the ranking system (machine learning ports needing something a little different).

I understand we can get it now with screenRun from DataMiner.

But you can get only one factor at a time in the screener. It would be nice to get an array (csv file) with all of the factors you use in your system in one download (ticker as the index is the only other column needed).
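Something like this hypothetical layout (tickers and factor names made up):

ticker,factor1,factor2,factor3
AAPL,87.2,12.5,0.31
MSFT,91.0,45.1,0.08
XOM,63.4,7.9,0.55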

I really think this will at some point give you a good cost/benefit ratio for machine learners!!!

EVERY MACHINE LEARNER WILL USE THIS!!!

Anyway, I am fine if you disagree on the profitability or importance once you understand my request and how useful it is to machine learners.

Thank you very much. Truly. For looking at this.

Jim

Aaron pointed out an important clarification regarding retrieving daily data from the API. The API can return the latest daily data, but only if the asOfDt is today or today-1. For example, today is Friday 8/25/23. An API call with asOfDt set to 2023-08-24 or 2023-08-25 will return the latest available daily data. An asOfDt <= 2023-08-23 will return the weekly data from the Saturday prior to the asOfDt. So those wanting to retrieve the latest daily data to feed into machine learning scripts can use the data_universe or ranks endpoints.

The DataUniverse and Ranks operations in DataMiner cannot return today’s daily data because the minimum Frequency setting is ‘1Week’ which will return data only for Saturday dates. We will look into an enhancement to enable DataMiner to return the daily data for the current day.
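To illustrate the rule, a minimal sketch using the same p123api client that appears in Jonpaul's code below; the ranking system and universe names are placeholders, and omitted parameters are assumed to take their defaults:

import datetime
import p123api

client = p123api.Client(api_id='Your api id', api_key='Your api key')

# asOfDt = today (or today-1) returns the latest daily data;
# anything older falls back to the prior Saturday's weekly data
today = datetime.date.today().strftime('%Y-%m-%d')

ranks = client.rank_ranks({
    'rankingSystem': 'All-Stars: Greenblatt',  # placeholder name
    'universe': 'Easy to Trade US',            # placeholder name
    'asOfDt': today,
}, True)  # True => pandas DataFrame
print(ranks.head())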

I am not sure I understand. Let's say next week, on Wednesday the 30th (after market close), I want to get the data for Wednesday to rebalance the next day. If I make a call with an asOfDt of 8/30/23, will I get that day's information, or will I get the 8/26/23 (prior Saturday) data? Using the python rank_ranks api.

You would get the same data you would get if you ran a screen on the website, as long as you call rank_ranks with the as-of date = the current calendar date (or the calendar date - 1).

I attached a spreadsheet where I tested that scenario today.
verify API can return TODAYS data.xlsx (68.3 KB)

Thanks! I think that makes sense. I will have to try it out next week to make sure it still works on other days of the week.

While this means you cannot download historical daily ranking data, I think it does mean that folks can rebalance using yesterday's data any day of the week!

I posted it before, but here is my download code again. You can paste it into Google Colab or run it locally on your machine. It is a lot of code, but it should have nice explanations to make each function's inputs easier to understand.

Imports and install (you may need to install another way if you are not using Colab):

!pip install --upgrade p123api # This is only for google colab
import p123api
import pandas as pd
import numpy as np
from datetime import timedelta, datetime

Class for getting the rank downloads

class Portfolio123API:
    def __init__(self, api_id, api_key):
        self.client = p123api.Client(api_id=api_id, api_key=api_key)

    def download_rank(self, universe, ranking_name, date, add_data, pit='Prelim', precision=4, method=2, names=False,
                      NACnt=True, finalStmt=True, node_type='factor', currency='USD'):
        """
        Generate a ranking pandas DataFrame based on the provided inputs. Uses 1 api credit per 25,000 data points as of Aug 2023.

        Parameters:
        -----------
        universe : str
            Name of the universe on Portfolio123.
        ranking_name : str
            Name of the ranking system on Portfolio123.
        date : str
            Date for which the ranking is being generated. Format is 'YYYY-MM-DD'
        add_data : list
            Additional data to download, like 'Close(0)', which is Friday's close. Note that some items may require the PIT license.
        pit : str, optional
            Point-in-time method, either 'Prelim' or 'Complete'; default is 'Prelim'.
        precision : int, optional
            Number of decimal places for ranking scores, default is 4.
        method : int, optional
            Numeric code representing the ranking method; default is 2 (percentile, NAs negative), and 4 is percentile, NAs neutral.
        names : bool, optional
            Flag indicating whether to include ticker names in the output, default is False.
        NACnt : bool, optional
            Flag indicating whether to include the count of missing values, default is True.
        finalStmt : bool, optional
            Flag indicating whether to include if the stock has a final statement, default is True.
        node_type : str, optional
            Type of node for ranking, e.g., 'factor' or 'composite', default is 'factor'.
        currency : str, optional
            Currency for monetary values, default is 'USD'. 'USD' | 'CAD' | 'EUR' | 'GBP' | 'CHF'

        Returns:
        --------
        pandas.DataFrame
            A DataFrame containing the generated ranking data. Added the date as a column
      """

        ranking = self.client.rank_ranks({
            'rankingSystem': ranking_name,
            'asOfDt': date,  # Formatted as 'yyyy-mm-dd'
            'universe': universe,
            'pitMethod': pit,
            'precision': precision,
            'rankingMethod': method,  # 2-Percentile NAs Negative, 4-Percentile NAs Neutral
            'includeNames': names,
            'includeNaCnt': NACnt,
            'includeFinalStmt': finalStmt,
            'nodeDetails': node_type,  # 'factor', 'composite'
            'currency': currency,
            'additionalData': add_data  # Example: ['Close(0)', 'mktcap', "ZScore(`Pr2SalesQ`,#All)"]
        }, True)  # True - output to Pandas DataFrame | [False] to JSON.

        dates = pd.to_datetime([date] * len(ranking))
        ranking.insert(0, 'date', dates)
        return ranking

    def download_universe(self, universe, date, formulas, pit='Prelim', precision=4, names=False, currency='USD'):
        """
        Generate a pandas DataFrame based on the provided inputs. Uses 1 api credit per 25,000 data points as of Aug 2023.

        Parameters:
        -----------
        universe : str
            Name of the universe on Portfolio123.
        date : str
            Date for which the ranking is being generated. Format is 'YYYY-MM-DD'
        formulas : list
            Formulas to download, like 'Close(0)', which is Friday's close. Note that some items may require the PIT license.
        pit : str, optional
            Point-in-time method, either 'Prelim' or 'Complete'; default is 'Prelim'.
        precision : int, optional
            Number of decimal places for ranking scores, default is 4.
        names : bool, optional
            Flag indicating whether to include ticker names in the output, default is False.
        currency : str, optional
            Currency for monetary values, default is 'USD'. 'USD' | 'CAD' | 'EUR' | 'GBP' | 'CHF'

        Returns:
        --------
        pandas.DataFrame
            A DataFrame containing the generated ranking data. Added the date as a column
      """
        ranking = self.client.data_universe({
            'universe': universe,
            'asOfDt': date,  # 'yyyy-mm-dd'
            'formulas': formulas,  # ['Close(0)', 'mktcap']
            'pitMethod': pit,
            'precision': precision,
            'includeNames': names,
            'currency': currency
        }, True)  # True - output to Pandas DataFrame | [False] to JSON.

        dates = pd.to_datetime([date] * len(ranking))
        ranking.insert(0, 'Date', dates)
        return ranking

    def download_weekly_ranks(self, universe, ranking_name, start_date, end_date, add_data, pit='Prelim', precision=4,
                              method=2, names=False, NACnt=True, finalStmt=True, node_type='factor', currency='USD'):
        """
        Download rankings for multiple dates. Note that calculating some additional stats, like alpha relative to the universe, requires additional data!
        Uses 1 api credit per 25,000 data points as of Aug 2023.

        Parameters:
        -----------
        universe : str
            Name of the universe on Portfolio123.
        ranking_name : str
            Name of the ranking system on Portfolio123.
        start_date : str
            Start date to get data. Format is 'YYYY-MM-DD'. Note that the resulting dataframe will use the previous Saturday as the date
        end_date : str
            End date to get data. Format is 'YYYY-MM-DD'. Note that the resulting dataframe will use the previous Saturday as the date
        add_data : list
            Additional data to download, like 'Close(0)', which is Friday's close. Note that some items may require the PIT license.
        pit : str, optional
            Point-in-time method, either 'Prelim' or 'Complete'; default is 'Prelim'.
        precision : int, optional
            Number of decimal places for ranking scores, default is 4.
        method : int, optional
            Numeric code representing the ranking method; default is 2 (percentile, NAs negative), and 4 is percentile, NAs neutral.
        names : bool, optional
            Flag indicating whether to include ticker names in the output, default is False.
        NACnt : bool, optional
            Flag indicating whether to include the count of missing values, default is True.
        finalStmt : bool, optional
            Flag indicating whether to include if the stock has a final statement, default is True.
        node_type : str, optional
            Type of node for ranking, e.g., 'factor' or 'composite', default is 'factor'.
        currency : str, optional
            Currency for monetary values, default is 'USD'. 'USD' | 'CAD' | 'EUR' | 'GBP' | 'CHF'

        Returns:
        --------
        pandas.DataFrame
            A DataFrame containing the generated ranking data from one date to another
        """

        current_date = datetime.strptime(start_date, '%Y-%m-%d')
        end_date = datetime.strptime(end_date, '%Y-%m-%d')
        combined_dataframe = pd.DataFrame()
        # These give Monday-open-to-Monday-open gains, which is what I trade. Change them if you trade at another time.
        required_data = ['Open(-6)/Open(-1)-1', 'Open_W(-4)/Open(-1)-1', 'Open_W(-12)/Open(-1)-1']

        while current_date <= end_date:
            previous_sunday = current_date - timedelta(
                days=(current_date.weekday() + 1) % 7)  # This calculates a more accurate asofDate
            previous_sunday_str = previous_sunday.strftime('%Y-%m-%d')
            dataframe = self.download_rank(universe, ranking_name, previous_sunday_str, required_data + add_data, pit,
                                           precision, method, names, NACnt, finalStmt, node_type, currency)
            dataframe.rename(columns={'formula1': 'nw_change',
                                      'formula2': 'nm_change',
                                      'formula3': 'n3m_change'}, inplace=True)

            # Calculate the universe gain and then each stock's alpha!
            nw_univ_gain_percentage = dataframe['nw_change'].mean()  # Universe return over the next week
            dataframe['nw_alpha'] = dataframe['nw_change'] - nw_univ_gain_percentage  # Alpha vs the universe

            nm_univ_gain_percentage = dataframe['nm_change'].mean()  # Universe return over the next month
            dataframe['nm_alpha'] = dataframe['nm_change'] - nm_univ_gain_percentage

            n3m_univ_gain_percentage = dataframe['n3m_change'].mean()  # Universe return over the next 3 months
            dataframe['n3m_alpha'] = dataframe['n3m_change'] - n3m_univ_gain_percentage

            combined_dataframe = pd.concat([combined_dataframe, dataframe], ignore_index=True)  # Append this week's rows

            current_date += timedelta(weeks=1)

        combined_dataframe.columns = combined_dataframe.columns.str.replace(r' \(\d+\.\d+%\)', '', regex=True)
        return combined_dataframe

Finally, examples of how to run the above class:

# Initialize the api class
api_id = 'Your api id'
api_key = 'Your api key'
api = Portfolio123API(api_id, api_key)

#-------------- Examples for each function below -------------------------------

# Download ranks from a ranking system  ------------------------------------
ranks = api.download_rank('Easy to Trade US', 'All-Stars: Greenblatt', '2023-08-25', ['Close(0)'])
print("Ranks from ranking system are:\n")
print(ranks)

# Download data from a universe  ------------------------------------
universe_data = api.download_universe('Easy to Trade US', '2023-08-25', ['Close(0)'])
print("Universe data is as follows:\n")
print(universe_data)

# Download ranks over multiple dates!
# Note that this function adds future universe alpha columns and future 1w, 1m, and 3m changes that are based on Monday open to Monday open.
# Change if you want another time or do something fancy like open to close or the like. It is defined in the function in the class
start_date = '2017-01-15'
end_date ='2017-12-24'
universe = 'Large Cap'
ranking_name = 'All-Stars: Greenblatt'
extra_formulas = ['Close(0)']
weekly_ranks = api.download_weekly_ranks(universe, ranking_name, start_date, end_date, extra_formulas)

# Save to a pickle which is very fast to load, but not human readable
weekly_ranks.to_pickle('data.pkl') # This saves the data as a pickle. You can load it using: weekly_ranks = pd.read_pickle('data.pkl')

# Save to a csv, slow to load, but human readable
weekly_ranks.to_csv('data.csv', index=False) # To load it again use: weekly_ranks = pd.read_csv('data.csv')

Dan, Jonpaul, Walter, Aaron, Marco and others,

Thank you and WOW!!! And just to be clear, Jonpaul should be a target audience. He has an Ultimate membership, BTW. I want to be like him when I grow up. More specifically, I want to learn better Python skills, catch up on Bayesian optimization, etc., keep comparing notes in the forum with him and others, and be able to contribute. I think this is what machine learning at P123 looks like, BTW: machine learning that will attract the Kaggle crowd, undergraduates in any scientific field, aerospace engineers, etc. Maybe a bit in the weeds, but: awesome, P123!!! And thank you.

So, I could probably figure out the API. But for now I use DataMiner to create a csv file and work with it in Jupyter notebooks.

So, a simple question, just to be sure: does the same thing apply to DataMiner?

The focus of my question, like Jonpaul's about the API: will the ranks reflect the overnight update if I do this on Friday morning (after the updates)?

For clarity, the code I will use:

Main:
Operation: Ranks
On Error: Stop # ( [Stop] | Continue )
Precision: 4 # ( [ 2 ] | 3 | 4 )

Default Settings:
PIT Method: Prelim # ( [Complete] | Prelim )
Ranking System: 'M3DM'
Ranking Method: NAsNeutral
Start Date: 2005-01-02
End Date: 2010-01-01
Frequency: 1Week
Universe: Easy to Trade US
Columns: factor #( [ranks] | composite | factor )
Include Names: true #( true | [false] )
Additional Data:

    - 1WkTradedRet: 100*(Close(-6)/Close(-1)-1) 
    - Future 1wkRet: Future%Chg(5)
    - FutureRel%Chg(5,GetSeries("$SPALLPV:USA")) #Relative return vs $SPALLPV:USA

Jim

Based on this post by Dan, DataMiner cannot currently return the daily data.

Dan, Marco, and other folks at P123:
Maybe P123 can provide a Colab file for the downloads like they do for the screener? Or a tutorial or something on how to set it up? The example could include how to write the data to csv; my code above shows how to do this. Feel free to use it if you make a tutorial or Colab file. ChatGPT wrote 90% of it for me anyway…


Jonpaul,

Thank you. So I can probably train my model, which will take a while, knowing I can use the API for daily rebalances when the time comes.

I am slow, but I can usually figure things out. And as you say, ChatGPT can help me nowadays.

Very helpful. Thank you.

Jim