S&P 500 stock selection using machine learning classifiers: A look into the changing role of factors

Depends on who you ask. Once you settle on a model (in my example, "P123 extra trees II"), you then have to train it. My k-fold used ~14y for training and ~5y for holdout, and my dataset spans 20y. So what do you use for your "live" model?

Below are the predictions of the same model trained on 15y and on 20y of data. Not a single stock matches in the top 10. The range of predicted values also differs a lot, although I don't think I care about that, since I would just rank the predictions.
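For anyone who wants to quantify this kind of disagreement rather than eyeball it, here is a minimal sketch of how I would compare two prediction sets by rank only. It is not what P123 does internally; the function names and the toy scores are made up for illustration. It counts how many stocks appear in both top-k lists and computes a Spearman rank correlation (using the simple no-ties formula, with ties broken by position).

```python
def ranks(scores):
    """Rank each item, 1 = highest score. Ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def top_k_overlap(scores_a, scores_b, k=10):
    """Number of items that appear in the top k of both score lists."""
    r_a, r_b = ranks(scores_a), ranks(scores_b)
    top_a = {i for i, r in enumerate(r_a) if r <= k}
    top_b = {i for i, r in enumerate(r_b) if r <= k}
    return len(top_a & top_b)


def spearman(scores_a, scores_b):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    r_a, r_b = ranks(scores_a), ranks(scores_b)
    n = len(r_a)
    d2 = sum((x - y) ** 2 for x, y in zip(r_a, r_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Toy scores (hypothetical, not my actual predictions).
model_15y = [0.9, 0.1, 0.5, 0.7]
model_20y_rescaled = [90, 10, 50, 70]      # different range, same ordering
model_20y_reversed = [0.1, 0.9, 0.7, 0.5]  # opposite ordering

print(top_k_overlap(model_15y, model_20y_rescaled, k=2))
print(spearman(model_15y, model_20y_rescaled))
print(top_k_overlap(model_15y, model_20y_reversed, k=2))
print(spearman(model_15y, model_20y_reversed))
```

The rescaled case shows why the differing range of values doesn't matter once you rank: the overlap and correlation are unchanged by the scale of the predictions.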

EDIT there was a bug, see follow up.

Isn't financial data great?
