I have not yet done any research to support a claim about which training-window length works best, but I would lean towards a longer one. Below is some useful research by Yuval:
> You’re more likely to have factors work like they did in the past if you use a 10-year lookback period than a 1-year one.
Some interesting avenues of research would be, for example:
- use weighted sampling; two methods are available:
  - undersample less important data (e.g., select 100% of observations from FY2023, 90% from FY2022, etc.)
  - specify the importance of each observation (similarly to the previous point) as a classifier parameter, which puts more emphasis on getting those points right (already supported by many scikit-learn & Keras classifiers)
- dynamic training window: an algorithm would decide how many years of past data to use to predict the next period, based on some metric.
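The two weighted-sampling methods above can be sketched on a toy dataset. This is an illustrative example only: the dataset, the keep-fractions per fiscal year, and the choice of `LogisticRegression` are assumptions; scikit-learn does support per-observation weights via the `sample_weight` argument to `fit`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
years = rng.integers(2019, 2024, size=n)  # observations from FY2019..FY2023
X = rng.normal(size=(n, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Fraction of observations to keep (or weight to assign) per fiscal year:
# more recent data is treated as more important.
keep_frac = {2023: 1.0, 2022: 0.9, 2021: 0.8, 2020: 0.7, 2019: 0.6}
frac = np.vectorize(keep_frac.get)(years)

# Method 1: undersample older data -- randomly drop rows so that roughly
# 100% of FY2023, 90% of FY2022, etc. survive.
mask = rng.random(n) < frac
clf_under = LogisticRegression().fit(X[mask], y[mask])

# Method 2: keep every row but pass the same importance values as
# per-observation weights, so the loss emphasises recent data.
clf_weighted = LogisticRegression().fit(X, y, sample_weight=frac)
```

Method 2 is usually preferable when data is scarce, since no observations are discarded; method 1 is simpler to combine with estimators that do not accept sample weights.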