Thanks @Jrinne, your OOS results are brilliant.
I would like to contribute to the discussion on how to build more trust in your ML model (i.e., reduce its variance).
For p123 users who use tree-based models, I would recommend 'Monotonic Constraints' (monotonic_cst, which was added to scikit-learn recently). This constraint 'enforces' a monotonic (positive or negative) relationship between a feature and the target during the learning process.
Below you can see a chart of a decision tree fitted on simulated data, where I imposed a positive constraint on ROE and a negative one on Accruals; the model follows this guidance and uses ROE as the first split (most important). If I instead imposed a positive constraint on Accruals, that feature would not be selected at all in this toy example.
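Here is a minimal sketch of how such a constrained tree could be built, assuming simulated ROE/Accruals data with made-up coefficients (everything below is illustrative, not the actual data behind the chart):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(42)
n = 500
roe = rng.uniform(0, 100, n)       # hypothetical ROE values
accruals = rng.uniform(0, 30, n)   # hypothetical Accruals values
# Toy target: returns rise with ROE, fall with Accruals, plus noise
y = 0.3 * roe - 0.8 * accruals + rng.normal(0, 5, n)
X = np.column_stack([roe, accruals])

# monotonic_cst takes one entry per feature:
# 1 = force increasing, -1 = force decreasing, 0 = unconstrained
# (available on trees/forests since scikit-learn 1.4)
tree = DecisionTreeRegressor(max_depth=2, monotonic_cst=[1, -1], random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=["ROE", "Accruals"]))
```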
This is of course a simplification. A nice explanation can be found here: https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html. You can use this feature with Random Forest or XGBoost (see the XGBoost sketch at the end of this post). Be careful about imposing monotonicity constraints on all features: some, e.g. sales growth, are not monotonically related to returns.
Constraints: ROE positive, Accruals negative. This can be read as: if ROE <= 50 and Accruals <= 15, then the average return within those two samples = -10%.
Constraints: ROE positive, Accruals positive:
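For XGBoost, the equivalent parameter is monotone_constraints. A minimal sketch, again on made-up data (the column order of X determines which constraint applies to which feature):

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
X = np.column_stack([rng.uniform(0, 100, 500),   # ROE
                     rng.uniform(0, 30, 500)])   # Accruals
y = 0.3 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 5, 500)

# One entry per feature, in column order:
# 1 = increasing, -1 = decreasing, 0 = unconstrained
model = XGBRegressor(n_estimators=200, max_depth=3,
                     monotone_constraints=(1, -1))  # ROE up, Accruals down
model.fit(X, y)
```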