Interesting Paper:
The findings generally support (with some exceptions) things I have observed:
- Expanding windows are usually the best choice.
- Regression methods outperform classification methods.
- Using excess returns relative to the "Market" as the target is crucial.
Regarding the last point, how do they define the "market"? Specifically, is it equal-weighted or cap-weighted?
I believe it is cap-weighted, and I wonder if this is one reason their findings differ somewhat from what many of us at P123 observe: the paper finds that adding micro-caps is not beneficial, whereas that has not been our experience.
The use of cap-weighting makes the impact of micro-caps on their definition of "market" negligible. This likely means there is considerably more noise in the excess returns of micro-caps, especially to the extent that their returns are not correlated with the cap-weighted "market."
While not proven, for P123 members who invest in micro-caps, using equal weighting for the universe is probably a better approach. In practice, this suggests that excess returns relative to cap-weighted benchmarks like the Russell 2000 or Russell 3000 may not be ideal for micro-cap or even all-cap strategies. At P123, even all-cap strategies tend to include many micro-caps and small-caps for various reasons, further emphasizing the potential limitations of cap-weighted benchmarks.
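To make that concrete, here is a minimal pandas sketch of my own (hypothetical column names like fwd_ret and mkt_cap, not code from the paper) comparing a target defined as excess return over an equal-weighted universe mean versus a cap-weighted average:

```python
import pandas as pd

# df: one row per (date, ticker) with hypothetical columns
# 'date', 'ticker', 'fwd_ret' (forward return used as the raw target) and 'mkt_cap'.
def add_excess_return_targets(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Equal-weighted "market": simple mean forward return per date.
    out["ew_mkt"] = out.groupby("date")["fwd_ret"].transform("mean")

    # Cap-weighted "market": micro-caps get almost no weight here.
    out["cap_x_ret"] = out["fwd_ret"] * out["mkt_cap"]
    grp = out.groupby("date")
    out["cw_mkt"] = grp["cap_x_ret"].transform("sum") / grp["mkt_cap"].transform("sum")

    # Two candidate targets: excess return vs equal- and vs cap-weighted market.
    out["excess_ew"] = out["fwd_ret"] - out["ew_mkt"]
    out["excess_cw"] = out["fwd_ret"] - out["cw_mkt"]
    return out.drop(columns="cap_x_ret")
```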
Summary:
Use relative strength as the target (3- to 12-month; agree if you emphasize total return with roughly market-level risk)
Use nonlinear ML models (agree, with Rank as a preprocessor; see the sketch after this list)
Use long(er) time frames to validate and test (agree)
Trend, Momentum, Beta, and Analyst Earnings Revisions are the most important factors (in this study --> agree on estimate revisions and momo)
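For what it's worth, here is a minimal sketch of what "Rank as a preprocessor" can look like, assuming a (date, ticker) panel with hypothetical factor columns and the excess-return target from the earlier sketch; the ExtraTrees regressor just stands in for whichever nonlinear learner you prefer:

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

FEATURES = ["momentum_6m", "est_revision", "beta"]  # hypothetical factor columns

def rank_features(panel: pd.DataFrame, features=FEATURES) -> pd.DataFrame:
    """Cross-sectional percentile rank (0..1) of each feature within each date."""
    ranked = panel.copy()
    ranked[features] = panel.groupby("date")[features].rank(pct=True)
    return ranked

def fit_rank_model(panel: pd.DataFrame, target: str = "excess_ew"):
    """Fit a nonlinear model on rank-transformed features against an excess-return target."""
    ranked = rank_features(panel)
    model = ExtraTreesRegressor(n_estimators=300, min_samples_leaf=50, random_state=0)
    model.fit(ranked[FEATURES], ranked[target])
    return model
```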
"In contrast, post-publication adjustments, feature selection, and training sample size have minimal impact on the outperformance of non-linear models. These findings indicate that more complex machine-learning models require larger training datasets to robustly capture non-linearities and interactions in the data."
Very interesting: this is pure gold. It means that even after factor publication, non-linear ML models can squeeze out alpha, plus they are good at selecting the most important factors...
Agree --> best results in my AI models come from long training periods (with a basic holdout from 2004 - 2019, then prediction training from 2014 - 2019, then an OOS test with the predictor from 2019 to today).
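A bare-bones version of that kind of date-based split (holdout training window, later prediction-training window, untouched OOS window), assuming a panel with a date column; the boundaries mirror my setup above, not the paper's:

```python
import pandas as pd

def date_splits(panel: pd.DataFrame):
    """Split a (date, ticker) panel into holdout-training, prediction-training and OOS windows."""
    d = pd.to_datetime(panel["date"])
    holdout_train = panel[(d >= "2004-01-01") & (d < "2019-01-01")]  # model / hyperparameter work
    predict_train = panel[(d >= "2014-01-01") & (d < "2019-01-01")]  # final predictor training
    oos_test = panel[d >= "2019-01-01"]                              # untouched out-of-sample test
    return holdout_train, predict_train, oos_test
```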
Other findings (in general):
Small and micro caps -->
LightGBM does best (it does well with sparse data!), and one needs to restrict the universe by selecting micro to small caps
Findings on big caps
ExtraTrees I to III do best
Best features --> Small and Micro Cap Focus (does very well on small and big caps); Core: Sentiment (does well on mid to big caps).
Using a predictor that has been trained on one universe to predict a different universe --> results are not good (see the sketch below)
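To make the universe point concrete, a rough sketch (my own assumption, not the paper's code) of fitting one model per universe instead of reusing a predictor across universes; the market-cap cutoff and the LightGBM/ExtraTrees pairing simply follow the findings listed above:

```python
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.ensemble import ExtraTreesRegressor

FEATURES = ["momentum_6m", "est_revision", "beta"]  # hypothetical factor columns

def fit_per_universe(panel: pd.DataFrame, cap_cutoff: float = 2e9):
    """Train one model per universe; don't reuse a small-cap predictor on large caps."""
    small = panel[panel["mkt_cap"] < cap_cutoff]    # micro/small caps
    large = panel[panel["mkt_cap"] >= cap_cutoff]   # big caps

    small_model = LGBMRegressor(n_estimators=500, learning_rate=0.05)  # handles sparse data well
    small_model.fit(small[FEATURES], small["excess_ew"])

    large_model = ExtraTreesRegressor(n_estimators=300, min_samples_leaf=50, random_state=0)
    large_model.fit(large[FEATURES], large["excess_ew"])

    return {"small": small_model, "large": large_model}
```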
Jrinne --> I agree on the small caps; in academic studies they "must" cap-weight (it is standard for a reason: they want to produce somewhat scalable models).
But then one does not capture the small-size and low-liquidity effects, which can be exploited by smaller accounts.