AI Improvement

Hi Craig,

I may be misunderstanding what you’re doing, but it sounds like you’re using rolling/sliding 5-year windows to retrain so you capture the most recent feature importances.

If that’s the case, your results don’t surprise me. Personally, I’ve even given up on 10-year training windows — short samples make models very sensitive to regime shifts and outliers — 2018 and 2020 with severe feature inversion being a prime example. You could be capturing 2 of the worst periods for feature inversion in one small 5-year window.
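To put a rough number on that last point, here is a minimal sketch that counts what share of a training window falls in the two years I mentioned. The function name and the idea of simply counting calendar years are my own simplification for illustration; this is not how P123 weights anything.

```python
def stress_share(window_end, window_years, stress_years):
    """Fraction of the years in a training window that are flagged as stress years."""
    window = range(window_end - window_years + 1, window_end + 1)
    hits = sum(1 for year in window if year in stress_years)
    return hits / window_years


# Assumption for illustration: treat 2018 and 2020 as the feature-inversion years.
STRESS_YEARS = {2018, 2020}

for length in (5, 10, 15):
    share = stress_share(2020, length, STRESS_YEARS)
    print(f"{length}-year window ending 2020: {share:.1%} stress years")
```

With a window ending in 2020, the same two years take up a much larger slice of a 5-year window than of a 15-year one.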

One thing to keep in mind: the larger the training window, the less any one outlier can move the results. Going from 5 years to 15 years, you can also scale parameters like min_child_samples upward (e.g., 70 → 210 or more), which further reduces sensitivity. It would take a huge outlier to dominate at that point.
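The scaling I have in mind is just proportional. A small sketch, assuming the number of training rows grows roughly linearly with the number of years (which will not hold exactly if the universe size changes over time); the helper name is hypothetical:

```python
def scale_min_child_samples(base_value, base_years, new_years):
    """Scale a minimum-leaf-size setting in proportion to training-window length.

    Assumes rows grow roughly linearly with years, so each leaf keeps
    covering about the same fraction of the training data.
    """
    if base_years <= 0 or new_years <= 0:
        raise ValueError("window lengths must be positive")
    return round(base_value * new_years / base_years)


print(scale_min_child_samples(70, 5, 15))  # 210
```

Treat the result as a starting point for tuning, not a rule.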

Doesn’t P123 apply a fairly conservative trim on the returns (target) by default? If so, then it seems the main “outliers” here are actually market regimes (with a short rolling window) rather than return (target) spikes (if you are using those defaults for z-score normalization). Rank normalization of the returns removes the problem of return outliers completely if you are using that.
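Here is a toy comparison of the two target normalizations. It is only a sketch with made-up returns, and it is not P123's implementation (I do not know the exact trim or tie handling they use). It just shows that a rank-normalized target does not care how large the spike is.

```python
from statistics import mean, pstdev


def zscore(values):
    """Plain z-score with no trimming, for comparison only."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def rank_normalize(values):
    """Map values to evenly spaced ranks in [0, 1]; ties are not handled specially."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position / (len(values) - 1)
    return ranks


# Hypothetical returns; the last one is a spike.
returns = [-0.04, -0.01, 0.00, 0.02, 0.03, 3.00]
tamer = returns[:-1] + [0.05]  # same data with the spike shrunk

print(max(zscore(returns)), max(zscore(tamer)))           # z-scores differ
print(rank_normalize(returns) == rank_normalize(tamer))   # ranks are identical
```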

For return outliers specifically, I really like @SZ’s suggestion of trying MAE instead of MSE. For some models that can slow things down quite a bit though — not sure why MAE tends to run so much slower for me.
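The reason MAE helps with return spikes is that the best constant prediction under MSE is the mean, while under MAE it is the median. A quick sketch with made-up returns for a single leaf (the numbers are hypothetical, and a real tree model does more than this):

```python
from statistics import mean, median


def mse(values, prediction):
    """Mean squared error of one constant prediction."""
    return mean((v - prediction) ** 2 for v in values)


def mae(values, prediction):
    """Mean absolute error of one constant prediction."""
    return mean(abs(v - prediction) for v in values)


# Hypothetical returns landing in one leaf; the last one is a spike.
leaf_returns = [-0.02, 0.00, 0.01, 0.02, 0.03, 2.50]

mse_fit = mean(leaf_returns)    # minimizes squared error, pulled up by the spike
mae_fit = median(leaf_returns)  # minimizes absolute error, barely notices the spike

print(f"MSE-optimal leaf value: {mse_fit:.4f}")
print(f"MAE-optimal leaf value: {mae_fit:.4f}")
```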

In short, it seems possible the main “outliers” here are actually market regimes (with a short rolling window) rather than return (target) spikes. And if you’re using P123’s default z-score normalization or the rank normalization for the target, these steps effectively remove the problem of return outliers even with your present child size, I think.

I hope this helps and that I didn’t go off on a tangent by misunderstanding your window size.

4 Likes

By the sound of his wording, he used all available data up to 2019/2020 for the final prediction training ("Now I go to train the model up to 2020 and backtest it"), so rolling training does not matter in that case.

It's hard to know what has gone wrong with so few details, but since he is showing a backtested ranking with 100 buckets, I assume he backtested the strategy with very few stocks, aiming for the top percentiles and chasing outliers. That is a hit-or-miss approach... it probably has nothing to do with the actual AI model.

2 Likes

Thank you very much for all the help!

Regarding what @Jrinne asked, it is as @AlgoMan stated: I was using 5 years for the testing buckets for validation. For the test phase I used data up to 2019, then 5 years "out of sample" to roughly mimic Portfolio123's 5-year limitation in the simulation and to see how the AI Factors prediction for the top quantile bucket compared to the simulation.

For these tests I was using a very broad universe (~3000 stocks of all market capitalizations), so the top bucket out of 100 quantile buckets approximates the simulation's 30ish holdings.
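The arithmetic behind that, as a quick sketch (the function name is just for illustration, and 3000 is my approximate universe size):

```python
def stocks_per_bucket(universe_size, n_buckets):
    """Average number of stocks in each quantile bucket."""
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    return universe_size / n_buckets


print(stocks_per_bucket(3000, 100))  # 30.0, i.e. the top bucket is the top 1%
```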

That is a great point from @AlgoMan about chasing outliers, which could have quite an impact on the results. With ML models, would it maybe be better to test 50 or 100 holdings? I am using sell rules to reduce turnover to between 300% and 400%, despite the model showing ~2,000% in AI Factors.

Thanks for the tips. I am using the default z-score normalizations. I am testing with MAE and higher child samples now - and you're right, quite slow!

2 Likes

What do you make of this? Highest ranking for me in the AI Factors dashboard.

Pretty impressive!

The "catch" is the above is only trained to 2014.

This is the same Random Forest model trained to 2019:

Just the same sort of outlier luck?

--

All the simulations are using average next day and variable slippage.

1 Like

Looks good so far. I will take my commission via PayPal :laughing: Why not test it for the same years for a clearer comparison?

Keep in mind that if you use MAE, you might need a smaller min child sample size than with MSE.

1 Like

They are interesting comparisons. It is difficult to really know without details such as the features or hyperparameters, but all you have shown are high numbers, and that is always good to see. Definitely beats negative ones. MAE could help you see how impactful outliers were or weren’t. I would try MAE with a min child sample smaller than 70. Looking forward to seeing your results!