First impressions with new Beta

On my first try I left in Future 3-Month Returns, as I was used to doing for my offline runs, which required me to separate the features and target after downloading.
For this system you select the target and then the features. Better, but different.
Well, I managed to fumble the second try: poor results, but I discovered I had somehow downloaded the S&P 500 rather than the S&P MidCaps. Don't know how that happened.

Third try: 20 years of data using the S&P MidCaps, Time Series CV with an 8-fold split, a 2-year holdout per fold, a 1-year gap, and 3- to 17-year training periods.
Model rankings: Random Forest 100, Linear Ridge 100, Extra Trees II 100, XGBoost II 40, GAM II 20.
But actual portfolio gains don't correlate with those ranks. Every single one gave nice excess profits over the 16 years of portfolio validation, which runs from 3/2008 and includes most of the '08 bear market.
Top 5% of stocks in the universe. I picked the S&P MidCap as I thought it would be a better test than the S&P 500.
Extra Trees: 19 stocks, 7.35% above benchmark
Random Forest: 20 stocks, 5.5% above benchmark
XGBoost: 20 stocks, 10.37% above benchmark
Linear Ridge: 20 stocks, 7.58% above benchmark
GAM: 19 stocks, 9.58% above benchmark
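
For anyone who wants to approximate that walk-forward split offline, here is a minimal sketch using scikit-learn's TimeSeriesSplit. It treats the data as one row per week for a single series, so the weekly frequency and row counts are my own assumptions, not how P123 handles its stock-by-date panel internally; conveniently, the numbers reproduce the 3- to 17-year training windows described above.

```python
# Rough offline analogue of the CV described above: 20 years of weekly rows,
# 8 folds, a 2-year holdout per fold, and a 1-year gap between train and test.
# Weekly frequency and single-series rows are assumptions for illustration.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

ROWS_PER_YEAR = 52
tscv = TimeSeriesSplit(
    n_splits=8,                      # 8-fold split
    test_size=2 * ROWS_PER_YEAR,     # ~2-year holdout per fold
    gap=1 * ROWS_PER_YEAR,           # ~1-year gap between training and holdout
)

X = np.arange(20 * ROWS_PER_YEAR).reshape(-1, 1)   # 20 years of dummy rows
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    years_of_training = len(train_idx) / ROWS_PER_YEAR
    print(f"fold {fold}: {years_of_training:.0f} training years, "
          f"test rows {test_idx[0]}-{test_idx[-1]}")
# Training windows grow from ~3 years (fold 1) to ~17 years (fold 8).
```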

Unless I'm missing something, it appears there is not yet a built-in ensemble prediction method that combines the results of the selected models. A simple majority vote should improve the results.
Aurélien Géron, in "Hands-On Machine Learning with Scikit-Learn," has pointed out that in a noisy environment an ensemble of weak learners often gives stronger results. RF, ET, and XGBoost are internally ensembles of trees, but Géron also points out that "Ensemble methods work best when the predictors are as independent from one another as possible," and that "One way is to train them using very different algorithms." SVM and GAM are very different from decision trees.
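
To make concrete the kind of majority vote I have in mind (this is not a built-in feature; just a sketch assuming each model's predictions sit in one DataFrame column per model, indexed by ticker, with illustrative column names and numbers):

```python
# Simple majority-vote ensemble over per-model predictions.
# `preds` is a DataFrame indexed by ticker, one column of predicted returns
# per model; keep tickers picked by at least `min_votes` of the models.
import pandas as pd

def majority_vote_picks(preds: pd.DataFrame, top_n: int = 20, min_votes: int = 3) -> pd.Index:
    votes = pd.Series(0, index=preds.index)
    for model in preds.columns:
        top = preds[model].nlargest(top_n).index   # this model's top-N list
        votes.loc[top] += 1
    return votes[votes >= min_votes].sort_values(ascending=False).index

# Dummy example with five models and three tickers:
preds = pd.DataFrame(
    {"rf": [0.12, 0.08, 0.03], "et": [0.10, 0.02, 0.05], "xgb": [0.07, 0.09, 0.01],
     "ridge": [0.11, 0.04, 0.06], "gam": [0.09, 0.03, 0.08]},
    index=["AAA", "BBB", "CCC"],
)
print(majority_vote_picks(preds, top_n=2, min_votes=3))
```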

The next big question is: what is the turnover for a portfolio built on these predictions? I would love to do a simulation that selects on prediction and holds until the prediction drops below some threshold.
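
Nothing like that exists in the tool yet as far as I can tell, but with downloaded predictions a crude version of the buy-and-hold-until-threshold idea could be estimated offline. The data layout (a Series of predictions per rebalance date) and the percentile sell rule below are my own assumptions:

```python
# Crude turnover estimate: buy the top-N names by prediction, then keep a
# name until its prediction percentile falls below `sell_pct`.
# `preds_by_date` maps rebalance date -> Series of predictions indexed by ticker.
import pandas as pd

def simulate_turnover(preds_by_date: dict, top_n: int = 20, sell_pct: float = 0.80) -> float:
    held: set = set()
    trades = 0
    dates = sorted(preds_by_date)
    for date in dates:
        preds = preds_by_date[date]
        pct = preds.rank(pct=True)                           # percentile within the universe
        keep = {t for t in held if pct.get(t, 0.0) >= sell_pct}
        candidates = [t for t in preds.sort_values(ascending=False).index if t not in keep]
        new_held = keep | set(candidates[: max(top_n - len(keep), 0)])
        trades += len(new_held ^ held)                       # names bought or sold this period
        held = new_held
    return trades / max(len(dates), 1)                       # avg names traded per rebalance
```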

Currently I see you can download all the predictions for each model into Excel and build your own voting, or even weighted voting, system there.
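
The same voting can also be done in pandas once the files are downloaded. The file names, the "Ticker" and "Prediction" column names, and the weights below are only my guesses at the export layout, not the actual format:

```python
# Build one predictions DataFrame from the per-model downloads and score it
# with a weighted vote on percentile ranks. Names and weights are illustrative.
import pandas as pd

files = {"randomforest.csv": 1.0, "extratrees.csv": 1.0,
         "xgboost.csv": 1.5, "ridge.csv": 1.0, "gam.csv": 1.5}

columns = {}
for path, weight in files.items():
    df = pd.read_csv(path).set_index("Ticker")
    columns[path] = df["Prediction"]
preds = pd.DataFrame(columns)                      # one column per model, aligned on ticker

weights = pd.Series(files)
score = (preds.rank(pct=True) * weights).sum(axis=1)   # weighted sum of percentile ranks
print(score.nlargest(20))                               # top 20 by weighted vote
```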

For a beta ML product this is one very nice piece of work. A few minor tweaks and anyone can use it regardless of their ML knowledge.

The biggest initial problems are with the human interface to the software: how do I enter data, how do I change it after I notice too many missing values or too much noise, how do I select models . . . just the normal things I always run into when I first try a new package.

For the record, I'm not an AI or ML professional. I did work in the field almost 30 years ago, but my expertise was in designing computers and ICs for ML, not the software. Since then, I have tried to keep up with the field through online courses, a collection of books, and using AI/ML in my personal investment screening and backtest software.

You could now try creating an AI factor for each model and averaging the ranks for each factor in a P123 ranking system. While not what you may normally think of as "model averaging," I think this fits the definition, and you could see what it does now.
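
Outside of P123, the equal-weight version of that rank averaging is just a couple of lines; the data and column names below are dummies, and in practice the columns would be the per-model AI factors:

```python
# Equal-weight "model averaging" via ranks: average each AI factor's
# percentile rank, the same idea as equal-weighting the factors as nodes
# in a P123 ranking system. Column names and values are hypothetical.
import pandas as pd

factors = pd.DataFrame(
    {"ai_rf": [0.12, 0.08, 0.03, 0.05], "ai_xgb": [0.07, 0.09, 0.01, 0.11],
     "ai_gam": [0.09, 0.03, 0.08, 0.02]},
    index=["AAA", "BBB", "CCC", "DDD"],
)
avg_pct_rank = factors.rank(pct=True).mean(axis=1)   # average percentile rank per ticker
print(avg_pct_rank.sort_values(ascending=False))
```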

Honestly, however, I have generally been disappointed by model-averaging AI models with stock factors.

I've been playing with it all day and I hope to have more useful feedback coming, but I just wanted to congratulate Marco and the staff. It's a really cool tool, and even just dipping my toes into it and data-mining like crazy for a short time, I've found it very interesting and thought-provoking.

I am very impressed. My results are improving.

Lift off!!!:

Not upset with this. Although I am a little concerned about the second half results, this is a "first impression" after all:

Intuitive; the advanced knowledge on the part of the developers is evident, IMHO.

Really good.

Jim


I too remain very optimistic about this. For just a second-day beta, a very large portion of a final user-friendly system is already evolving. The comments so far have been requests for simple feature changes, not changes to the major parts of the ML models or results. Somehow, when I was intending a MidCap universe, I downloaded an S&P 500 universe. I was initially upset with the lower-than-expected results, but later realized the model was actually making a decent profit in the most difficult universe. I still want to know what the turnover and friction for a portfolio will be.

So I trained up to 1/1/2019 and put that into a ranking system called AI. The sim runs after that, starting 1/1/2019 with that ranking system.

We can do that now, or did I make a mistake?

BTW, an out-of-sample CAGR of 31% is looking good for some excellent Designer Models. Did @Marco succeed in reducing overfitting to some degree, or at least give us a method that sets better expectations for out-of-sample results than a simple backtest would?

Jim
