AI PRE LAUNCH: Feedback Roundup

Dear All,

First of all, thank you for your great feedback so far! As the formal launch nears, we want to make sure we are not missing anything major from your suggestions. Below is a list of enhancements that are already in the pipeline; work on some of them has already started. I've used polls in this post so you can let us know which ones you prioritize.

Let us know which features we missed, with a short description and perhaps a link to the original post describing them. Also mention anything interface-related that would make your life easier.

Thanks!

Easy
  • Increase limit for features to 2000 and folds to 20.
  • Add LightGBM. Any others?
  • More predefined models
  • Many, many more predefined models!
  • Give an error if a predictor is used in a backtest that overlaps its training data
Medium
  • Include turnover statistics in portfolio results
  • Better hyper-parameter and AI documentation
  • Categorical features support
  • Macro features support
  • Add support for constraints
  • Classification target
  • Add more GPU hardware
  • Add more CPU hardware
Hard
  • Add GridSearch capability (easily create 100 hyperparam combinations)
  • Add "point in time" predictor for backtests to replicate the validation portfolio automatically (multiple predictors are created, one for each fold)
  • Clone support for random algorithms: the same model is cloned multiple times and the average is used in the results.
  • Tools for feature engineering: feature importance, feature impact, permutation feature importance
  • Feature correlation to help trim out correlated features
  • Macro target for timing the market
  • Industry/Sector features like in ranking systems
  • Support for testing different validations within the same AI Factor (currently you need to make a copy of the AI Factor)
New
  • Support for ensembles of models within an AI factor

I have noticed that I pretty much use the same set of validation models each time I'm testing a new AI Factor. If there were a quick way to choose a set of favourite models/algos, it would save some time.

Marco: 20 folds might be helpful in minimizing the time between train and test, but won't 2,000 features pretty much bog everything down?

One possible feature not mentioned: what about showing the returns for an ensemble of the best 3 to 5 models?

But overall, I think your AI toolsets are progressing very nicely. By now you have a better sense of how individuals are using the tools. Riccardo's AI Factors User Guide is a good start, especially for users with no previous introduction to ML.

Would you give users those predefined features that Dan selected in his test?

To be clear: an ensemble inside an AI Factor would train each model of the ensemble at each fold, then use the average of their predictions to generate the results. Is that what you want? The (big?) limitation is that it's an ensemble of models that all use the same dataset. Does that make it useless? Is it still worth doing? Creating a whole separate "Ensemble AI Factor" that can combine models trained on different datasets is a huge project.
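In rough terms the logic would look something like the sketch below (scikit-learn-style models, with an illustrative fold splitter and model list rather than the actual implementation):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

def ensemble_validation(X, y, models, n_folds=5):
    """For each fold, train every member of the ensemble and average their predictions."""
    preds = np.full(len(y), np.nan)
    for train_idx, test_idx in KFold(n_splits=n_folds).split(X):
        fold_preds = [m.fit(X[train_idx], y[train_idx]).predict(X[test_idx]) for m in models]
        preds[test_idx] = np.mean(fold_preds, axis=0)  # ensemble = average of the members' predictions
    return preds  # out-of-sample predictions, used to generate the validation results

# Illustrative usage, with X and y as the feature matrix and target returns:
# preds = ensemble_validation(X, y, [Ridge(alpha=1.0), RandomForestRegressor(n_estimators=200)])
```

Note that every member sees exactly the same X, which is the limitation mentioned above.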

The alternative for now, once we release the "point in time" predictors (almost ready btw), is to create a ranking system of your best models as the ensemble. Then you can run a backtest and see how the combination works out. This would support creating ensembles of models using either the same dataset or different datasets.

The downside of the ranking system ensemble is that it will be pretty slow to run and somewhat complicated to set up: you have to set up point-in-time predictors, bring them all together into a ranking system, then run a backtest. All of this, of course, after you've done the validation research on the individual models.

But as a proof of concept it should be a good start. We can then see if it's worthwhile doing a better integration of a new "ensemble" component.

Marco: “To be clear: an ensemble inside an AI Factor would train each model of the ensemble at each fold, then use the average predictions to generate the results. Is that what you want? . . .The downside of the ranking system ensemble, is that it will be pretty slow to run, and somewhat complicated to setup: you have to set up.”

I was envisioning a much simpler approach. Your current system runs through all the folds and ranks the models' performance across them. At that point one could have the option of computing an ensemble result for the top few models from the validation folds. A simple summation of the ranks from each of those models would give a consensus (ensemble) rank. I've only sampled this method with one simulation, but the performance was better than the individual results.
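For example, a rough pandas sketch of the consensus rank (the model names and predictions are made up, just to illustrate the summation):

```python
import pandas as pd

# Hypothetical predictions for four stocks from the top 3 validated models
preds = pd.DataFrame({
    "model_a": [0.12, 0.05, 0.30, -0.02],
    "model_b": [0.10, 0.08, 0.25,  0.01],
    "model_c": [0.15, 0.02, 0.28, -0.05],
}, index=["AAA", "BBB", "CCC", "DDD"])

ranks = preds.rank(ascending=True)                    # rank each model's predictions separately
consensus = ranks.sum(axis=1).rank(ascending=False)   # sum the ranks; 1 = best consensus rank
print(consensus.sort_values())
```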

Right now, nothing is stored in validations except the results. No ranks, no predictions, and no trained predictors (the "executables"). We would need to make some changes to store the predictions so they could be used by an ensemble. This would increase storage requirements considerably, but that cost is passed on to the user in the form of RUs. So yeah, it could all work. Still about a week's worth of work. Thanks

I added a new poll for ensembles at the end, called "New". Please cast your support vote. Thanks

Which test are you referring to?

I'm sorry but I am not going to share the feature list because that model was not a 'test' - it is my personal model and has money invested in it. I can say that it has 75 features that are mostly growth, profitability and value related. Most are quarterly or TTM.

Most of the features are either in the Predefined Feature list or are quarterly versions of features in that list.

For some of the features, I use a 'weighted' TTM formula so that the more recent quarters get higher weight. Like this:
((EBITDA(0,QTR)*4) + (EBITDA(1,QTR)*3) + (EBITDA(2,QTR)*2) + (EBITDA(3,QTR)*1)) / EV

The thinking was that I wanted to use EBITDAQ/EV and EBITDATTM/EV, but those two might be highly correlated, which is not ideal for some models. So it seemed better to create a single feature that covered both. I didn't do any tests to see if that actually gave better results than just using the Q and TTM features, but I still like the idea of putting more weight on the more recent quarters.


Dan, it's encouraging to see that at least one P123 employee, after observing the AI/ML results, is already investing real money in the new models rather than in their screens/simulations.

I agree correlation is important.

Tree models also allow for interactions. For example, is having a great quarter good if the company has been doing well all year (TTM)? Or is it better if the last quarter is a turnaround for a company that has had a tough time over the last year? In the first case the quarterly and TTM features would be correlated, while in the latter they would not be.

Maybe it is U-shaped where both extremes are good. No problem for a tree model to sort that out.

Like Dan, I have not tried this enough to share specific cases.

Linear regression does not include interactions unless you specifically create interaction variables.
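As a toy illustration of the difference (scikit-learn, made-up quarterly and TTM features, nothing P123-specific):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
q, ttm = rng.normal(size=500), rng.normal(size=500)   # toy quarterly and TTM features
y = 0.5 * q * ttm + rng.normal(scale=0.1, size=500)   # target driven by their interaction

X = np.column_stack([q, ttm])
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)   # splits pick up part of the interaction unaided
linear_plain = LinearRegression().fit(X, y)           # misses the interaction entirely
X_inter = np.column_stack([q, ttm, q * ttm])
linear_inter = LinearRegression().fit(X_inter, y)     # needs the explicit q*ttm interaction variable

print(tree.score(X, y), linear_plain.score(X, y), linear_inter.score(X_inter, y))
```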

Jim

I just want to add that this is at least one important aspect of @pitmaster's RealFit: using decision trees instead of random forest models allows for interpretability.

I leave it to him to expand on the uses of RealFit.

Claude 3 suggests additional methods for creating interaction variables that one can explore on their own, but several of the methods discussed are already available through P123's machine learning. My point is that P123 is advanced (even in the beta implementation) and already provides several ways of doing this.

BTW, I keep finding K-nearest neighbors (KNN) useful everywhere I look and every time I try it.

@Marco, I might suggest looking into how resource-intensive KNN is. It requires a lot of memory, but more than saving all of the validations for an ML model? Maybe not, based on my back-of-the-envelope calculation.
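To make the back of the envelope concrete (all sizes are guesses, not P123's actual numbers): brute-force KNN essentially just stores the training matrix, so memory is roughly rows × features × 8 bytes for float64.

```python
# Hypothetical sizes: ~5,000 stocks weekly for 20 years, 300 features
rows = 5_000 * 52 * 20
features = 300
gb = rows * features * 8 / 1e9   # 8 bytes per float64 value
print(f"{gb:.1f} GB")            # ~12.5 GB for the raw training matrix
```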

@pitmaster uses decision trees a lot, and I intend to use them more, for day-to-day decision-making if nothing else. I am not sure how they would fit into P123's ML, but they could be used in conjunction with a random forest to get an idea of where the splits are being made.

A decision tree could also be used as a factor in P123 Classic, which would duplicate what RealFit does within P123 Classic. This would be a nice way to include more complex factor interactions while keeping the interpretability of P123 Classic (and of decision trees).
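A rough sketch of how the splits could be inspected, with stand-in data and illustrative feature names (this is just scikit-learn's export_text, not anything P123-specific):

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_text

# Stand-in data; in practice this would be the same feature set the random forest trains on
X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=0)
feature_names = ["ebitda_ev", "sales_growth", "roe", "accruals", "momentum"]  # illustrative names

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))  # human-readable split thresholds
```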

I still actually use P123 Classic at the end of the day. I should probably explore different ways (including the ones above) to add interaction variables to what I do with P123 Classic. This is probably Dan's point (and not originally mine), BTW.

Jim


Thanks, we'll try to add it soon together with LightGBM.


Related to this feature, it would be tremendously helpful to be able to re-use a dataset for testing different target variables, e.g. different target horizons, or absolute vs. relative returns at the same horizon.

Given that there can be a 99.5+% overlap between these datasets except for the target variable, it's wasteful to have to create a new AI factor with its own dataset, especially given the storage and runtime costs of building the dataset.

If you refactored the dataset concept to be independent of the AI model, the user could create one dataset that contains several possible target variables, one of which would be selected during creation of the AI model. This design would also let you easily support the feature above: testing different validation configurations off the same dataset.
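A rough sketch of what that could look like from the user's side (made-up column names): the feature columns are built and stored once, several candidate targets sit alongside them, and only the chosen target differs between AI models.

```python
import pandas as pd

TARGET_COLS = ["ret_1m_excess", "ret_3m_excess", "ret_3m_abs"]  # illustrative candidate targets

def make_model_frame(dataset: pd.DataFrame, target: str):
    """Split one shared dataset into the feature columns plus the selected target."""
    X = dataset.drop(columns=TARGET_COLS)   # features: built once, shared by every AI model
    y = dataset[target]                     # only the chosen target varies between models
    return X, y

# e.g. X, y = make_model_frame(dataset, target="ret_3m_excess")
```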
