ML workflow

Regarding what frequency is best for AI…

Here’s an excerpt from a recent AI paper mentioned in this post: Stock picking with machine learning

Almost all of the abovementioned studies use ML models for monthly predictions based on monthly data. In contrast, we analyze shorter term predictability and focus on weekly data and weekly predictions. Analyzing weekly predictions provides two major advantages: First, the larger number of predictions and trades in an associated trading strategy provides higher statistical evidence due to the larger sample size. Second, ML models require large training sets. Therefore, studies analyzing monthly predictions require very long training sets of at least 10 years. Given the dynamics of financial markets and the changing correlations in financial data over time, it could be suboptimal to train ML models on very old data, which is not any more relevant for today’s financial world due to changing market conditions. Because our study builds on weekly data, we are able to reduce the length of the training set to only 3 years while still having enough observations for training complex ML models.

3 Likes

Weekly updates make sense, especially if you’re including technical indicators like the authors do. Really cool paper.

1 Like

I think the paper is probably correct on this. Noting the referenced study is about classification models and that it can probably be generalized to regression models. We generally use regression models at P123, and the first iteration of AI/ML will be using regression models exclusively if I understand correctly. BTW, I think it was a wise choice to start with regression models and I am not sure that classification models should be a priority… I am not questioning the decision to do that as I think it was a good decision.

I would add to this paper the evidence from classic P123 users. P123 users commonly rebalance weekly. And P123 is kind-of-like a non-parametric (i.e., it uses ranks, which are ordinal) multivariate regression with manual optimization: manual optimization rather than something like gradient descent (internally at P123, anyway). If P123 classic used automated optimization it would clearly be a machine learning model.

Actually, not “kind-of.” It is, in fact, a non-parametric regression model that is generally optimized manually by members. But a P123 model can be optimized with machine learning. For example, one could determine the weights in a ranking system using a regression model. Correct me if this is not factually correct.

But whatever people want to call what they do with classic P123, whether they self-identify as using statistical learning, fundamental analysis, or something else, P123 members’ manual optimizations are not likely to differ much from those resulting from the gradient descent optimizations that we will be using with machine learning at P123.

And P123 is moving toward improved automation of P123 classic, for example by considering allowing members to run some of the optimizations in parallel. I am not sure at which point everyone will agree that it is at least a little bit like machine learning, or when P123 finally automates the entire process, making it machine learning by definition.

This is just to say P123 classic has been (and is) valuable. Personally, I don’t mind having some of that automated with gradient descent algorithms. I am only making the point that I believe much of what has been found to be true with P123 classic will continue to be true with automation.

TL;DR: Most P123 classic members have been using weekly rebalance for a while now and I don’t expect optimization with fully automated machine learning (i.e., gradient descent) to change that in any way. It would be surprising if it did.

Jim

2 Likes

I agree with @benhorvath that for weekly predictions some considerations need to be made in terms of the selected features and also in terms of the label.

One important property of the dataset processed by the ML algorithm should be consistency of persistence between features and labels. Intuitively, the autocorrelation of the label y (future return) and the autocorrelation of the features X should not be too different. If you sample features (accounting-based) and labels on a weekly basis, the label is very weakly autocorrelated, while the features are often highly autocorrelated.

An extreme example is a model with one feature, SalesTTM / SalesPTM, whose label is the 1-week ahead return, with data sampled weekly over 6 months. This means that your feature changes only once during the 6-month training period, while your label changes every week. Autocorrelation in the feature space will be close to 1, while the autocorrelation of the label is usually close to 0. In this case your model will capture a lot of noise.
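To make the mismatch concrete, here is a minimal sketch (the toy data and names are my own assumptions, not P123’s or the paper’s) comparing the lag-1 autocorrelation of a slow-moving accounting feature with that of a weekly-return label:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 26 weekly observations (roughly 6 months) for one hypothetical stock.
dates = pd.date_range("2023-01-06", periods=26, freq="W-FRI")

# Accounting-based feature: SalesTTM / SalesPTM only changes when a new
# report arrives, so it is constant within each quarter.
feature = pd.Series(np.repeat([1.05, 1.08], 13), index=dates, name="sales_growth")

# Label: next week's return, which is mostly noise at this horizon.
label = pd.Series(rng.normal(0.0, 0.03, size=26), index=dates, name="fwd_1w_ret")

print("feature autocorr:", feature.autocorr(lag=1))  # high, close to 1
print("label autocorr:  ", label.autocorr(lag=1))    # near 0
```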

There are some solutions to this problem, but that is probably a topic for another thread.

3 Likes

This seems very important, since most datasets will be a mixed bag of short-term and long-term factors, with the target maybe somewhere in between.

What are some of the solutions?

Thanks

2 Likes

@Pitmaster, thank you. I had not thought of feature autocorrelation until your post, and I don’t know much about the problems of feature autocorrelation. I do know an embargo can be used for autocorrelation when doing k-fold validation, but I had not considered whether it helps with both feature and target autocorrelation.

When you answer Marco, you might keep in mind that he has mentioned k-fold validation in the past, and discuss embargoes with him if you think they can be a partial solution to feature autocorrelation in k-fold cross-validation, especially if P123 will be using it. P123’s AI/ML is still a bit of a black box and I don’t know whether it will include k-fold cross-validation, so my question may not be pertinent, but I think P123 would benefit from discussing this with you. I only mention it because an embargo may not be that difficult a programming challenge for P123, if you think it would be helpful and P123 is considering k-fold cross-validation.

BTW, I assume something like EBITDA/EV also has autocorrelation even if the price can change this metric from day-to-day. I assume you were giving an extreme example above.

Jim

1 Like

Am I missing something with my logic: these features rightly are not predictive of weekly returns for the reason you mention; the label changes but the feature doesn’t. There is nothing to solve here because it’s not a problem; it’s a discovery of the feature’s usefulness (or lack thereof) for the task (i.e., predicting one-week returns).

I think the most widely accepted transposition for accounting metrics, in academia and in practice, is to normalize these features with price: (SalesTTM / SalesPTM) / close(0) = SalesTTM / (SalesPTM * close(0))?

Possible solutions are as follows:

  • increase the label return period - for example, consider annual or biennial future returns when working with 4-week data - this should increase the autocorrelation of the label.
  • use differences in factor levels - for example, instead of SalesTTM/EV, use SalesTTM/EV - FHist(SalesTTM/EV, 1) - this should decrease the autocorrelation of the features (a minimal sketch follows this list).
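
A minimal sketch of the second bullet, assuming a long-format pandas DataFrame with one row per (ticker, date); the column names and the FHist-style lag of one observation are my own assumptions:

```python
import pandas as pd

def difference_factor(df: pd.DataFrame, factor: str) -> pd.Series:
    """Change in a factor versus its previous observation for each stock,
    roughly what SalesTTM/EV - FHist(SalesTTM/EV, 1) expresses in P123 terms."""
    ordered = df.sort_values(["ticker", "date"])
    # groupby().diff() subtracts each stock's prior value from its current one,
    # turning a highly persistent level into a much less autocorrelated change.
    return ordered.groupby("ticker")[factor].diff()

# Usage with hypothetical data; the result aligns back to df by index:
# df["sales_to_ev_chg"] = difference_factor(df, "sales_to_ev")
```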

Personally I never use weekly returns as labels. They are too noisy, especially for small stocks. It is also worth considering path-dependent labelling techniques, like the triple-barrier or trend-scanning methods developed by Lopez de Prado.
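
For reference, a minimal sketch of the triple-barrier idea, simplified from de Prado’s Advances in Financial Machine Learning; the barrier widths, holding period, and price series are my own assumptions:

```python
import pandas as pd

def triple_barrier_label(prices: pd.Series, upper: float = 0.05,
                         lower: float = 0.05, max_hold: int = 20) -> pd.Series:
    """Label each bar by which barrier the forward price path touches first:
    +1 for the profit-taking barrier, -1 for the stop-loss barrier,
    0 if the vertical (time) barrier is reached first."""
    labels = pd.Series(0, index=prices.index, dtype=int)
    for i in range(len(prices) - 1):
        window = prices.iloc[i + 1 : i + 1 + max_hold]
        path = window / prices.iloc[i] - 1.0           # forward returns from entry
        hit_up = path[path >= upper].index.min()       # first touch of upper barrier
        hit_dn = path[path <= -lower].index.min()      # first touch of lower barrier
        if pd.notna(hit_up) and (pd.isna(hit_dn) or hit_up <= hit_dn):
            labels.iloc[i] = 1
        elif pd.notna(hit_dn):
            labels.iloc[i] = -1
        # otherwise the vertical barrier wins and the label stays 0
    return labels
```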

3 Likes

Isn’t it better to do an ensemble? Train three AI factors for short-, medium-, and long-term features. The target could be the same or different, I suppose. Then combine for a single prediction.

For the medium- and long-term factors you can do weekly, but just include in the dataset the stocks that have changes (i.e., have reported).

Let’s sum it up:

  • to reduce noise in the label we need longer time periods
  • now that we have low noise in the label we need longer periods in the features
  • as we have longer feature periods (which improves the signal/noise as well) we need longer training periods
  • as we now have longer training periods, like 10-plus years, we will tend to have less flexible weights
  • as the weights are less flexible, I can go back to classic Portfolio123 with fixed weights

Sorry, I’m exaggerating a bit :wink:

Ok, basically we need to increase the signal-to-noise ratio.

What options do we have?

  • excess return (a cross-sectional measure)
  • daily ranking (also a cross-sectional measure); a minimal sketch of both follows this list
  • are there some other cross-sectional measures?
  • filters, like the Kalman filter - has anybody tried it?
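
A minimal sketch of those two cross-sectional measures, assuming a long-format pandas DataFrame with one row per (date, ticker); the column names are my own:

```python
import pandas as pd

def cross_sectional_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Replace raw forward returns with cross-sectional versions computed per date."""
    out = df.copy()
    grouped = out.groupby("date")["fwd_ret"]
    # Excess return: return minus the mean of the same date's cross-section.
    out["excess_ret"] = out["fwd_ret"] - grouped.transform("mean")
    # Rank: percentile rank within the same date's cross-section (0 to 1).
    out["ret_rank"] = grouped.rank(pct=True)
    return out
```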

If one goes in the time-series direction one might pick up some noise, as market conditions might change.
I would prefer to be able to follow changes in market conditions on a 3-month basis. That should also fit with changes in sector performance.
Or run only with price information: far more samples, and shorter than changes in market conditions.

This stuff is not easy…

My most frustrating experience was the first time I computed an autocorrelation on a price series… nothing, zero. That was when I understood what random movements mean.

The problem of autocorrelation is a subproblem of the larger issue of forecasting/predicting when samples are not independent. Most of the ‘easy’ or ‘classical’ algorithms assume that independence.

It’s certainly possible to get around this issue – though I don’t think ensembling by itself would get you there. For classical forecasting like ARIMA, differencing is the usual solution. For cross-sectional regression, you’d look into adding a random effect. If you were trying to use XGBoost, you may have to investigate other solutions as well.
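
To illustrate the differencing point, a minimal sketch on a synthetic random-walk series (my own toy data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# A synthetic random-walk "price" series: the levels are highly autocorrelated.
prices = pd.Series(100 + rng.normal(0, 1, 500).cumsum())

# First differencing (the "I" in ARIMA) removes most of that persistence.
changes = prices.diff().dropna()

print("level autocorr:", prices.autocorr(lag=1))   # close to 1
print("diff autocorr: ", changes.autocorr(lag=1))  # close to 0
```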

2 Likes

I think this is a serious concern that could extend to XGBoost, but XGBoost would not be the worst model. First, it performs feature selection during training, which mitigates the problem to some extent by focusing on the features that have the most predictive power.

Regularization also helps mitigate the problem. XGBoost has both L1 (LASSO) and L2 (Ridge) regularization. But I don’t have enough experience to say there is not still a problem or that this solves everything.

@Marco there is concern about independence in the training process and not just in the predictions.

At the end of the day, cross-validation can provide an individual answer for members. Individual answers with regard to their factor choices and the models they use.

WITH MONOTONIC CONSTRAINTS and cross-sectional data, Sklearn’s boosting model does fine (k-fold cross-validation with an embargo, easy-to-trade universe). My cross-validation would suggest boosting could be used with some cross-sectional features. No regularization was used here (other than early stopping).
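
This is not P123’s implementation, just a minimal sketch of the kind of setup described, assuming scikit-learn’s HistGradientBoostingRegressor (the sklearn boosting model that supports monotonic constraints); the toy data and the direction of each constraint are my own assumptions:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

# Hypothetical training data: two cross-sectional features, e.g. a value rank
# and a momentum rank, with a noisy forward-return target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5000, 2))
y = 0.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.5, size=5000)

model = HistGradientBoostingRegressor(
    monotonic_cst=[1, 1],     # predictions forced non-decreasing in each feature
    early_stopping=True,      # the only "regularization" used here, as in the post
    validation_fraction=0.1,
    random_state=0,
)
model.fit(X, y)
```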

And again, I am not sure an embargo solves all of the problems of autocorrelation, but it would be worse without it, I think (embargoed k-fold cross-validation):

Screenshot 2024-03-01 at 4.36.19 PM

Jim

1 Like

I would be very hesitant to use XGBoost or any other model on time-based data without some allowance for the violation of independence of samples. As you suggest, cross-validation will be key here. And probably a CV strategy more robust than simply holding out the same 20% of the data and testing against it 100 times.

I suspect you’d find that your in-sample metrics look great, and the out-of-sample looks awful.

1 Like

Whycliffes kindly posts a good selection of papers - thanks!

In a lot of them one can find that Random Forest comes out on top.

Is there any experience here that other techniques perform better (boosting, etc.)?

I like Extra Trees Regressors best of the ones commonly discussed. I have one that seems to do better (that is not commonly discussed). Support Vector Machines run too slowly on my machine to test them, and I have not done a neural net recently or with good factors.

BTW, the Extra Trees Regressor (like random forests) doesn’t have a lot of difficult-to-understand hyperparameters to adjust. It is considered one of the easier machine learning methods to use. Random forests are slow, with Extra Trees Regressors being much faster.
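
A minimal sketch of the comparison, with toy data standing in for a factor matrix and forward returns (my own assumptions, not anyone’s live model):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = 0.1 * X[:, 0] + rng.normal(scale=1.0, size=10_000)

# Extra Trees draws split thresholds at random instead of searching for the
# best split at every node, which is the main reason it trains faster than
# a random forest while exposing similarly few hyperparameters.
et = ExtraTreesRegressor(n_estimators=300, n_jobs=-1, random_state=0).fit(X, y)
rf = RandomForestRegressor(n_estimators=300, n_jobs=-1, random_state=0).fit(X, y)
```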

1 Like

I do get that there is an assumption of independence of the independent variable and that this assumption is clearly violated when there is autocorrelation. There can be no doubt of that. Other assumptions are violated too. There is the assumption of i.i.d. data (independent and identically distributed), as I am sure you already know. Market regimes are not identically distributed, I think. I agree. Assumptions are violated. No question.

1 Like

Yes, i.i.d. is another way to say independence of samples. Actually, a smart way to test your cross-validation would be to train a model on unadjusted data and see if the holdout set(s) are much worse than the training set. (If they aren’t, the CV probably needs to be scrutinized.)

There are definitely ways to handle violation of i.i.d. It just adds a wrinkle – most of the simple tutorials you get by googling are no longer applicable.

As I say to clients all the time – ML is not magic! It’s hard! Much more thought and effort is needed than just from xgboost import XGBRegressor.

2 Likes

TL;DR: Thank you, Pitmaster, for bringing up the issue of feature autocorrelation. I had previously missed that de Prado was addressing feature correlation here: Advances in Financial Machine Learning, with regard to cross-validation. I had been thinking he was talking about the target. But from the text: “Consider a serially correlated feature X…” I am not saying this is the final word, and it is only about cross-validation at that. I think you may have broader concerns beyond cross-validation. Maybe my cross-validation methods have benefitted from this discussion.

So we can do some things to improve our models. Many things including addressing issues of autocorrelation. That is a given.

benhorvath says it well here:

I think he is right about that. A wise professor of medicine once said to me: “The reason that there are so many treatment options for this condition is they are all flawed. If there was a really good treatment we would all be using the same one.” Encouraging words indeed as I was about to start a treatment.

Maybe that is an analogy for the large number of methods published for predicting the market with machine learning. More simply: “It is hard,” as benhorvath says.

So I have this question, mainly for myself. Knowing I am a flawed individual using flawed methods with limited time to perfect my models, is there a good method to test whether my model is really better than listening to Jim Cramer on CNBC before I fund it?

I note, when I ask this, that there is a method of cross-validation at sklearn that uses a time-series split (TimeSeriesSplit).

It seems it is intended to address, at least in part, this problem: “Time series data is characterized by the correlation between observations that are near in time (autocorrelation).”
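
A minimal sketch of that splitter; the gap argument leaves an embargo-like buffer between the training and test windows (the array is just a stand-in for a chronologically ordered feature matrix):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # stand-in for 100 chronologically ordered samples

# gap skips observations between train and test, similar in spirit to an embargo.
tscv = TimeSeriesSplit(n_splits=5, gap=5)
for train_idx, test_idx in tscv.split(X):
    print(train_idx[-1], "->", test_idx[0])  # training always precedes testing
```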

I leave it to someone smarter than me to discuss how well the people at sklearn have addressed the problem of autocorrelation with this method. But I did use it before funding my present (not a random forest) model, knowing it would never be perfect. I used it because autocorrelation can be a problem, I think. And I will continue to look for ways to mitigate the problem in my models and in the cross-validation of those models.

Addendum (for geeks like me mainly):

De Prado discusses this in depth in the book Advances in Financial Machine Learning.

An excerpt from 7.3 WHY K-FOLD CV FAILS IN FINANCE:

Leakage takes place when the training set contains information that also appears in the testing set. Consider a serially correlated feature X that is associated with labels Y that are formed on overlapping data: Because of the serial correlation, X_t ≈ X_{t+1}.

de Prado, Marcos López. Advances in Financial Machine Learning (p. 195). Wiley. Kindle Edition.

One just needs to understand:

  1. Leakage is a bad thing and can be caused by autocorrelation in k-fold cross-validation

  2. Serial correlation is the same thing as autocorrelation in this context.

  3. An embargo used with k-fold cross-validation, and also walk-forward cross-validation, are proposed in this book as solutions to the problem of autocorrelation (a minimal sketch of the embargo idea follows this list).

  4. Bagging, as used in random forests, was also proposed as a solution, at least for classifiers (de Prado, Marcos López. Advances in Financial Machine Learning (p. 196). Wiley. Kindle Edition.)
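
A minimal sketch of the embargo in item 3; this is my own simplified version of the purged/embargoed k-fold idea from the book, not P123’s or sklearn’s implementation:

```python
import numpy as np

def embargoed_kfold(n_samples: int, n_splits: int = 5, embargo: int = 5):
    """Yield (train_idx, test_idx) pairs where observations within `embargo`
    positions of the test fold are dropped from the training set."""
    indices = np.arange(n_samples)
    for fold in np.array_split(indices, n_splits):
        start, stop = fold[0], fold[-1]
        # Keep only training points at least `embargo` steps away from the
        # test fold, on either side of it.
        keep = (indices < start - embargo) | (indices > stop + embargo)
        yield indices[keep], fold

# Usage with 100 chronologically ordered samples:
for train_idx, test_idx in embargoed_kfold(100, n_splits=5, embargo=5):
    print(len(train_idx), "train /", len(test_idx), "test")
```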

Jim

1 Like

Jrinne & benhorvath

You’re both quoting de Prado frequently; I just thought I’d mention this in case you haven’t already come across it.

If you search GitHub for “de Prado”, there are several repositories where people have replicated code for several of his book examples.

RAM

1 Like