ML workflow

Hi everyone – I’ve been working on and off over the last three years to find, buy, or otherwise set up a system for creating and testing ML trading models based mainly on fundamentals data, in Python – that is, without spending $500+ per month. I have paid for at least five different APIs/services, tried modifying open-source projects on GitHub, imported the SF1 dataset, and so on.

I rejected all of these solutions for one reason or another. But I found Portfolio123 through Yuval Taylor’s blog. I’ve been reading the documentation over the last week, and it sounds like it might fit my needs. But I would like to go over my specific P123+ML workflow and make sure it is possible or reasonable. Hopefully, going over mine will help others with similar issues in the future.

I realize that expensive licenses are required to download raw data – but most of the transformed data is available for download. Most of what I want is percentages, standardized values, etc., so I think I can get by bulk-downloading those.

Proposed workflow for training a single factor model:

  1. Set universe to S&P500 (say)

  2. Download dependent variable for all contemporary members of universe:

    • 4 quarter return % above benchmark
    • Frequency: quarterly
    • Date range: Jan. 1, 2010 - Dec. 31, 2024
  3. Download independent variable for all contemporary members of universe:

    • % change in profitability compared to 4 quarters ago (for example)
    • Frequency: quarterly
    • Date range: Jan. 1, 2010 - Dec. 31, 2024
  4. Merge (2) and (3) into one dataframe

  5. Lag the independent variable, so that I predict one quarter out (see the pandas sketch after this list)

  6. Set up cross-validation, hyperparameter tuning, and model training

    • Omitting the details – but suffice it to say, I understand how tricky and complicated this step is
  7. Train the model

  8. Evaluate the cross-validated performance with the usual stats/ML methods

  9. If happy with the model, use it to create rankings to import into the P123 interface:

    • ‘Assign’ to every quarter a version of the model trained only on data prior to that quarter
    • For every quarter, use its associated model to rank the universe
  10. At this point, for every quarter in the date range, I have ranked the entire universe – upload all of this into P123

  11. Within P123 interface(s), run screening/simulations/etc., using the uploaded ranking to obtain practical performance results
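
To make steps 4 and 5 concrete, here is a minimal pandas sketch of what I have in mind, assuming the two downloads arrive as CSVs keyed on date and ticker (file and column names are my placeholders, not P123’s actual export format):

    import pandas as pd

    # Step 4: merge the two downloads on date and ticker.
    # File and column names are hypothetical placeholders.
    y = pd.read_csv("excess_4q_return.csv", parse_dates=["date"])  # date, ticker, excess_ret
    X = pd.read_csv("profit_chg.csv", parse_dates=["date"])        # date, ticker, profit_chg
    df = y.merge(X, on=["date", "ticker"], how="inner")

    # Step 5: lag the feature one quarter, so quarter t's feature is
    # paired with the return measured from the following quarter.
    df = df.sort_values(["ticker", "date"])
    df["profit_chg_lag1"] = df.groupby("ticker")["profit_chg"].shift(1)
    df = df.dropna(subset=["profit_chg_lag1"])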

Please let me know if this is possible, and any further thoughts you might have on this workflow.

Honestly, most of this is feasible until you get to step 9. There’s no current mechanism for dynamic ranking-system weights on P123. Some of that functionality may be coming with the forthcoming ML P123 features, but last I heard from Marco, they weren’t going to support dynamic model weights unless you use their built-in ML model estimation.

Thank you for your reply.

So at step 9, I wouldn’t be uploading any model weights. My idea is a CSV like:

quarter,ticker,rank
2001-04-01,AAPL,1
2001-04-01,MSFT,2
...
2024-01-01,IBM,2133
2024-01-01,IBX,2134

where rank is derived from the model score. P123 never ‘knows’ where I am getting the ranking from.
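
For completeness, here is a minimal sketch of how I’d derive that rank column from model scores (the dataframe and its column names are mine, not anything P123 requires):

    import pandas as pd

    # One row per (quarter, ticker) with the model's predicted score (toy data).
    preds = pd.DataFrame({
        "quarter": ["2024-01-01"] * 3,
        "ticker": ["AAPL", "MSFT", "IBM"],
        "score": [0.7, 0.9, 0.1],
    })

    # Rank 1 = best score within each quarter.
    preds["rank"] = (preds.groupby("quarter")["score"]
                          .rank(ascending=False, method="first")
                          .astype(int))
    preds[["quarter", "ticker", "rank"]].to_csv("rankings.csv", index=False)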

Currently you’d have to do it after the download, or with formulas for your independent variables (features). For example, if your independent variables are SalesGr%Q and Ret1Y%Chg, you probably just want to lag SalesGr%Q. You can do that with this formula to lag it by 4 weeks:

FHist("SalesGr%Q",4)

The tricky bit is ratios like Pr2SalesTTM when you only want to lag Sales. You would have to write the ratio yourself using FHist() in the denominator – e.g., something along the lines of MktCap / FHist("SalesTTM", 4).

Let us know specifically what kind of lagging you want to accomplish since we plan to add support for it.

Thanks

So you could upload these rankings as a custom stock factor, and then seamlessly integrate them into a strategy by creating a ranking system consisting of a single node of that custom factor.

Current row limitations for custom stock factors are relatively tight so you may want to PoC that to make sure you don’t hit a wall.

If you do, you could import those ranking CSVs into some other platform for back testing and/or live trading.

Thanks – I suppose you wouldn’t need to upload every stock’s ranking, just the top N you want for your portfolio. For 10 years of quarterly ranks with a “top 50” portfolio strategy, that would only require 10 * 4 * 50 = 2000 rows.

I have two suggestions you may want to be mindful of.

  1. There’s currently no provision for automatically NaNing out an imported factor if it hasn’t been updated within a certain time period. So you’ll have to update ranks for the previous “top 50” at the next update, which will at most double your row estimate depending on overlap (see the sketch after this list).

  2. Think about whether you really want just quarterly rank updates. Even if your factors come from quarterly statements, they will update throughout the quarter depending on when each company announces earnings and how long it takes FactSet to process that update. You may also have some value factors that compare something from the income/cash-flow statement or balance sheet to some measure of price; those will update more frequently than quarterly. In other words, even if you only plan to re-estimate model weights each quarter, you may want to update your rank forecasts more frequently, depending on the update frequency of your features and how often you plan to make trading decisions.
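
On point 1, here is a rough sketch of how you could handle the staleness yourself before each upload, assuming you keep the previous upload file around (all data and names below are illustrative, and the empty-value-imports-as-N/A behavior is an assumption worth verifying):

    import pandas as pd

    # Last and current uploads (toy data; names illustrative).
    prev = pd.DataFrame({"quarter": ["2024-01-01"] * 2,
                         "ticker": ["AAPL", "IBM"], "rank": [1, 2]})
    curr = pd.DataFrame({"quarter": ["2024-04-01"] * 2,
                         "ticker": ["AAPL", "MSFT"], "rank": [1, 2]})

    # Re-submit tickers that dropped out of the new top N with an empty
    # rank so they don't keep their stale values.
    stale = prev.loc[~prev["ticker"].isin(curr["ticker"]), ["ticker"]].copy()
    stale["quarter"] = curr["quarter"].iloc[0]
    stale["rank"] = ""  # assumption: empty imports as N/A
    pd.concat([curr, stale]).to_csv("upload.csv", index=False)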

Thanks for the replies. I think I’m sufficiently informed to put my CC in and give this a shot! I’ll try to post a status report here later.

Great point. If there’s no automatic zero-ing out, is there a manual one?

EDIT: The documentation says there is an option to ‘delete all existing data’ before uploading an updated series. I assume that takes care of this issue: Imported Data Series & Stock Factors - Help Center

Regarding what frequency is best for AI…

Here’s an excerpt from a recent AI paper mentioned in this post: Stock picking with machine learning

Almost all of the abovementioned studies use ML models for monthly predictions based on monthly data. In contrast, we analyze shorter term predictability and focus on weekly data and weekly predictions. Analyzing weekly predictions provides two major advantages: First, the larger number of predictions and trades in an associated trading strategy provides higher statistical evidence due to the larger sample size. Second, ML models require large training sets. Therefore, studies analyzing monthly predictions require very long training sets of at least 10 years. Given the dynamics of financial markets and the changing correlations in financial data over time, it could be suboptimal to train ML models on very old data, which is not any more relevant for today’s financial world due to changing market conditions. Because our study builds on weekly data, we are able to reduce the length of the training set to only 3 years while still having enough observations for training complex ML models.

Weekly updates make sense, especially if you’re including technical indicators like the authors do. Really cool paper.

I think the paper is probably correct on this, noting that the referenced study is about classification models and can probably be generalized to regression models. We generally use regression models at P123, and the first iteration of AI/ML will use regression models exclusively, if I understand correctly. BTW, I think it was a wise choice to start with regression models, and I am not sure that classification models should be a priority. I am not questioning the decision, as I think it was a good one.

I would add to this paper the evidence from classic P123 users. P123 users commonly rebalance weekly. And P123 is kind of like a non-parametric (i.e., it uses ranks, which are ordinal) multivariate regression with manual optimization: manual optimization rather than something like gradient descent (internally at P123, anyway). If P123 classic used automated optimization, it would clearly be a machine learning model.

Actually, not “kind of.” It is, in fact, a non-parametric regression model that is generally optimized manually by members. But a P123 model can be optimized with machine learning. For example, one could determine the weights in a ranking system using a regression model. Correct me if this is not factually correct.
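
To illustrate (a sketch only, with synthetic data and made-up node names; this is not P123’s actual mechanism): fit a linear regression on rank-transformed features and read the coefficients off as candidate node weights.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-ins: rank-transformed features (one column per
    # ranking-system node) and forward excess returns.
    rng = np.random.default_rng(0)
    nodes = ["value_rank", "quality_rank", "momentum_rank"]
    X = pd.DataFrame(rng.uniform(0, 100, (500, 3)), columns=nodes)
    y = 0.001 * X["value_rank"] + rng.normal(0, 1, 500)

    reg = LinearRegression().fit(X, y)
    weights = pd.Series(reg.coef_, index=nodes).clip(lower=0)
    weights = 100 * weights / weights.sum()  # node weights as percentages
    print(weights)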

But whatever people want to call what they do with classic P123, and whether they self-identify as using statistical learning, fundamental analysis, or something else, P123 members’ manual optimizations are not likely to differ much from the gradient-descent optimizations we will be using with machine learning at P123.

And P123 is moving toward improved automation of P123 classic, considering allowing members to run some of the optimizations in parallel, for example. I am not sure at what point everyone agrees that it is at least a little bit like machine learning, or when P123 finally automates the entire process, making it machine learning by definition.

This is just to say that P123 classic has been (and is) valuable. Personally, I don’t mind having some of it automated with gradient-descent algorithms. I am only making the point that I believe much of what has been found to be true with P123 classic will continue to be true with automation.

TL;DR: Most P123 classic members have been using weekly rebalancing for a while now, and I don’t expect optimization with fully automated machine learning (i.e., gradient descent) to change that in any way. It would be surprising if it did.

Jim

I agree with @benhorvath that for weekly predictions some consideration needs to be given to the selected features, and also to the label.

One important property of the dataset processed by the ML algorithm should be consistency of persistence between features and labels. Intuitively, the autocorrelation of the label y (future return) and that of the features X should not be too far apart. If you sample features (accounting-based) and labels on a weekly basis, the label is very weakly autocorrelated, while the features are often highly autocorrelated.

An extreme example is a model with one feature, SalesTTM / SalesPTM, where your label is the 1-week-ahead return and you sample data weekly over 6 months. This means your feature changes only once during the 6-month training period, while your label changes every week. Autocorrelation in the feature space will be close to 1, while autocorrelation of the label is usually close to 0. In this case your model will capture a lot of noise.
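
A toy illustration of the mismatch, with synthetic data:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # 26 weekly samples (~6 months). The accounting-based feature is
    # constant within each quarter; the 1-week-ahead label is fresh
    # noise every week.
    feature = pd.Series(np.repeat(rng.normal(size=2), 13))
    label = pd.Series(rng.normal(scale=0.03, size=26))

    print(feature.autocorr(lag=1))  # close to 1
    print(label.autocorr(lag=1))    # close to 0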

There are some solutions to this problem, but that is probably a topic for another thread.

This seems very important, since most datasets will be a mixed bag of short-term and long-term factors, with the target maybe somewhere in between.

What are some of the solutions?

Thanks

@Pitmaster, Thank you. I had not thought of feature autocorrelation until your post, and I don’t know much about the problems it causes. I do know an embargo can be used to address autocorrelation when doing k-fold validation, but I had not considered whether it helps with both feature and target autocorrelation.

When you answer Marco, you might keep in mind that he has mentioned k-fold validation in the past, and discuss embargoes with him if you think they could be a partial solution for feature autocorrelation in k-fold cross-validation, especially if P123 will be using it. P123’s AI/ML is still a bit of a black box, and I don’t know whether it will include k-fold cross-validation, so my question may not be pertinent, but I think P123 would benefit from discussing this with you. I only mention it because an embargo may not be that difficult a programming challenge for P123, if you think it would be helpful and P123 is considering k-fold cross-validation.
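
For what it’s worth, here is a minimal sketch of one way to get an embargo effect in plain scikit-learn, using the gap parameter of TimeSeriesSplit. Note this is a time-ordered split rather than true k-fold, and it is my own approximation of the purged/embargoed CV idea, not P123’s implementation:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import TimeSeriesSplit

    # Synthetic time-ordered data; real features/labels would go here.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = rng.normal(size=300)

    # gap drops observations between each train and test fold so that
    # autocorrelated labels near the boundary can't leak across the split.
    tscv = TimeSeriesSplit(n_splits=5, gap=4)  # 4-period embargo
    model = LinearRegression()
    for train_idx, test_idx in tscv.split(X):
        model.fit(X[train_idx], y[train_idx])
        print(model.score(X[test_idx], y[test_idx]))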

BTW, I assume something like EBITDA/EV is also autocorrelated, even though price changes can move the metric from day to day. I assume you were giving an extreme example above.

Jim

Am I missing something in my logic? These features rightly are not predictive of weekly returns for the reason you mention: the label changes but the feature doesn’t. There is nothing to solve here, because it’s not a problem; it’s a discovery about the feature’s usefulness for the task (i.e., predicting one-week returns).

I think the most widely accepted transformation for accounting metrics, in academia and in practice, is to normalize these features by price: (SalesTTM / SalesPTM) / close(0) = SalesTTM / (SalesPTM * close(0))?

Possible solutions are as follows:

  • increase the label’s return period - for example, consider annual or biennial future returns when working with 4-week data; this should increase the autocorrelation of the label.
  • use differences in factor levels - for example, instead of SalesTTM/EV, use SalesTTM/EV - FHist("SalesTTM/EV", 1); this should decrease the autocorrelation of the features (see the sketch below).

Personally, I never use weekly returns as labels; they are too noisy, especially for small stocks. It is also worth considering path-dependent labelling techniques, like the triple-barrier or trend-scanning methods developed by Lopez de Prado.
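
A small pandas sketch of the two bullets above, assuming one row per ticker per week (data and column names are illustrative):

    import numpy as np
    import pandas as pd

    # Synthetic weekly panel: one row per (ticker, date).
    rng = np.random.default_rng(0)
    dates = pd.date_range("2020-01-03", periods=120, freq="W")
    df = pd.DataFrame({
        "ticker": np.repeat(["AAA", "BBB"], 120),
        "date": np.tile(dates, 2),
        "close": rng.uniform(50, 150, 240),
        "sales2ev": rng.uniform(0.1, 0.5, 240),
    })
    df = df.sort_values(["ticker", "date"])
    g = df.groupby("ticker")

    # Longer label horizon: 52-week forward return instead of 1-week.
    df["label_52w"] = g["close"].shift(-52) / df["close"] - 1

    # Feature differencing: the equivalent of
    # SalesTTM/EV - FHist("SalesTTM/EV", 1).
    df["s2ev_diff"] = df["sales2ev"] - g["sales2ev"].shift(1)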

Isn’t it better to do an ensemble? Train three AI factors on short-, medium-, and long-term features. The target could be the same or different, I suppose. Then combine them into a single prediction.

For the medium- and long-term factors you can still update weekly, but only include stocks whose data has changed (i.e., that have reported) in the dataset.
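
Something like this, as a sketch (synthetic data; in practice the three feature sets and models would differ by horizon):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Stand-ins for short-, medium- and long-horizon feature sets.
    rng = np.random.default_rng(0)
    X_short, X_med, X_long = [rng.normal(size=(200, 3)) for _ in range(3)]
    y = rng.normal(size=200)

    # One model per horizon.
    models = [LinearRegression().fit(X, y) for X in (X_short, X_med, X_long)]

    # Equal-weight ensemble; the weights could also be tuned on validation data.
    pred = np.mean([m.predict(X) for m, X in zip(models, (X_short, X_med, X_long))],
                   axis=0)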

Let’s sum it up:

  • to reduce noise in the label, we need longer label periods
  • now that the label has less noise, we need longer periods in the features
  • as we have longer feature periods (which improves the signal/noise ratio as well), we need longer training periods
  • as we now have longer training periods, 10 years plus, we will tend to have less flexible weights
  • as the weights are less flexible, I can go back to classic Portfolio123 with fixed weights

Sorry, I’m exaggerating a bit :wink:

OK, basically we need to increase the signal-to-noise ratio.

What options do we have?

  • excess return (a cross-sectional measure) - see the sketch after this list
  • daily ranking (also a cross-sectional measure)
  • are there some other cross-sectional measures?
  • filters, like the Kalman filter - has somebody tried them?
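
For the first two bullets, a pandas sketch, assuming one row per (date, ticker) with a return column (data and names are illustrative):

    import numpy as np
    import pandas as pd

    # Toy daily panel: one row per (date, ticker).
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "date": np.repeat(pd.date_range("2024-01-01", periods=5), 4),
        "ticker": np.tile(["AAA", "BBB", "CCC", "DDD"], 5),
        "ret": rng.normal(0, 0.02, 20),
    })
    g = df.groupby("date")

    # Excess return vs. the cross-sectional mean on each date.
    df["excess_ret"] = df["ret"] - g["ret"].transform("mean")

    # Daily cross-sectional percentile rank of the return.
    df["ret_rank"] = g["ret"].rank(pct=True)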

If one goes in the time-series direction, one might pick up some noise, as market conditions might change.
I would prefer to be able to follow changes in market conditions on a 3-month basis. This should also fit with changes in sector performance.
Or run with price information only: far more samples, over horizons shorter than the changes in market conditions.

This stuff is not easy…

My most frustrating experience was the first time I ran an autocorrelation on a price series… nothing, zero. That’s when I understood what random movement means.