I second the addition of CRYPTO.

It is now such a huge market and will keep growing.

For diversification one cannot ignore Crypto anymore.

Werner

**Save and load an externally trained ML model from a pickle or joblib file**.

- the model inputs are your custom formulas listed in appropriate order
- the model outputs are historical or current predictions (your RS values)

This could essentially be a more complex version of a custom formula, associated with a pickle file, where users list the names of the factors (the X matrix) in the appropriate order. Then whenever this ML custom formula is needed (for an RS calculation), the pickle file is loaded and predictions are computed.
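To make the request concrete, here is a minimal sketch of the idea, not how P123 would actually implement it. A linear model stored as plain coefficients stands in for a fitted estimator (a scikit-learn model object could be pickled in its place), and the factor names `EarnYield` and `Vol3M`, the bundle layout, and both helper functions are made up for illustration. The point is that the factor order is saved with the model, so inputs can be reordered by name before predicting.

```python
import pickle
import tempfile
from pathlib import Path


def save_bundle(path, weights, bias, feature_names):
    """Store the coefficients together with the factor order they expect."""
    bundle = {"weights": list(weights), "bias": bias, "features": list(feature_names)}
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_and_predict(path, factor_values):
    """factor_values: list of dicts mapping factor name -> value, one per stock."""
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)  # only unpickle files from a source you trust
    order = bundle["features"]
    # Build the X matrix in the saved column order, whatever order the caller used.
    X = [[row[name] for name in order] for row in factor_values]
    return [sum(w * x for w, x in zip(bundle["weights"], r)) + bundle["bias"] for r in X]


# Hypothetical factor names and coefficients, for illustration only.
path = Path(tempfile.mkdtemp()) / "model.pkl"
save_bundle(path, weights=[0.5, -2.0], bias=0.0, feature_names=["EarnYield", "Vol3M"])

stocks = [
    {"Vol3M": 0.1, "EarnYield": 0.08},  # keys deliberately out of order
    {"EarnYield": 0.02, "Vol3M": 0.3},
]
predictions = load_and_predict(path, stocks)
print(predictions)
```

Keeping the feature list inside the same file as the model avoids the failure mode where the formula list and the pickle drift apart.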

Provide historical ETF constituent holdings.

Also allow subscription to analyst (Zacks, S&P, etc) buy/sell ratings (including historic), as was done for ValueLine in the past.

I'm having great success with analyst derived universes. It may be the easiest way to get to 60% annualized returns.

Walter,

what does that mean? Can you explain?

Thank you.

I import analyst buy recommendations as Stock Factors. The factors are then used in a Universe definition.

Thank you Walter.

I imported 20 years of Zacks ratings, but in simulations I did not get any better alpha on my strategies. Which analyst recommendations are you downloading?

Thanks

sraby,

I would be able to give a statistical analysis of this using a Naive Bayes Classifier assuming you have a good target. Because it is easy. And I would share with the forum with your permission. You can email me if you want to make other arrangements.

Full stop.

Jim

**Addendum**

**TL;DR: A Naive Bayes Classifier is trivial in Sklearn and it may not be a bad way to test this.**

I asked ChatGPT this question:

What statistical analysis would you do for this? Suppose you had a download of a large number of stocks with Zacks Rank 1 - 5 and wanted to see if this predicted returns. I am not sure whether the returns would be next week's returns but maybe assume that. What statistical test would be the first one to try? I might try Naive Bayes with one Feature (Zacks Rank) to start. Comments and your best idea?

Answer:

When exploring the predictive power of Zacks Rank on future stock returns, the goal is to determine if there's a statistical relationship between the rank (as an independent variable) and the subsequent returns (as the dependent variable). Given the ordinal nature of Zacks Rank (1 through 5, with 1 being Strong Buy and 5 being Strong Sell), and assuming you're interested in the continuous outcome of returns (next week's returns, in your case), here are some statistical approaches you might consider:

### 1. **Correlation Analysis**

Before diving into complex models, a simple correlation analysis could provide initial insights into the relationship between Zacks Rank and future returns. However, since Zacks Rank is ordinal, using Spearman's rank correlation coefficient is more appropriate than Pearson's correlation coefficient. This can help determine if there's a monotonic relationship between the ranks and returns.
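As a rough illustration of the Spearman check, the sketch below computes it with the standard library only: rank both series (ties get the average rank), then take Pearson's correlation of the ranks. The ranks and returns are invented numbers, not real Zacks data. With SciPy installed, `scipy.stats.spearmanr` does the same job and also returns a p-value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))


# Invented sample: Zacks Rank (1 = Strong Buy) and next week's return.
zacks_rank = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
next_week_return = [0.031, 0.012, 0.015, -0.002, 0.004, 0.001, -0.006, 0.003, -0.021, -0.010]
rho = spearman(zacks_rank, next_week_return)
print(round(rho, 3))
```

If the rank has predictive value, rho should come out negative, because a lower rank number is the more bullish one.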

### 2. **Ordinal Logistic Regression**

Given the ordinal nature of the Zacks Rank, ordinal logistic regression (also known as ordered logit regression) can be a suitable choice. This type of regression is designed for situations where the independent variable is ordinal. It can help you assess how changes in the Zacks Rank relate to the odds of achieving different levels of stock returns, assuming you categorize the returns into ordered classes (e.g., high, medium, low returns).

### 3. **Naive Bayes Classifier**

Starting with a Naive Bayes classifier with one feature (Zacks Rank) to predict the categorical outcome of next week's returns (e.g., categorizing returns into "increase", "no change", or "decrease") is a good idea for a baseline model. Naive Bayes is straightforward and can handle the task of classification based on the probabilistic relationship between the feature and the outcome. However, it assumes independence between features, which might be a limitation if you plan to add more predictors later on.
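For a sense of what a one-feature Naive Bayes actually computes, here is a dependency-free sketch: class priors times smoothed per-class rank frequencies, normalized. The ±0.5% band for "no change", the Laplace smoothing value, and the sample data are all assumptions made for the example. In scikit-learn the library counterpart would be `CategoricalNB`.

```python
from collections import Counter, defaultdict


def label_return(r, band=0.005):
    """Bucket a return into three classes; the +/-0.5% band is an arbitrary choice."""
    if r > band:
        return "increase"
    if r < -band:
        return "decrease"
    return "no change"


def fit(ranks, labels, n_ranks=5, alpha=1.0):
    """Count-based fit with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    rank_counts = defaultdict(Counter)
    for rank, label in zip(ranks, labels):
        rank_counts[label][rank] += 1
    total = len(labels)
    prior = {c: n / total for c, n in class_counts.items()}
    likelihood = {
        c: {r: (rank_counts[c][r] + alpha) / (class_counts[c] + alpha * n_ranks)
            for r in range(1, n_ranks + 1)}
        for c in class_counts
    }
    return prior, likelihood


def predict_proba(model, rank):
    """Posterior P(class | rank) from prior * likelihood, normalized to sum to 1."""
    prior, likelihood = model
    joint = {c: prior[c] * likelihood[c][rank] for c in prior}
    norm = sum(joint.values())
    return {c: p / norm for c, p in joint.items()}


# Invented sample data, for illustration only.
ranks = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 4]
returns = [0.03, 0.02, -0.001, 0.01, 0.002, 0.0, -0.01, -0.02, -0.03, -0.015, 0.004, 0.012]
model = fit(ranks, [label_return(r) for r in returns])
proba_rank1 = predict_proba(model, 1)
proba_rank5 = predict_proba(model, 5)
print(proba_rank1)
print(proba_rank5)
```

With a single feature this is just a smoothed conditional frequency table, which is why it works as a quick baseline before anything fancier.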

### 4. **Analysis of Variance (ANOVA)**

If you categorize the continuous outcome of returns into groups based on the Zacks Rank, ANOVA can test if there are statistically significant differences in returns among the different rank groups. This can provide evidence of whether the rank has an overall effect on returns. However, ANOVA does not account for the order of the ranks.
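A minimal sketch of the one-way ANOVA F statistic, treating each rank as a group of returns. It uses only the standard library, so it stops at the F statistic and degrees of freedom. Turning F into a p-value needs the F distribution, which `scipy.stats.f_oneway` provides. The grouped returns are invented.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented weekly returns (in percent), grouped by Zacks Rank 1 through 5.
returns_by_rank = [
    [3.1, 1.2, 2.0],    # rank 1
    [1.5, -0.2, 0.9],   # rank 2
    [0.4, 0.1, -0.3],   # rank 3
    [-0.6, 0.3, -1.1],  # rank 4
    [-2.1, -1.0, -1.8], # rank 5
]
f_stat, df_b, df_w = one_way_f(returns_by_rank)
print(round(f_stat, 2), df_b, df_w)
```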

### 5. **Machine Learning Approaches**

If you're open to more complex models and potentially adding more features beyond Zacks Rank, machine learning approaches like Random Forests, Gradient Boosting Machines (e.g., XGBoost), or neural networks could be considered. These models can handle non-linear relationships and interactions between multiple predictors, offering potentially higher predictive accuracy.

### Best Practice

- **Start Simple**: Begin with simpler statistical tests and models to establish a baseline understanding of the relationship. This includes correlation analysis and possibly ordinal logistic regression.
- **Model Complexity**: Move to more complex models as needed, based on initial findings and the addition of more predictors.
- **Validation**: Ensure to validate the model using a hold-out sample or cross-validation to assess its predictive performance genuinely.
- **Feature Engineering**: Consider transforming Zacks Rank into dummy variables if using models that don't inherently handle ordinal data.

Overall, the choice of statistical test or model largely depends on the nature of your outcome variable (continuous vs. categorical) and whether you're looking at the problem from a classification or prediction standpoint. Starting with a Naive Bayes classifier is a solid approach for classification purposes, while correlation analysis and ordinal logistic regression could be more suitable for understanding directional relationships and predictive modeling.

I'm not ready to share the source, but here's a link to the current csv. Data starts from May '22. This copy will not be maintained.

Use this to build up a Universe for simulation.

BTW, I looked to share the Stock Factors series directly but I didn't see a way to make it public.

Thanks for the approach. I will run some tests.

Thank you for sharing. I will take a look. I see that the value is a "1" or a "0", as in 1 = Strong Buy, and 0 = Strong Sell?

1=strong buy, 0=not a strong buy is closer to what happens.

If Zacks rank is from 1 to 5 (best to worst), something like ranks 1, 2 go into the universe; rank 3, 4, 5 are removed from the universe.
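The 1/0 encoding described above could be sketched like this. The three-column layout (`date`, `ticker`, `value`), the tickers, and the function names are placeholders made up for the example, so check the actual Stock Factor import format before using anything like it.

```python
import csv
import io


def universe_flag(zacks_rank, keep=(1, 2)):
    """1 if the stock stays in the universe, 0 if it is removed."""
    return 1 if zacks_rank in keep else 0


def to_stock_factor_csv(rows):
    """rows: iterable of (date, ticker, zacks_rank) tuples."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "ticker", "value"])  # hypothetical column names
    for date, ticker, rank in rows:
        writer.writerow([date, ticker, universe_flag(rank)])
    return buf.getvalue()


# Placeholder tickers and ranks, for illustration only.
sample = [
    ("2024-03-01", "AAA", 1),
    ("2024-03-01", "BBB", 3),
    ("2024-03-01", "CCC", 2),
    ("2024-03-01", "DDD", 5),
]
csv_text = to_stock_factor_csv(sample)
print(csv_text)
```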

I do it that way b/c Fidelity limits the number of rows in a download to 500.

Walter,

I have looked at Zacks Rank in paper trading. I also looked at Fidelity's rankings, which as you know are based on multiple recommendations including Zacks' recommendations, weighting the recommendations according to their accuracy in the past.

You are probably aware that Fidelity has a high error rate on Zacks' data, too high I thought. I ended up paying Zacks for downloads directly from them to make sure I had accurate up-to-date data. I did not think there was a limit with Zacks either.

You have probably looked at this, but if you have not, you might consider taking a sample of Fidelity's data and checking it for accuracy on the Zacks website.

Jim

Hi Jim,

I used to subscribe to Zacks and found the service less than compelling. I don't use Zacks ranking in my analyst universe.

Walter, when you did subscribe to Zacks, do you recall if you had access to historical Zacks rankings to download or just the current rankings?

No historical.

My Zacks subscription was long ago and prior to p123 stock factor support. My most recent attempt with Zacks was via Fidelity. I guess there may be some data issues with that. Dunno really.

Since Fidelity doesn't support historical rankings, I need to be picky about what I choose to download. It would be too burdensome to d/l everything.

Hi Jim,

Thanks for the heads up. I think I've settled on an analyst for now. I may revisit Zacks later this year.

Are you using the Starmine ratings? I guess you can only forward test these in combination with a P123 ranking system?

"Starmine from Refinitiv" to be complete. I think @pitmaster, who has a PhD and at least a Master's degree in machine learning and is employed by a bank to do machine learning, **used to be employed by Refinitiv**. If that is the case, I wonder if he would want to expand on this offering at Fidelity.

I traded all tens (checking to see if the Zacks' scores were accurate for a while). Or actually putting the Zacks' scores into an enlist, **excluding the 10s (very Bullish) with errors on the Zacks' score in the buy rules**:

**BTW, @WalterW I got around the 500 download limit (getting up to 1,000) by sorting A to Z and downloading 500, then sorting Z to A and downloading 500, and then munging the 2 downloads. You could probably get more with some other tricks but 1,000 was usually the complete list for me.**
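For anyone who wants to script the munging step, a minimal sketch follows. It assumes both downloads are CSV text with a ticker column called `Symbol` (that column name and the `Score` column are guesses, so adjust them to the real file) and drops the rows that appear in both halves.

```python
import csv
import io


def merge_downloads(csv_a_to_z, csv_z_to_a, key="Symbol"):
    """Combine two overlapping downloads, keeping one row per ticker."""
    merged = {}
    for text in (csv_a_to_z, csv_z_to_a):
        for row in csv.DictReader(io.StringIO(text)):
            merged.setdefault(row[key], row)  # keep the first copy of each ticker
    return [merged[k] for k in sorted(merged)]


# Tiny stand-ins for the two 500-row downloads; tickers and scores are invented.
first_half = "Symbol,Score\nAAA,10\nBBB,9\nCCC,10\n"
second_half = "Symbol,Score\nDDD,10\nCCC,10\nBBB,9\n"
combined = merge_downloads(first_half, second_half)
print([row["Symbol"] for row in combined])
```

If the combined list comes back with fewer than 1,000 rows, the two halves overlapped and nothing in the middle of the alphabet was missed.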