How The Medallion Fund sustained 66% p.a. for 30 years and generated $100 Billion

I second the addition of CRYPTO.
It is now such a huge market and will keep growing.
For diversification, one cannot ignore crypto anymore.

Save and load an externally trained ML model from a pickle or joblib file.

  • the model inputs are your custom formulas listed in appropriate order
  • the model outputs are historical or current predictions (your RS values)

This could essentially be a more complex version of a custom formula associated with a pickle file, where users list the names of the factors (the X matrix) in the appropriate order. Whenever this ML custom formula is needed (for the RS calculation), the pickle file is loaded and the predictions are computed.
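A minimal sketch of that save/load round trip in Python, using the standard-library `pickle` module. `LinearFactorModel`, `save_model`, `load_model`, and `score_universe` are hypothetical stand-ins, not a P123 feature; the point is that the saved model carries its own factor order, so each stock's inputs are assembled in the order the model was trained on. For a scikit-learn model the same pattern works with `joblib.dump(model, path)` and `joblib.load(path)`.

```python
import pickle
from dataclasses import dataclass


@dataclass
class LinearFactorModel:
    """Hypothetical stand-in for an externally trained model."""
    feature_names: list  # custom formulas, in the order used during training
    weights: list
    intercept: float = 0.0

    def predict_one(self, factors):
        # Assemble the X row in the stored order; a missing factor raises KeyError
        row = [factors[name] for name in self.feature_names]
        return self.intercept + sum(w * x for w, x in zip(self.weights, row))


def save_model(model, path):
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Only unpickle files you trust: loading a pickle can execute arbitrary code
    with open(path, "rb") as f:
        return pickle.load(f)


def score_universe(model, rows):
    """Return {ticker: prediction}, the values the ML custom formula would feed into RS."""
    return {ticker: model.predict_one(factors) for ticker, factors in rows.items()}
```

Because `predict_one` looks factors up by name, a caller can pass each stock's factors as a dict in any order and still get the X row the model expects.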

Provide historical ETF constituent holdings.

Also allow subscription to analyst (Zacks, S&P, etc.) buy/sell ratings (including historical ones), as was done for Value Line in the past.

I’m having great success with analyst derived universes. It may be the easiest way to get to 60% annualized returns.


What does that mean? Can you explain?
Thank you.

I import analyst buy recommendations as Stock Factors. The factors are then used in a Universe definition.

Thank you Walter.

I imported 20 years of Zacks ratings, but in simulations I did not get any better alpha on my strategies. What analyst recommendations are you downloading?



I could give a statistical analysis of this using a Naive Bayes classifier, assuming you have a good target, because it is easy. And I would share it with the forum, with your permission. You can email me if you want to make other arrangements.

Full stop.



TL;DR: A Naive Bayes Classifier is trivial in Sklearn and it may not be a bad way to test this.

I asked ChatGPT this question:

What statistical analysis would you do for this? Suppose you had a download of a large number of stocks with Zacks Rank 1 - 5 and wanted to see if it predicted returns. I am not sure whether the returns should be next week’s returns, but maybe assume that. What statistical test would be the first one to try? I might try Naive Bayes with one feature (Zacks Rank) to start. Comments and your best idea?


When exploring the predictive power of Zacks Rank on future stock returns, the goal is to determine if there’s a statistical relationship between the rank (as an independent variable) and the subsequent returns (as the dependent variable). Given the ordinal nature of Zacks Rank (1 through 5, with 1 being Strong Buy and 5 being Strong Sell), and assuming you’re interested in the continuous outcome of returns (next week’s returns, in your case), here are some statistical approaches you might consider:

1. Correlation Analysis

Before diving into complex models, a simple correlation analysis could provide initial insights into the relationship between Zacks Rank and future returns. However, since Zacks Rank is ordinal, using Spearman’s rank correlation coefficient is more appropriate than Pearson’s correlation coefficient. This can help determine if there’s a monotonic relationship between the ranks and returns.
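A dependency-free sketch of Spearman’s rho: rank both series, averaging ranks across ties (which matter here, since many stocks share each Zacks Rank), then take the Pearson correlation of the ranks. `scipy.stats.spearmanr(x, y)` computes the same coefficient and also returns a p-value.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because rank 1 is Strong Buy, a Zacks Rank that predicts returns would show up as a negative rho against next week’s returns.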

2. Ordinal Logistic Regression

If you categorize returns into ordered classes (e.g., low, medium, high returns), ordinal logistic regression (also known as ordered logit regression) can be a suitable choice. This type of regression is designed for situations where the dependent variable (the outcome) is ordinal, with Zacks Rank entering as a predictor. It can help you assess how changes in the Zacks Rank relate to the odds of a stock landing in a higher return class.

3. Naive Bayes Classifier

Starting with a Naive Bayes classifier with one feature (Zacks Rank) to predict the categorical outcome of next week’s returns (e.g., categorizing returns into “increase”, “no change”, or “decrease”) is a good idea for a baseline model. Naive Bayes is straightforward and classifies based on the probabilistic relationship between the feature and the outcome. However, it assumes the features are conditionally independent given the class, which might be a limitation if you plan to add more predictors later on.
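With a single categorical feature, Naive Bayes reduces to a smoothed table of how often each return class follows each Zacks Rank, so the independence assumption does not come into play yet. The sketch below builds that table in plain Python; the class labels and the smoothing parameter `alpha` are illustrative assumptions. In scikit-learn, `sklearn.naive_bayes.CategoricalNB` fits the same kind of model (its `alpha` is also additive smoothing), which is why the thread calls it trivial in Sklearn.

```python
from collections import Counter, defaultdict


def fit_categorical_nb(ranks, labels, alpha=1.0):
    """Fit P(label) and P(rank | label) with additive (Laplace) smoothing."""
    classes = sorted(set(labels))
    rank_values = sorted(set(ranks))
    class_counts = Counter(labels)
    joint = defaultdict(Counter)  # joint[label][rank] = count
    for r, y in zip(ranks, labels):
        joint[y][r] += 1
    n = len(labels)
    prior = {c: class_counts[c] / n for c in classes}
    likelihood = {
        c: {
            r: (joint[c][r] + alpha) / (class_counts[c] + alpha * len(rank_values))
            for r in rank_values
        }
        for c in classes
    }
    return prior, likelihood


def predict_proba(prior, likelihood, rank):
    """Posterior P(label | rank) via Bayes' rule, normalized over the classes."""
    scores = {c: prior[c] * likelihood[c][rank] for c in prior}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}
```

A rank that never appeared in training raises `KeyError` here; in practice all five Zacks Ranks would be present in a large download.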

4. Analysis of Variance (ANOVA)

If you group next week’s returns by Zacks Rank, a one-way ANOVA can test whether mean returns differ significantly among the rank groups. This can provide evidence of whether the rank has an overall effect on returns. However, ANOVA does not account for the order of the ranks.
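A plain-Python sketch of the one-way ANOVA F statistic, with returns grouped by Zacks Rank. It returns F and its two degrees of freedom; `scipy.stats.f_oneway(*groups)` computes the same statistic and adds a p-value.

```python
def group_returns_by_rank(ranks, returns):
    """Collect returns into one list per Zacks Rank, ordered by rank."""
    groups = {}
    for r, ret in zip(ranks, returns):
        groups.setdefault(r, []).append(ret)
    return [groups[r] for r in sorted(groups)]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of returns."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual returns around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```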

5. Machine Learning Approaches

If you’re open to more complex models and potentially adding more features beyond Zacks Rank, machine learning approaches like Random Forests, Gradient Boosting Machines (e.g., XGBoost), or neural networks could be considered. These models can handle non-linear relationships and interactions between multiple predictors, offering potentially higher predictive accuracy.

Best Practice

  • Start Simple: Begin with simpler statistical tests and models to establish a baseline understanding of the relationship. This includes correlation analysis and possibly ordinal logistic regression.
  • Model Complexity: Move to more complex models as needed, based on initial findings and the addition of more predictors.
  • Validation: Validate the model using a hold-out sample or cross-validation to get an honest estimate of its predictive performance.
  • Feature Engineering: Consider transforming Zacks Rank into dummy variables if using models that don’t inherently handle ordinal data.
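Two small helpers for the validation and feature-engineering points, sketched under stated assumptions: `dummy_encode` turns a Zacks Rank into 0/1 indicator columns (dropping rank 1 as the reference level), and `chronological_split` holds out the most recent dates instead of shuffling, since a random split of weekly return data would let later weeks leak into training. Both names are hypothetical, and dates are assumed to be ISO `YYYY-MM-DD` strings, which sort chronologically.

```python
def dummy_encode(rank, levels=(1, 2, 3, 4, 5), drop_first=True):
    """One 0/1 column per Zacks Rank level; dropping the first avoids collinearity."""
    used = levels[1:] if drop_first else levels
    return [1 if rank == level else 0 for level in used]


def chronological_split(rows, cutoff):
    """Split (date, features, target) rows into train (date < cutoff) and test (date >= cutoff)."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test
```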

Overall, the choice of statistical test or model largely depends on the nature of your outcome variable (continuous vs. categorical) and whether you’re looking at the problem from a classification or prediction standpoint. Starting with a Naive Bayes classifier is a solid approach for classification purposes, while correlation analysis and ordinal logistic regression could be more suitable for understanding directional relationships and predictive modeling.

I’m not ready to share the source, but here’s a link to the current CSV. The data starts in May 2022. This copy will not be maintained.

Use this to build up a Universe for simulation.

BTW, I looked for a way to share the Stock Factors series directly, but I didn’t see a way to make it public.

Thanks for the approach. I will run some tests.

Thank you for sharing. I will take a look. I see the values are either “1” or “0”. Does 1 = Strong Buy and 0 = Strong Sell?

1=strong buy, 0=not a strong buy is closer to what happens.

If Zacks Rank runs from 1 to 5 (best to worst), then it works something like this: ranks 1 and 2 go into the universe; ranks 3, 4, and 5 are removed from the universe.
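A sketch of that mapping in Python, assuming rows of `(date, ticker, zacks_rank)`. The `BUY_RANKS` set and the CSV column names are hypothetical; adjust them to whatever format the Stock Factor import expects.

```python
import csv
import io

BUY_RANKS = {1, 2}  # assumption: the Zacks Ranks that put a stock in the universe


def ranks_to_flags(rows):
    """Map (date, ticker, rank) rows to (date, ticker, flag): 1 for ranks 1-2, else 0."""
    return [(date, ticker, 1 if rank in BUY_RANKS else 0) for date, ticker, rank in rows]


def to_csv(rows):
    """Serialize flag rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Date", "Ticker", "Flag"])  # hypothetical column names
    writer.writerows(rows)
    return buf.getvalue()
```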

I do it that way because Fidelity limits downloads to 500 rows.


I have looked at Zacks Rank in paper trading. I also looked at Fidelity’s rankings, which, as you know, are based on multiple recommendations, including Zacks’ recommendations, weighted according to their past accuracy.

You are probably aware that Fidelity has a high error rate on Zacks’ data (too high, I thought). I ended up paying Zacks for downloads directly from them to make sure I had accurate, up-to-date data. I don’t believe there was a download limit with Zacks either.

You have probably looked at this, but if you have not, you might consider taking a sample of Fidelity’s data and checking it for accuracy on the Zacks website.
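One way to do that spot check, sketched in Python: draw a random sample of tickers present in both sources and count disagreements. The two dicts are hypothetical stand-ins for the Fidelity download and the values looked up on the Zacks website; a fixed `seed` makes the sample reproducible.

```python
import random


def sample_error_rate(fidelity, zacks_site, sample_size, seed=0):
    """Estimate how often Fidelity's Zacks Rank disagrees with Zacks' own value.

    fidelity, zacks_site: dicts of {ticker: rank}; only tickers in both are sampled.
    Returns (error_rate, list_of_mismatched_tickers).
    """
    common = sorted(set(fidelity) & set(zacks_site))
    if not common:
        raise ValueError("no tickers in common")
    rng = random.Random(seed)
    sample = rng.sample(common, min(sample_size, len(common)))
    mismatches = [t for t in sample if fidelity[t] != zacks_site[t]]
    return len(mismatches) / len(sample), mismatches
```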


Hi Jim,
I used to subscribe to Zacks and found the service less than compelling. I don’t use Zacks rankings in my analyst universe.

Walter, when you subscribed to Zacks, do you recall whether you had access to historical Zacks rankings for download, or just the current rankings?

No historical.

My Zacks subscription was long ago, prior to P123 stock factor support. My most recent attempt with Zacks was via Fidelity. I guess there may be some data issues with that. Dunno, really.

Since Fidelity doesn’t support historical rankings, I need to be picky about what I choose to download. It would be too burdensome to download everything.

Hi Jim,
Thanks for the heads-up. I think I’ve settled on an analyst for now. I may revisit Zacks later this year.

Are you using the Starmine ratings? I guess you can only forward test these in combination with a P123 ranking system?

“StarMine from Refinitiv,” to be complete. I think @pitmaster (a PhD, or at least a Master’s degree, in machine learning, employed by a bank to do machine learning) used to be employed by Refinitiv. If that is the case, I wonder if he would want to expand on this offering at Fidelity.

I traded all the 10s for a while (checking to see whether the Zacks scores were accurate). Or, more precisely, I put the Zacks scores into a list and used the buy rules to exclude the 10s (very bullish) that had errors in the Zacks score:

BTW, @WalterW, I got around the 500-row download limit (getting up to 1,000) by sorting A → Z and downloading 500, then sorting Z → A and downloading another 500, and then merging the two downloads. You could probably get more with some other tricks, but 1,000 was usually the complete list for me.
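The merge step might look like this in Python, assuming both downloads are CSV text with a `Symbol` column (a hypothetical name; use whatever the ticker column is called). Rows are de-duplicated by ticker, and the function also counts overlapping tickers: assuming each file is the first 500 rows of a sort on the ticker column, any overlap means the A → Z and Z → A halves met in the middle, so the merged list is complete.

```python
import csv
import io


def merge_downloads(*csv_texts, key="Symbol"):
    """Combine CSV downloads, keeping the first row seen for each ticker.

    Returns (fieldnames, rows, overlap), where overlap counts duplicate tickers.
    """
    merged = {}
    fieldnames = None
    overlap = 0
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if fieldnames is None:
            fieldnames = reader.fieldnames  # header of the first download
        for row in reader:
            if row[key] in merged:
                overlap += 1  # ticker already present from an earlier download
            else:
                merged[row[key]] = row
    return fieldnames, list(merged.values()), overlap
```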