Two uncorrelated factors, price momentum and price to sales, have had great success in building a ranking system. There may be other pairs or groups of factors with low mutual correlations that reinforce each other’s results. Any ideas on what they could be, and on how to search for low-correlation factors that could substantially increase returns?

A recent study attempted to classify hundreds of factors according to their degree of non-correlation. See my review of it here: https://blog.portfolio123.com/thoughts-on-is-there-a-replication-crisis-in-finance/

If you want to study the correlation of factors, here’s what I suggest. Create a one-factor ranking system. Run the performance test using no more than 10 buckets (5 might be easier), but click on “performance” rather than “annualized returns,” and don’t use the default universe, but a universe with reasonable liquidity. Download the result.

Let’s say you’re using ten buckets. You’ll get ten columns of numbers. Create a second ten columns with the formula =C14/C13-1 and so on so that each column collects the percentage returns from the corresponding column. Let’s say those are columns N through W.

Now create a row somewhere with the numbers 0.1, 0.2, 0.3 . . . up to 1.0. Let’s say you create it in row 12, cells N12 through W12.

Lastly, create yet another column with the following formula: =SLOPE(N14:W14,N$12:W$12) and copy that down the column. That will give you the slope of the ten bucket returns for every month of your backtest.
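For anyone who would rather script this than build it in Excel, here is a minimal Python sketch of the same calculation. It assumes you have already read the downloaded performance columns into a list of rows (one row per date, one value per bucket, ordered from the lowest-ranked bucket to the highest). The function names and that input layout are my own illustrative choices, not anything defined by P123.

```python
def period_returns(levels):
    """Convert a cumulative performance series into period-over-period returns
    (the same thing the =C14/C13-1 formula does)."""
    return [curr / prev - 1.0 for prev, curr in zip(levels, levels[1:])]


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs, as Excel's SLOPE computes it."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bucket_slopes(performance_rows):
    """Return one slope per period.

    performance_rows: list of rows, one per date; each row holds every bucket's
    cumulative performance value (assumed ordered lowest to highest bucket).
    """
    n_buckets = len(performance_rows[0])
    # 0.1, 0.2, ..., 1.0 when there are ten buckets
    xs = [(i + 1) / n_buckets for i in range(n_buckets)]
    # Transpose to one series per bucket and compute each bucket's returns.
    per_bucket = [period_returns(list(col)) for col in zip(*performance_rows)]
    # Transpose back to one row of bucket returns per period, then fit the slope.
    return [slope(xs, list(row)) for row in zip(*per_bucket)]


# Made-up numbers for illustration only: three buckets, three dates.
example_rows = [
    [100.0, 100.0, 100.0],
    [101.0, 102.0, 103.0],
    [101.0, 102.0, 103.0],
]
print(bucket_slopes(example_rows))
```

A positive slope in a given period means the higher-ranked buckets outperformed the lower-ranked ones in that period.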

Perform the performance test on another factor, and then another, and another, and each time paste the results into the first file you created. Save the slope columns in a separate spreadsheet and label them all. Then create a correlation table. You can do this using the “Data analysis” tool in the Data tab of your Excel file.

I think you’ll find this a very useful tool for discovering factors that are correlated or uncorrelated in terms of their performance. Once you’ve set up the initial Excel file, it’s not terribly difficult. I’ve attached an Excel file I created using one factor with the slope column clearly labeled, just in case my directions were unclear. Columns A through L consist simply of the output of my performance test.

slope through time.xlsx (58 KB)

When I read the question, my thought was that ‘correlation’ would be something like the correlation of the 1 month returns from a rank performance test for the top bucket for each factor. Like this:

Create a ranking system with 1 factor.

Run the Rank Performance test with these settings → 10 years (or whatever period you prefer), rebalance every 4 weeks, use the same universe you plan to use in your live trading system, 5 buckets, Chart Type = Performance. I did 5 buckets because I will only be keeping the data from the top bucket, and the top 20% just seems like a good cutoff to me.

Run this for each factor or formula that you want to include in the test.

I think a limitation to this approach is that it only makes sense to use factors that have decent returns in the top bucket because using factors that had bad returns would give you the desired low correlation to the factors with high returns, but why would you want to include factors that generally do poorly? Trying to find pairs of factors that complement each other and increase returns is a different exercise.

Once you have the data from the top bucket for each factor, put it in a spreadsheet and calculate the returns for each month. Then create a correlation matrix which calculates the correlation of those returns for each factor vs the other factors. I did the test for 4 factors and the correlations are shown below. As richardnfrank mentioned, the correlation between momentum and value factors is fairly low.

```
              EarnYield   Ret6M%Chg
Ret6M%Chg       78.67%
OpIncGr%PYQ     95.01%      90.68%
Pr2SalesTTM     96.67%      78.37%
```
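The spreadsheet step (returns per period, then a correlation matrix) can also be sketched in a few lines of Python. This is only an illustration under my own assumptions: the top-bucket performance series for each factor has already been loaded into a dictionary, and the factor names and numbers below are made up, not results from any test.

```python
import math


def period_returns(levels):
    """Convert a cumulative performance series into period-over-period returns."""
    return [curr / prev - 1.0 for prev, curr in zip(levels, levels[1:])]


def pearson(a, b):
    """Pearson correlation of two equal-length series (Excel's CORREL)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(returns_by_factor):
    """Return a nested dict: matrix[a][b] is the correlation of factor a with b."""
    names = list(returns_by_factor)
    return {
        a: {b: pearson(returns_by_factor[a], returns_by_factor[b]) for b in names}
        for a in names
    }


# Hypothetical top-bucket performance series (made-up values).
top_bucket_levels = {
    "FactorA": [100.0, 102.0, 101.0, 105.0, 107.0],
    "FactorB": [100.0, 101.0, 101.5, 103.0, 106.0],
}
top_bucket_returns = {k: period_returns(v) for k, v in top_bucket_levels.items()}
matrix = correlation_matrix(top_bucket_returns)
for name, row in matrix.items():
    print(name, {other: round(value, 4) for other, value in row.items()})
```

The same `correlation_matrix` helper would work on any set of labeled per-period series, including slope columns, as long as every series covers the same dates.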

Gathering this data for a large number of factors would be time consuming. Users with coding skills could use the rank_perf endpoint in the API, which has an outputType = Performance. Just read in a list of the hundreds of factors you want to test and run the single factor test above for each one. If anybody wants to work on that, let me know and I can give you the Python code I use for another single factor test, which gets the inputs (factors) from Excel, runs the test and writes the results to Excel. You would have to modify the part that writes to Excel, because my script is for output type = annualized returns, which is a different format. I might work on this at some point, but I am too busy right now.

Correlation.xlsx (69 KB)

All,

Both Dan and Yuval have great ideas. Perfect really. I don’t think a mathematician at a University would find anything to disagree with. Certainly I do not.

But none of this is new and some shortcuts have been developed over the years. A compact way of doing the same thing would be to use factor analysis or Principal Component Analysis (PCA). There might even be some additional things factor analysis could do other than determine the correlation (e.g., use the eigenvalues that are generated).

PCA would put factors together creating a set of factors (with weights) that are completely uncorrelated with another set of factors. For factor analysis, the latent variables are not completely orthogonal (roughly meaning not completely uncorrelated). I have had better luck with factor analysis however for somewhat complex reasons involving “latent variables.”

The only real difference in relation to what Yuval and Dan are suggesting is that with factor analysis and PCA factors are combined to form a “vector.” Non-correlated vectors (or combination of weighted factors) are then found–combining factors and streamlining the entire process. Eigenvalues–while possibly useful–are more about significance of the factors and avoiding overfitting.
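To make the “vector” idea a little more concrete, here is a rough Python sketch of PCA done by hand on a correlation matrix. It uses power iteration with deflation, which is just one simple way to get the leading eigenvectors; in practice most people would use a statistics library instead. The function names and the example matrix are my own made-up illustrations.

```python
import math


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def principal_components(corr, k=2, iterations=500):
    """Return the k largest (eigenvalue, eigenvector) pairs of a symmetric
    correlation matrix, found by power iteration with deflation."""
    n = len(corr)
    work = [row[:] for row in corr]  # copy so the caller's matrix is untouched
    components = []
    for _component in range(k):
        # Arbitrary deterministic starting vector, scaled to unit length.
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iterations):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        components.append((eigenvalue, v))
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return components


# Made-up correlation matrix for three hypothetical factors.
example_corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
for eigenvalue, weights in principal_components(example_corr, k=2):
    print(round(eigenvalue, 4), [round(w, 4) for w in weights])
```

Each eigenvector is a set of weights on the original factors, the eigenvectors are orthogonal to one another, and each eigenvalue measures how much of the total variance that weighted combination accounts for.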

The book “Fortune’s Formula” describes the successful use of factor analysis for investing, although the author does not spend much time on it. The book is written for the lay public, and it is probably too mathematical for many readers as it is.

The quote is exactly correct I think, and as suggested in the rest of Dan’s post this kind of thing can be automated. Dan says he might automate some of this at some time in the future. This also falls in the purview of machine learning and P123 has an AI specialist who may want to automate correlation methods at some point (e.g., factor analysis and PCA).

TL;DR. Yuval and Dan have some great points and you wouldn’t go wrong doing some combination of what they are suggesting. The basis for their methods is well established (has been since the first papers on this in the 1930s). Their method(s) has the advantage that it (they) can probably be tied into the present ranking systems and ports at P123 (as can factor analysis). We may see some automation of this method at P123 in the future but it is doable now.

Jim

Interesting discussion. In addition to looking at the correlations between factor performances, I’m wondering if anyone has studied the correlations between factor ranks. So for example, take the universe of stocks you’d actually invest in, get the ranks for those stocks under, say, both Price to Sales Q and 26 Week Price % Change, then calculate the correlation between the percentile ranks. The idea would be to find factors with decent returns but which don’t always tend to rank the same stocks/types of stocks highly.
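As a rough sketch of that idea in Python: turn each factor’s raw values into percentile ranks across the universe on a single date, then correlate the two rank columns. Everything here is an assumption for illustration, including the function names, the 0 to 100 scaling, the tie handling (tied values share a rank), and the made-up factor values. It is not how P123 computes its ranks.

```python
import math


def percentile_ranks(values, higher_is_better=True):
    """Rank each stock from 0 (worst) to 100 (best) within the universe.
    Tied values receive the same rank."""
    n = len(values)
    ranks = []
    for v in values:
        if higher_is_better:
            worse = sum(1 for w in values if w < v)
        else:
            worse = sum(1 for w in values if w > v)
        ranks.append(100.0 * worse / (n - 1))
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def factor_rank_correlation(values_a, values_b,
                            a_higher_is_better=True, b_higher_is_better=True):
    """Correlation between the percentile ranks two factors assign to the same stocks."""
    ranks_a = percentile_ranks(values_a, a_higher_is_better)
    ranks_b = percentile_ranks(values_b, b_higher_is_better)
    return pearson(ranks_a, ranks_b)


# Made-up values for five hypothetical stocks on one date.
price_to_sales = [0.5, 3.0, 1.2, 8.0, 2.0]      # lower is assumed better
price_change_26w = [12.0, -5.0, 30.0, 4.0, 18.0]  # higher is assumed better
print(factor_rank_correlation(price_to_sales, price_change_26w,
                              a_higher_is_better=False))
```

Repeating this over several dates and averaging the results would give a steadier picture than any single date.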

In the past I’ve tried using the screener for this, and although the methodology I think is straightforward, it quickly becomes tedious and time consuming depending on how many factors and dates I want to check. I also run into the daily download limit.

Or maybe it’s not worth the effort?

All,

Above you have seen that people are interested in (and using) correlation of factors’ returns. For example, it appears that Yuval might have used it and is perhaps even suggesting that some members use it.

Ethan (e_hyche) asks about correlation of factor ranks. This too is an interesting topic, I think. I would only add that this is how factor analysis and principal component analysis are classically done. They are correctly described as “unsupervised” machine learning and are frequently used for such things as “dimensionality reduction.” This sounds esoteric but would probably be a good method for constructing nodes and assigning weights to the factors within the nodes at P123. It is outside of this discussion for me to try to predict whether Ethan’s method or Yuval’s or Dan’s method would give markedly different results. But they are all worth considering and actively pursuing (as Yuval has done), I think.

In summary, I like Yuval’s and Dan’s simplifying methods (assumptions) that allow for the use of Excel downloads of P123’s rank performance. Ethan’s idea has great merit also. Perhaps one would need the API to implement Ethan’s method.

To be complete in this discussion about correlation, I want to mention one other correlation metric: the correlation of the P123 rank with the rank of the sorted returns, presumably using Spearman’s rank correlation.

This is ultimately what we are interested in isn’t it? For sure there would be no need to investigate further quantitative methods after we found a ranking system that had a correlation of one (1) with the future returns. Uhhh…well, maybe a discussion of how much leverage we might want to use so we could get truly rich by this weekend. But we would otherwise be finished having found the holy grail. My point being it is an interesting metric and one that is frequently used already.

That having been said, the buckets provide a pretty good visual picture of the same thing.

Personally, I prefer Spearman’s rank correlation coefficient over the slope that Yuval suggests. But eyeballing the buckets, the slope and Spearman’s rank correlation are all great ways of looking at the same thing and perhaps they all give you about the same information in the end.
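For anyone who wants to compute the metric rather than eyeball it, here is a minimal Python sketch of Spearman’s rank correlation between a ranking system’s ranks and the returns that followed. It is simply the Pearson correlation of the two rank columns, with tied values given their average rank. The function names and the example numbers are made up for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Made-up example: system ranks for six stocks and their subsequent returns.
system_ranks = [95.0, 80.0, 60.0, 45.0, 30.0, 10.0]
next_returns = [0.04, 0.05, 0.01, 0.02, -0.01, -0.03]
print(round(spearman(system_ranks, next_returns), 4))
```

Because only the ordering matters, a relationship that is monotonic but not linear still scores a perfect 1, which is the main practical difference from fitting a slope.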

Yuval actually uses the slope metric in a spreadsheet, however, if I understood the above post correctly. So there may be a place for actually calculating some of these metrics.

Some of this (including the Spearman’s rank correlation metric) has been discussed in serious machine learning literature in the past. It would be simple to add this to the rank performance test. The slope could also be added if people like the idea (the idea that Yuval suggests above).

I appreciate Dan’s interest in automating some of this at P123. I also look forward to proposed changes in the forum where, perhaps, there could be a more extensive discussion of some of these topics without disrupting the flow of other discussions. And perhaps a moderator well informed in statistics, machine learning and reinforcement learning who might want to implement some of the better ideas (were there ever to be a forum like that).

P123 is affiliated with Stanford? Hastie who specializes in machine learning at Stanford is BRILLIANT. I wonder if some of his graduate students might want to moderate some P123 discussions–assuming that P123’s AI specialist does not want to do it for some reason. There are probably other graduate schools that P123 might be able to make arrangements with.

I do not think the Kaggle crowd would be overly impressed with the idea of repeatedly reinventing the wheel on some of this with little hope that their ideas and insights would ever be implemented. This is an old discussion. P123 might consider ways of marketing to people interested in automating some of this–at low cost in the forum. Maybe use some techniques that have already been validated in peer reviewed journals.

For now, I suggest Yuval’s idea as a feature suggestion: that we put the slope as a metric in the rank performance test output. BTW, slope and correlation are related by very simple formulas (if one is incorrect to use or a poor metric, then so is the other). The one exception is that Spearman’s rank correlation (vs. the usual Pearson correlation) is a non-parametric method using ranks (no need to assume the data has a normal distribution or is linear). But I like slope (and any kind of correlation). Being for slope and against Spearman’s rank correlation (or vice versa) would not make much sense.
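For reference, the simple formulas in question are the standard textbook identities, written here in my own notation: \(x_i\) is the bucket position, \(y_i\) is the bucket return, \(b\) is the least-squares slope, \(r\) is the Pearson correlation, and \(s_x\), \(s_y\) are the standard deviations of \(x\) and \(y\).

```latex
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = r \, \frac{s_y}{s_x},
\qquad
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2} \, \sqrt{\sum_i (y_i - \bar{y})^2}}

\rho_{\text{Spearman}} = r\bigl(\operatorname{rank}(x), \operatorname{rank}(y)\bigr)
```

Since \(s_x\) and \(s_y\) are positive, the slope and the Pearson correlation always have the same sign, and one is zero exactly when the other is. Spearman’s coefficient is the same correlation formula applied to the ranks instead of the raw values.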