I have a few bits of feedback after using the new rank performance tool.

I’m still used to looking at total returns, but the tool defaults to excess returns over benchmark, so each time, I have to do an extra click to switch to total returns. Can the user set a preference in their profile to select the default returns? Given the number of rank performance tests we run, this could save a lot of clicks.

The raw bucket data used to be nicely exposed at the bottom of the rank performance chart. In the new version, it’s much more hidden in favor of the regression data which is front and center. To expose the raw data, you either have to click the copy button and paste into Excel, or you have to mouse over each bucket one-by-one. It would be incredibly useful to have an option on the Annualized Returns by Quartile Chart to toggle between the regression data and the raw data. For example, if you select raw data, maybe hide the regression line and label the top/bottom buckets on the chart and replace the “Return Statistics” with the table of per-bucket annualized returns.

Scroll down to the bottom to see an analysis on number of stocks per bucket.

The cause is that two thirds of this ranking system requires 3 years of price data. As so many new companies appeared in that universe on 2021-06-05, the number of stocks with the highest number of N/A values caused the lowest rank to end up in bucket 2.

It looks like the Average Number of Stocks in Buckets chart has its data points misaligned by a period. We will look into resolving this issue next week.

I’m unclear if I should start a new thread or ask the questions here so I’m just posting here:

Can someone explain what we seek in the Spearman/Pearson Rank Correlation Coefficient?

I understand what they do according to ChatGPT; however, I don’t understand why they are used. E.g., is a coefficient of 1 good? Are we aiming for high coefficients, and why?

What is a good/bad Spearman/Pearson Rank Correlation Coefficient with a short system? Do the same rules apply as the long system?

I probably missed it in previous threads, but I didn’t see it.

The Spearman rank correlation is just a measure of whether a higher rank tends to go with a higher rank of annualized return. In other words, does each subsequent bucket show a higher return than the prior bucket (a monotonic increase)? If so, you should see a Spearman correlation of 1, which is generally a good thing.

It’s just one metric that I use to evaluate a factor or ranking system, but it doesn’t capture anything about the magnitude of changes from one bucket to the next. The Pearson correlation is one way to measure that, as is fitting a regression line to x=rank and y=return and measuring its slope, which is also now shown in the new rank performance implementation.
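To make the difference concrete, here is a minimal sketch in plain Python. The bucket returns are made-up numbers, and the helper names (`ranks`, `pearson`, `spearman`, `slope`) are my own for illustration; this is not how P123 computes its statistics.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation: sensitive to how linear the relationship is."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def slope(x, y):
    """Least-squares slope of y on x (return gained per bucket)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical annualized returns (%) for 5 buckets, lowest rank to highest.
buckets = [1, 2, 3, 4, 5]
steady = [2.0, 4.0, 6.0, 8.0, 10.0]   # even steps between buckets
lumpy = [2.0, 2.1, 2.2, 2.3, 10.0]    # almost all the gain is in the top bucket

for name, returns in (("steady", steady), ("lumpy", lumpy)):
    print(name, spearman(buckets, returns), pearson(buckets, returns), slope(buckets, returns))
```

Both series increase monotonically, so both have a Spearman correlation of 1. The Pearson correlation and the slope differ between them, which is the "magnitude" information that Spearman alone does not capture.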

As I mentioned in this comment in the “Ranking your ranking systems” thread, I tend to find the Spearman correlation most useful when I’m running a rank performance on single factors, and I tend to focus more on measures of the return of the top bucket or top couple of buckets when evaluating ranking systems.

My short ranking systems have a Spearman correlation of 0.93-0.97 when evaluated at a granularity of 20 buckets.

We tried many things until we settled on this time series of compounded period returns. This was the best we came up with to represent the stability through time. Every other attempt just looked like a bunch of noise.

All the data needed to produce this page is in the Download. This includes stddev for each bucket rebalance that we have not used at all. We encourage you to propose your own design (a picture helps a ton) that we can incorporate in the next revision.

Maybe not a graphics consideration but you might want to consider this especially if your implementation of ML uses r2_score as a metric. This is the default metric for most Python models and many of your ML models are non-linear (e.g., Boosting and Random Forests): Exploring a Third Metric for Stock Ranking Systems (Sklearn's r2_score)

I just realized that RS performance has changed, but the results are not clear to me. First, there are no explanations, and second, the performance returns are not in line with the previous results.
Take one of my best RS, attached here: the best buckets are 1 and 2 with United States (Primary), while with the universe Easy to Trade US the results are in line with logical expectations.

For charting consistency of results over time, I think that two other options are much more useful:
(a) the existing chart in rolling test results
or
(b) a relative growth chart

Also, I would look for ways to minimize clutter (such as by hiding setting options) and show the most important stats without scrolling.

In the new version, Minimum Price defaults to 0 (it was 3 in the previous version), which is fine for more liquid universes. With universes that include penny stocks, the results are meaningless. Try setting Minimum Price to $1 or more.

As far as the reasoning for this change: we didn’t want to set a minimum price to an arbitrary value. In addition it’s a post ranking filter which leads to unbalanced quantiles. We want the user to be aware of every nuance.

Now we just need some sort of display that highlights the problem. Let us know if you have suggestions. Try downloading the data. You should see many outliers, volatile returns, etc.

If you download the data using the universe United States (Primary) with Minimum Price set to 0 you will see that the average volatility of Bucket 1 is 10x higher than Bucket 20 due to penny stocks.

Perhaps all we need to alert you to the problem is a warning when the average volatilities of the highest and lowest buckets differ by more than 2x.
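A rough sketch of what such a check could look like, assuming the per-bucket average volatilities are already available (for example from the download). The function name `volatility_warning`, the sample numbers, and the reading of "exceeds 2x" as a ratio between the two end buckets are all assumptions for illustration.

```python
def volatility_warning(bucket_vols, threshold=2.0):
    """Compare the average volatility of the lowest and highest buckets.

    bucket_vols is ordered from the lowest-ranked bucket to the highest.
    Returns (warn, ratio), where ratio is the larger volatility divided
    by the smaller one.
    """
    low, high = bucket_vols[0], bucket_vols[-1]
    if min(low, high) <= 0:
        raise ValueError("volatilities must be positive")
    ratio = max(low, high) / min(low, high)
    return ratio > threshold, ratio

# Hypothetical averages: a penny-stock-heavy bottom bucket vs. a calm top bucket.
warn, ratio = volatility_warning([50.0, 12.0, 8.0, 5.0])
if warn:
    print(f"Warning: end-bucket volatilities differ by {ratio:.1f}x")
```

Taking the ratio in whichever direction is larger means the warning fires no matter which end of the ranking collects the volatile stocks.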

I wonder what this rank performance test would look like with z-score normalization of the ranking? Hmm, or even of the returns? For sure, nice to have z-score in the download data (thank you P123).

Note: you could no longer guarantee that each bucket would be the same size (for rank normalization). But instead you could construct it as a bell curve, i.e., the top bucket is p-value < 0.05.

Honestly I have not played with z-score enough to have an opinion. But I may be downloading z-score data as early as today. Maybe others have already downloaded the data and have an opinion on this. I do not. I just find z-score interesting and POTENTIALLY useful at this point.

EDIT: @marco graphics recommendation. If you colored the buckets differently when the bucket’s RETURNS have a z-score (z-score normalization of the return) that gives a p-value < 0.05 (normalized to the entire data set) it would look pretty and be informative. Maybe impress a visitor or two to the site. If attractive to P123, relatively easy when you include more statistics in the next release?

Maybe shade the bar based on the Z-score of the median- or mean-value for returns of that bucket. We now have the returns but do not know their Z-score—the shading would tell us that without having to look at anything else. So it would be truly informative.
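As a rough illustration of the shading idea, here is a small sketch using only the Python standard library. It assumes each bucket's mean return is standardized against all the buckets in the test and that a two-sided normal p-value is used; the names `bucket_z_scores`, `two_sided_p`, and `shade`, and the sample returns, are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def bucket_z_scores(bucket_returns):
    """Z-score of each bucket's return relative to all buckets in the test."""
    mu = mean(bucket_returns)
    sigma = pstdev(bucket_returns)
    if sigma == 0:
        raise ValueError("all buckets have the same return")
    return [(r - mu) / sigma for r in bucket_returns]

def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def shade(z, alpha=0.05):
    """Pick a bar shade: dark when the z-score is significant at alpha."""
    return "dark" if two_sided_p(z) < alpha else "light"

# Hypothetical mean returns (%) for 10 buckets; only the top bucket stands out.
returns = [0.0] * 9 + [10.0]
for bucket, z in enumerate(bucket_z_scores(returns), start=1):
    print(bucket, round(z, 2), shade(z))
```

A continuous color scale keyed to the z-score would work as well; the two-level shade is just the simplest version of the idea.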

Would it be making it too easy for users? After all, someone might do well selecting ranking systems (or factors) with “a dark top bar.” And maybe a positive Spearman’s rank correlation. Or maybe a slope recommendation cutoff from the forum for advanced users? No statistics knowledge whatsoever required to do well at P123? Hmmmm…yep, too easy but maybe marketable.

I would also consider being able to toggle on or off 95% confidence intervals (or even 99%) for the bars. If the lower bar is above the returns for the universe it suggests statistical significance. Or above zero for excess returns. Again, easier than anyone (including me) deserves. For example this rule might do well: “I will select ranking systems or features with dark top bars where the lower 99% confidence interval bar is above the benchmark and the Spearman’s Rank correlation is positive.”
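Here is a minimal sketch of how such an interval could be computed from a bucket's per-period returns (one value per rebalance, as in the download). It uses a plain normal approximation and ignores autocorrelation and compounding, so treat it as an assumption-laden illustration. The function names and the sample data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(period_returns, confidence=0.95):
    """Normal-approximation confidence interval for the mean period return."""
    n = len(period_returns)
    if n < 2:
        raise ValueError("need at least two period returns")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(period_returns) / sqrt(n)
    centre = mean(period_returns)
    return centre - half_width, centre + half_width

def clears_hurdle(period_returns, hurdle=0.0, confidence=0.95):
    """True when the lower end of the interval sits above the hurdle.

    Use hurdle=0 for excess returns, or the universe's mean period
    return when working with total returns.
    """
    lower, _ = mean_confidence_interval(period_returns, confidence)
    return lower > hurdle

# Hypothetical weekly excess returns (%) for the top bucket.
weekly_excess = [0.4, 0.1, 0.6, -0.2, 0.5, 0.3, 0.2, 0.7, 0.0, 0.4]
print(mean_confidence_interval(weekly_excess, confidence=0.99))
print(clears_hurdle(weekly_excess, hurdle=0.0, confidence=0.99))
```

With real data there would be hundreds of periods rather than ten, which narrows the interval considerably.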

So purely an idea that may or may not be useful. I am suggesting (for consideration) shading the BARS and having a legend for the Z-score of the normalized returns. Akin to what ChatGPT did with my idea:

OK Marco, thanks. With the minimum price set to > $1, things are better.
I downloaded the data. I understand that every row is a week and nothing more. I can read every column (return, std dev, turnover, number of stocks) for the various quantiles, but frankly I do not understand what they tell me. What added value can I squeeze from them?
I think RS is the key to PF123, and something friendlier would help. PF123 is improving continuously, but it is also getting more complicated, and I think this works against bringing in newcomers and, ultimately, your business.
[rank_perf (1).csv|attachment](upload://3lslcct70P3ioGCBsckXOX, 37ERy.csv) (242.2 KB)

Thanks for the feedback. I agree, but the old version was hiding too many potential issues. We need to work harder to make the tool do the best thing possible “out-of-box”, and not expect the user to do all sorts of backflips to figure out if the results are meaningful.

We need something better than just price

Just having the default “price > 3” (post rank) was a simple, quick hack. The “3” was the smallest number that seemed to produce stable results, but it was not researched in any depth:

- it can produce unbalanced buckets
- in Europe we have currency issues
- in Canada & Europe stocks generally trade at lower prices

And check HLTT below. In 2006 the price data was jumping around between $2.20 and $1000. So with a “price > 2” you’d still have major issues. (sure it’s bad data and we should report it, but there are many, many examples like this)

With a price filter > 2 here’s what happens to the number of stocks in the buckets. The average is ~230 but some buckets have 85, others 300+. It’s probably ok since 85 stocks is still a good number; but at what point is this volatility not ok?

Probably the simplest and most robust solution is to create a new filter that combines price and volume, and default it to eliminate stocks that trade less than $50,000 daily. It would still be a post-ranking filter, so we’ll need some sort of warning when the number of stocks in each bucket is too unbalanced.
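A rough sketch of the two pieces, assuming the filter is simply price times average daily share volume and that imbalance is measured as the largest bucket count divided by the smallest. Both definitions, the function names, and the sample numbers are illustrative assumptions, not a description of the actual implementation.

```python
def passes_liquidity(price, avg_daily_volume, min_dollar_volume=50_000):
    """Keep a stock only if it trades at least min_dollar_volume per day."""
    return price * avg_daily_volume >= min_dollar_volume

def bucket_imbalance(counts):
    """Ratio of the largest to the smallest bucket after filtering."""
    smallest = min(counts)
    if smallest == 0:
        return float("inf")  # an empty bucket is as unbalanced as it gets
    return max(counts) / smallest

# Hypothetical stocks: (ticker, price, average daily share volume).
stocks = [("AAA", 0.50, 20_000), ("BBB", 2.00, 100_000), ("CCC", 45.00, 800)]
kept = [ticker for ticker, price, volume in stocks if passes_liquidity(price, volume)]
print(kept)

# Hypothetical per-bucket stock counts after the filter is applied.
counts = [85, 230, 240, 300]
if bucket_imbalance(counts) > 2.0:  # the 2.0 cutoff is an arbitrary example
    print("Warning: bucket sizes are unbalanced")
```

A dollar-volume test sidesteps the currency and low-price issues of a raw price cutoff, but because it still runs after ranking, the imbalance check is needed alongside it.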

Let us know your thoughts.

NOTE: this is a very timely discussion for our AI/ML project. In ML the weights are being altered by the algorithms, so data cleansing is paramount.

We were just discussing the optimizer. The plan is to incorporate the optimizer as an additional tab of the ranking system (instead of a separate tool, how silly who did that!), and add major statistics/charts for entire period and the two halves.

Is that all you’d need? Not sure much more is needed. The first test should always be a single rank performance backtest to catch major issues. The optimizer is for adjusting weights.