Spearman's Rank Correlation as a metric for automated algorithms

I apologize ahead of time for my poor programming skills.


TL;DR: The rank performance test will almost invariably give an extremely high Spearman’s Rank Correlation, at times on the order of 0.99. But correlating the ranks of individual stocks with their ranked returns for the week gives more realistic values, on the order of 0.05. This metric is probably more useful for comparing systems. Below is how to get that metric.

There was some interest in Spearman’s Rank Correlation in the Forum: Ranking your ranking systems - #15 by feldy. However, the rank performance test will invariably give an extremely high Spearman’s Rank Correlation if you use the method in the thread I linked to.

That method correlates the bucket rank with a ranking of each bucket's aggregated performance, and the correlations tend to be extremely high. These extreme correlations come from the aggregation that occurs in the rank performance test: averaging many stocks over many rebalances removes most of the noise you will experience from rebalance to rebalance in real life.
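To see why the aggregation matters, here is a small simulation sketch. Everything in it is an assumption for illustration: the data are synthetic, `simulate_rank_vs_return` and its `signal` parameter are made-up names, and plain bucket means stand in for whatever aggregation the rank performance test actually does.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def simulate_rank_vs_return(n_weeks=260, n_stocks=500, signal=0.1, seed=0):
    """Synthetic weekly data: a weak link between rank and return, plus noise."""
    rng = np.random.default_rng(seed)
    # Each week every stock gets a distinct percentile rank from 0 to 100
    rank = np.concatenate(
        [rng.permutation(n_stocks) for _ in range(n_weeks)]
    ) / (n_stocks - 1) * 100
    week = np.repeat(np.arange(n_weeks), n_stocks)
    # Return = small rank-driven component + much larger random noise
    ret = signal * (rank - 50) / 50 + rng.standard_normal(n_weeks * n_stocks)
    return pd.DataFrame({"week": week, "rank": rank, "ret": ret})


sim = simulate_rank_vs_return()

# Stock-level: correlate every individual rank with its own return
stock_corr, _ = spearmanr(sim["rank"], sim["ret"])

# Bucket-level: average returns inside 20 rank buckets, then correlate
sim["bucket"] = pd.cut(sim["rank"], bins=20, labels=False)
bucket_mean = sim.groupby("bucket")["ret"].mean()
bucket_corr, _ = spearmanr(bucket_mean.index, bucket_mean.values)

print(f"Stock-level Spearman:  {stock_corr:.3f}")
print(f"Bucket-level Spearman: {bucket_corr:.3f}")
```

With these settings the stock-level figure should come out small and the bucket-level figure much closer to 1, even though both are computed from exactly the same rows.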

For machine learning, what you may want at times is the correlation of the “rank of the tickers each week” to the performance of those individual tickers that week. For DataMiner downloads one can address this with the code below. The ticker ranks may already be present in the download, in which case you only need this for the performance column. But you may find other uses:

Change the performance to a rank with this:

import pandas as pd

# Load the data from CSV (replace with the path to your DataMiner download)

df = pd.read_csv('your_file.csv')

# Convert the date strings to datetime objects

df['Date'] = pd.to_datetime(df['Date'])

# Extract the ISO year and week number for each date, so the same week number in different years is not pooled

iso = df['Date'].dt.isocalendar()

df['year'] = iso['year']

df['week'] = iso['week']

# Rank the true returns within each week and then convert to percentiles (0 to 100)

df['ranked_returns'] = df.groupby(['year', 'week'])['Future 1wkRet'].rank(pct=True) * 100
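As a quick check of what the within-week percentile ranking does, here is a toy example. The frame `toy` and its `ret` column are made up for illustration and are not part of any DataMiner download.

```python
import pandas as pd

# Two hypothetical weeks: four stocks in week 1, two stocks in week 2
toy = pd.DataFrame({
    "week": [1, 1, 1, 1, 2, 2],
    "ret": [0.02, -0.01, 0.05, 0.00, 0.10, 0.03],
})

# Rank within each week, then scale to percentiles
toy["ranked_returns"] = toy.groupby("week")["ret"].rank(pct=True) * 100

print(toy)
# ranked_returns: 75, 25, 100, 50 for week 1 and 100, 50 for week 2
```

Each week is ranked on its own, so the best return in every week gets 100 regardless of how large it was compared with other weeks.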

Get the Spearman’s rank correlation with this:

from scipy.stats import spearmanr

# Compute Spearman's Rank correlation for the specified columns (rows with missing values are skipped)

corr, p_value = spearmanr(df['100% rank'], df['ranked_returns'], nan_policy='omit')

print(f"Spearman's Rank correlation: {corr:.4f}")

print(f"P-value: {p_value:.4f}")
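If you would rather see how the correlation varies from week to week instead of one pooled number, a possible variation is sketched below. It is only a sketch: `weekly_spearman` is a made-up helper, the demo data are random, and it assumes one row per ticker per rebalance date with the same column names used above.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def weekly_spearman(df, rank_col="100% rank", ret_col="Future 1wkRet", date_col="Date"):
    """Spearman correlation of rank vs. return, computed separately per date."""
    # Drop rows where either value is missing (e.g. no future return yet)
    clean = df.dropna(subset=[rank_col, ret_col])
    results = {}
    for date, group in clean.groupby(date_col):
        corr, _ = spearmanr(group[rank_col], group[ret_col])
        results[date] = corr
    return pd.Series(results, name="spearman")


# Demo on random data: 3 weekly dates, 50 hypothetical tickers each
rng = np.random.default_rng(1)
dates = pd.to_datetime(["2023-01-07", "2023-01-14", "2023-01-21"])
demo = pd.DataFrame({
    "Date": dates.repeat(50),
    "100% rank": np.tile(np.linspace(0, 100, 50), 3),
    "Future 1wkRet": rng.standard_normal(150),
})

per_week = weekly_spearman(demo)
print(per_week)
print(f"Mean weekly Spearman: {per_week.mean():.4f}")
```

The mean of the weekly values can then be compared across ranking systems, and the spread of the weekly values gives a feel for how noisy any single rebalance is.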