How high do you think the R^2 would get?
The usual measure is the correlation of long-short factor returns. Maybe you should read some papers.
So in conclusion, SIR/IO is very loosely (and maybe not at all) related to short fees, and once again, it would be very helpful to have actual short fee data!
I don't have access to raw data for institutional ownership, but I computed the rank-normalized SIR/IO (a proxy for the proxy, if you will) for the same period, and the correlation to IBKR borrow fees rises to 0.12.
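For anyone who wants to reproduce this kind of check, here is a minimal sketch of rank-normalizing a ratio and correlating it with borrow fees. The numbers are made up for illustration (they are not IBKR or p123 data), and the variable names and the `rank_normalize` helper are my own assumptions, not anything from the platform.

```python
def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def rank_normalize(values):
    """Map values to ranks scaled to [0, 1]; ties share their average rank."""
    n = len(values)
    if n < 2:
        return [0.5] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [r / (n - 1) for r in ranks]

# Hypothetical cross-section of five stocks on one date.
short_interest_ratio = [0.02, 0.10, 0.25, 0.05, 0.40]
inst_ownership = [0.80, 0.60, 0.30, 0.90, 0.20]
borrow_fee = [0.3, 0.5, 4.0, 0.25, 12.0]  # annualized %, made up

sir_io = [s / o for s, o in zip(short_interest_ratio, inst_ownership)]
raw_corr = pearson(sir_io, borrow_fee)
rank_corr = pearson(rank_normalize(sir_io), rank_normalize(borrow_fee))
print(f"raw correlation:  {raw_corr:.2f}")
print(f"rank correlation: {rank_corr:.2f}")
```

Correlating the rank-normalized series is the same as a Spearman correlation, so it is less sensitive to the handful of stocks with extreme fees than a correlation on the raw values.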
If we get the FactSet short financing rates integrated into p123, I'll check again how well they correlate to IBKR fees.
Your low correlation may not be a good thing.
What really matters is always "how much incremental information" a factor adds and "the correlation of the factor returns", not "the correlation of the raw feature values". Low correlation between raw feature values may simply be the result of so-called "noise", i.e., differences that carry no predictive power for returns. In that case the new factor adds little incremental explanatory power, even though it has low correlation with the old factors.
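To make the distinction concrete, here is a toy simulation, not evidence about real borrow fees. Two hypothetical signals share one return-relevant component and differ only by noise that has nothing to do with returns. All parameters (300 stocks, 60 periods, the noise levels, the 30% portfolio cutoff) are assumptions I picked for illustration.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def long_short_return(signal, fwd_returns, frac=0.3):
    """Mean forward return of the top `frac` by signal minus the bottom `frac`."""
    k = max(1, int(len(signal) * frac))
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    bottom, top = order[:k], order[-k:]

    def mean(idx):
        return sum(fwd_returns[i] for i in idx) / len(idx)

    return mean(top) - mean(bottom)

rng = random.Random(0)
n_stocks, n_periods = 300, 60

# One shared component drives returns; each signal adds its own large noise.
common = [rng.gauss(0, 1) for _ in range(n_stocks)]
signal_a = [c + rng.gauss(0, 3) for c in common]
signal_b = [c + rng.gauss(0, 3) for c in common]

ls_a, ls_b = [], []
for _ in range(n_periods):
    premium = rng.gauss(0, 0.02)  # this period's payoff to the shared component
    rets = [premium * c + rng.gauss(0, 0.02) for c in common]
    ls_a.append(long_short_return(signal_a, rets))
    ls_b.append(long_short_return(signal_b, rets))

raw_corr = pearson(signal_a, signal_b)
factor_corr = pearson(ls_a, ls_b)
print(f"raw signal correlation:    {raw_corr:.2f}")
print(f"factor return correlation: {factor_corr:.2f}")
```

By construction the raw signals are only weakly correlated (about 0.1 in expectation), yet both long-short portfolios load on the same underlying premium, so their return series move together. That is the situation where a "new" feature looks different on the raw values but adds little on top of the old one.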
As shown in the figures, it is true that there is a lack of correlation of raw features between borrowing costs and SIR/IO when borrowing costs are moderate. This is also accompanied by a lack of monotonicity in the relationship between borrowing costs and returns there.
By contrast, the relationship between SIR/IO and returns seems to be more monotonic on the long side. The two cover different time periods, though, so they may not be very comparable.
The correlation between SIR and borrowing costs in another paper is 0.2 and your correlation is lower than that.
And it looks like the advantage of the cost of borrowing indicator over SIR/IO is mainly on the short side, but this is offset by the higher borrowing costs that those stocks carry by definition.
Where the cost of borrowing data would be very useful is in building profitable shorting strategies, rather than as a factor on its own, but that has been discussed previously. My point remains that "There is a lot of signal in short lending fees as far as I can tell" is wrong, especially when you already have short interest features.
Here is also a paper that may interest you.
"The coefficient on the option-implied lending fee remains highly significant even when other variables such as the actual lending fee, loan utilization, short interest, and the short fee risk measure proposed by Engelberg et al. (2015) are included in the regressions. Of all of the variables we include in the regression models, the option-implied lending fee is the most significant predictor of stock returns."
But thanks @ZGWZ for your formula; for me it works fine as a proxy. I checked a lot of stocks that have a higher ratio, and nearly all of them also had expensive options, which according to the paper I posted is also a good predictor of forward lending fees, and those are probably even more important than current lending fees. Sadly, a lot of the returns in my short system came from these stocks.
So option-implied borrowing costs are important, but current borrowing costs are not.
And their effects are mostly reduced by the lending costs, as you said.
Yes, it would be nice to have option data.