robust linear regression

Check out the diagram below, Jim. For the blue diamonds, the blue line is the OLS regression and the red line is the LAD regression. The y-intercept for the OLS regression is 8; the y-intercept for the LAD regression is 0; the mean excess return is 0; and the median excess return is 4.

Now do you see why I like using the median excess return?


[Image: OLS vs LAD final.png]
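For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch using statsmodels. The points behind the chart aren't posted, so the data below is purely hypothetical; LAD is fit as quantile regression at the median.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

# Hypothetical (x, y) points standing in for the blue diamonds;
# the actual data set behind the chart is not given.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0])
y = np.array([5.0, 4.0, 6.0, 3.0, 5.0, 4.0, 6.0, 40.0])   # one large outlier

X = sm.add_constant(x)

# OLS: minimizes squared vertical deviations and is sensitive to the outlier.
ols_fit = sm.OLS(y, X).fit()

# LAD: quantile regression at q = 0.5 minimizes absolute vertical deviations.
lad_fit = QuantReg(y, X).fit(q=0.5)

print("OLS intercept:", ols_fit.params[0])
print("LAD intercept:", lad_fit.params[0])
print("Mean of y:    ", y.mean())
print("Median of y:  ", np.median(y))
```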

Right.

I am not sure what you conclude from that (there are a lot of good points there).

Do you want 0 for your alpha calculations?

I will have to think about it and/or look it up but I think it is possible you will be getting 0 a lot with your LAD. I could be wrong on that.

-Jim

deleted - see below.

Cool.

Do you like that better? I have no basis for judging this. (intended for post below)

-Jim

I made a mistake with the LAD line. Here’s the correct version. LAD intercept is 5.97 (according to computer calculations using successive estimations) but should actually be 6.0.


[Image: OLS vs LAD final 2.png]

Cool.

Do you like that better? I have no basis for judging this. If you do, you should keep doing it that way.

At this point I can just support the point you made in your post: you can get an answer with fewer data points.

-Jim

Just for kicks I added the Theil-Sen regression line too, in green. The intercept is 5.5.

For what it’s worth, it seems to me, intuitively, that the correct intercept should be 4. That’s what I would predict given this data set. And that’s what I get using the median excess return rather than any of the linear regression options.

Now, obviously, linear regression shouldn’t be applied to data that looks like this in the first place. But I just wanted to make a point about using medians.


[Image: OLS vs LAD final 3.png]
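As a reference, scipy computes the Theil-Sen line directly via theilslopes. This is only a sketch on hypothetical data (the chart's points aren't posted), not the calculation used for the figure above.

```python
import numpy as np
from scipy.stats import theilslopes

# Hypothetical data again, standing in for the chart's points.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0])
y = np.array([5.0, 4.0, 6.0, 3.0, 5.0, 4.0, 6.0, 40.0])

# Theil-Sen: the slope is the median of all pairwise slopes, and scipy's
# default intercept is derived from that slope and the medians of x and y.
slope, intercept, lo_slope, hi_slope = theilslopes(y, x)
print("Theil-Sen slope:    ", slope)
print("Theil-Sen intercept:", intercept)
```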

I see what you are saying. I would just have said—to myself—I should not be doing an OLS on this. And actually, a mean or median of 5.5 looks pretty good.

I defer to you on why you did a linear regression to get this (in this hypothetical example).

But it seems clear why you should not be doing an OLS. Is it linear? Is it from a normally distributed population (the data in this sample does not look normally distributed)? If not, are there enough data points to satisfy the central limit theorem? Constant variance? And, edited for David's comment below: stationary?

I am not sure about some of these assumptions on these hypothetical points. But I can see why you might need to use something else. I see your point and agree with it.

I would only add that, with a large amount of data that takes advantage of the central limit theorem, I try to make the other assumptions true when I can.

-Jim

I wanted to play with Theil-Sen in order to implement a more robust regression in some DCF models I use in P123. For example, the gross margins in some industries can be incredibly noisy because impairments are highest when revenues are lowest. Therefore, most regression techniques are biased to the upside, with some even going asymptotic. I was hoping to get my head around some better, more robust estimators that can be implemented using P123 syntax.

Based on your discussion, I anecdotally think that Theil-Sen might not be well suited for time series and/or non-stationary processes. However, if you compare it to something in which the "x" and "y" are concurrent and/or stationary, then the pairwise sampling should increase the statistical significance by increasing the effective sample size to n*(n-1)/2. This is in effect re-sampling, or in another's parlance, "bootstrapping."
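A small illustration of that pairwise sampling, on made-up data (nothing from the thread); n points yield n*(n-1)/2 pairwise slopes, and Theil-Sen takes their median.

```python
import numpy as np
from itertools import combinations

# Hypothetical stationary series: y roughly 0.5*x plus noise.
rng = np.random.default_rng(0)
n = 20
x = np.arange(n, dtype=float)
y = 0.5 * x + rng.normal(0, 1, n)

# Enumerate every pair of points and compute its slope.
pair_slopes = [(y[j] - y[i]) / (x[j] - x[i])
               for i, j in combinations(range(n), 2)]

print("points:         ", n)
print("pairwise slopes:", len(pair_slopes))      # n*(n-1)/2 = 190
print("median slope:   ", np.median(pair_slopes))
```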

Am I interpreting this right?

David,
What is it, other than "Block" Bootstrapping?

And isn't there still an assumption that the distribution is symmetrical, which seems to be an unfortunate assumption of most nonparametric tests?

But I am seeing the advantages of this test and am learning.

I'm not trying to be negative. I have used some nonparametric tests on things that are not symmetrical: any criticism would have to be directed towards myself first. And it was almost certainly better than using a parametric test on something that is neither normally distributed nor symmetrical.

-Jim

I would say so. But it depends on what you're looking for. If you're looking for the slope that will best predict future data, then I think Theil-Sen is far more robust than OLS, and that it does indeed increase the sample size. If, however, you're looking for the intercept, which is what I'm after, Theil-Sen's only advantage over OLS is its relative insensitivity to outliers. LAD may be better. The trouble with Theil-Sen estimation for intercepts is as follows. Theil-Sen approximates the line closest to the points in a diagonal direction (i.e. the distance perpendicular to the line). The intercept is a purely vertical measure. So you want the line that's closest to the points in a vertical direction.

Jim,

I just meant to differentiate statistical bootstrapping from its other meanings in finance (e.g., starting a business by one's "bootstraps," interpolating points on a curve, etc.).

Perhaps… but in order to assume non-symmetry, one must incorporate additional parameters. More parameters leads to calibration. Calibration leads to increasing chances of over-fitting. Over-fitting leads to the dark side.

In my opinion: A thing which is descriptive and which is also not highly calibrated has better likelihoods of being both predictive and prescriptive than a thing which is more highly descriptive but also more highly calibrated.

Leading to a good next question, I think: has anyone had success in using non-parametric regressions? How would one implement such a regression in P123? As an aside, all of the P123 ranking systems are by default based on non-parametric ranks.

Yuval,

Thanks for that. Makes sense. I am interested in learning if you ever do find a "good" Theil-Sen intercept.

//dpa

Thanks David and Yuval.

I really do not know much about this test and this is making me want to learn more!!!

Honestly, I did not even know it was a nonparametric test at the onset of this thread. So any comments I made specific to this test should be taken with a grain of salt (at best). This assumption of symmetry has frustrated me with other nonparametric tests in the past, however.

Most of the time that I want to use a nonparametric test, it is because the distribution is roughly normal but skewed (e.g., outliers giving a long tail on one side). What have I gained in using a nonparametric test that has symmetry as an assumption? Maybe I have actually gained a lot. I just have to wonder until I get a clear answer on this.

Using non-stationary samples has been a weakness of mine (and probably remains so). So my question of which tests are not affected by this assumption remains open, perhaps out of frustration.

I am all for any methods we can use to go beyond looking at the pretty graphs, which I must admit are a pretty good start.

I remain committed to finding the mean of my slippage with no concern, whatsoever, about the median (or the mode).

Again, thank you.

-Jim

With no intention of making a point here: this is a great question. Really great question!

For a moment I thought I was doing better when I stopped assuming that the rank performance test was linear and instead used a test meant for nominal (or ordinal) independent variables on the percentiles. Specifically, the ANOVA test gets around those assumptions. But I had problems with its assumption of equal variances.
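For reference, a minimal sketch of that kind of one-way ANOVA on made-up bucket returns (not real P123 output). scipy's f_oneway assumes roughly equal variances across groups, and Levene's test is one way to check that assumption.

```python
import numpy as np
from scipy.stats import f_oneway, levene

# Hypothetical monthly returns for three rank buckets.
rng = np.random.default_rng(1)
bucket_lo  = rng.normal(0.002, 0.05, 120)
bucket_mid = rng.normal(0.006, 0.05, 120)
bucket_hi  = rng.normal(0.010, 0.05, 120)

# One-way ANOVA: are the bucket means different? No linearity assumption,
# but it does assume roughly equal variances across groups.
f_stat, p_anova = f_oneway(bucket_lo, bucket_mid, bucket_hi)

# Levene's test checks the equal-variance assumption.
w_stat, p_levene = levene(bucket_lo, bucket_mid, bucket_hi)

print("ANOVA  F = %.2f, p = %.4f" % (f_stat, p_anova))
print("Levene W = %.2f, p = %.4f" % (w_stat, p_levene))
```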

I do think that the central limit theorem does, often, take care of the normality assumption. Do you remember my post on bootstrapping the daily returns? Bootstrapping over 4,000 daily returns gave a histogram that looked pretty normal to me. The purpose was just to get a picture of what the central limit theorem can do with stock market return data. Of course, they do this in every statistics class with other types of data. Stock market return data does not seem special to me with regard to the central limit theorem, as long as it is stationary.
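A rough sketch of that kind of bootstrap, on synthetic daily returns (the actual 4,000 returns aren't posted): resample with replacement, take the mean of each resample, and look at the histogram of the means.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for ~4,000 daily returns: fat-tailed on purpose.
rng = np.random.default_rng(2)
daily = rng.standard_t(df=3, size=4000) * 0.01 + 0.0003

# Bootstrap: resample with replacement and record the mean of each resample.
boot_means = [rng.choice(daily, size=daily.size, replace=True).mean()
              for _ in range(5000)]

# By the central limit theorem the distribution of the resampled means looks
# close to normal even though the underlying daily returns are not.
plt.hist(boot_means, bins=50)
plt.title("Bootstrap distribution of the mean daily return")
plt.show()
```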

I would be interested in any thoughts on this. But for now I think it is possible to do some statistics that strictly fit all the necessary assumptions. I just use those when I can and often use statistics that do not fit those assumptions. Or I just look at the regularly spaced, upwardly progressing stair-steps of the rank performance test and know it has to mean something, with no proof. As I said, I need look no further than what I have done to find bad assumptions and outright mistakes.

-Jim

I believe that mapping the monthly returns of a simulation against the monthly returns of a benchmark is non-parametric. The same can be said, as David pointed out, of ranking systems. And perhaps in both cases non-linear, non-parametric regression would be appropriate.

However, I’d rather keep it simple. My aim is high alpha (in a recessionary environment, high alpha coupled with low beta), and while playing with non-parametric regression might be fun (and appropriate to my data set), I don’t think I’m going to go there soon.

Below is the typical data I’m working with.


[image]

Yuval,

I’m not sure I have any suggestions on how to get a line to fit that data.

It looks like the skew, outliers and lack of linearity are driving much of your excess returns. I would not know how to remove that (or find a statistic that minimizes their impact) and have data that represents the same sim.

Probably not what you are looking for, but I think a paired t-test would be totally appropriate: the central limit theorem is in full effect here with that many data points. Pairing the data will remove any concerns about whether your data is stationary (David's point). This is effectively detrending. There is no assumption of linearity or of equal variances, and I am sure you will find high statistical significance. And you just make 2 columns in the Excel spreadsheet, click Data Analysis, and find "t-Test: Paired Two Sample for Means." Googling or YouTube will fill in any details. There are better tests when they can be used, but I believe this is one test that satisfies all of the usual assumptions in financial statistics for your data. If anyone does criticize this, tell them that you don't want to see them use even a Sharpe ratio, as your test is similar but more likely to be stationary (which could even be tested).
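For anyone who prefers Python to the Excel Analysis ToolPak, a minimal equivalent of that paired t-test with scipy; the two return series below are made-up stand-ins for the two Excel columns.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical monthly returns for a simulation and its benchmark.
rng = np.random.default_rng(3)
benchmark = rng.normal(0.007, 0.04, 240)
sim = benchmark + rng.normal(0.004, 0.02, 240)   # sim with some added alpha

# Paired t-test on the matched monthly observations; the test works on the
# per-month differences, which is the detrending effect described above.
t_stat, p_value = ttest_rel(sim, benchmark)
print("t = %.2f, p = %.2g" % (t_stat, p_value))
```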

Bootstrap it if you want. It is not necessary but it looks smart.

There are so many things I have tried to use a linear regression on: like rank performance tests, as you and David both mention. I keep trying. I think it just does not work for that. Which is actually David's point, isn't it? P123 is not using any regressions on their rank performance tests (or any nonparametric statistics that I have seen). They are using the best descriptive statistics that they can: the buckets. They do not even call it a histogram (because it isn't).

Hats off to Marc, Marco and the entire P123 staff for not taking others down the same path that I have tried to take on statistical tests for the rank performance. And for anyone thinking I am critical of any member, it will not be hard to find quotes of me suggesting that P123 should go down that path. There is a limit to how much I can fool myself on this. Bring on the quotes: it will be humorous (if not sad). And if you don’t want to limit the quotes to statistics, per se, then all that stuff about replacing ranks with z-scores and implementing multivariate linear regressions should be a real hoot!!! Even though it does kind of work. Sort of. If everything would just be linear like it should be………

But I do not think we have to abandon all of our statistics. And those who want to laugh at me should just keep laughing while I head to the bank today. I just hope I can keep you entertained! Yuval, you are ahead of me on extending the use of linear regressions. I will go back and see if I can use Theil-Sen for some of my data. Linear regression is a great tool, in all of its forms, when it can be used.

I like what you are doing and I believe it will work for you even if you happen to find that you have to revise a few things along the way.

Best regards,

-Jim

[quote]
I did a correlation study to see which better correlates with OOS results, alpha calculated by OLS methods or calculated by Theil Sen estimation, and OLS won.
[/quote]

What were the correlations?

I've now done even more correlation studies, and I've come to the conclusion that OLS alpha probably still provides the best correlation between an 8-year 100-stock simulation and the subsequent 3-year performance of a 20-stock portfolio, but only by a whisker. LAD alpha is good, CAGR is good, median and mean excess monthly returns are both good, Theil-Sen estimation is good. While OLS alpha usually leads the pack in my correlation tests, the others are off by less than 0.03 (the difference between the correlation coefficients).

In my latest tests, I first tested 40 radically different strategies (different rankings, different universes, different holding times). The results (the average of plain correlation and rank correlation) were OLS alpha 0.781, CAGR 0.775, LAD alpha 0.767, mean excess returns 0.763, median excess returns 0.758, and alpha-sigma ratio (OLS alpha divided by s.d.) 0.725. I then tried testing 30 very similar strategies, the kind of strategies I use in my everyday trading, to see if one method was better at catching the little differences between them. Now the results were almost the reverse: median excess returns 0.580, LAD alpha 0.573, mean excess returns 0.570, CAGR 0.565, OLS alpha 0.557, and alpha-sigma 0.419. But with the exception of alpha-sigma, the correlations are all so close that I don't think one can make a definitive judgment of which is the best. As for Theil-Sen estimation, it's very close to median excess returns, and it's extremely cumbersome, so I'm just guessing it's in the same range. I'd rather not use it because it's so difficult to manage.
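For what it's worth, a minimal sketch of how an "average of plain correlation and rank correlation" number like those above could be computed. The per-strategy figures here are made up, not the actual study data; one array holds the in-sample metric (e.g. OLS alpha) and the other the subsequent out-of-sample return.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-strategy numbers: an in-sample metric from the 8-year sim
# and the subsequent 3-year out-of-sample return.
rng = np.random.default_rng(4)
in_sample_alpha = rng.normal(0.10, 0.05, 40)
oos_return = 0.6 * in_sample_alpha + rng.normal(0, 0.04, 40)

plain, _ = pearsonr(in_sample_alpha, oos_return)    # "plain" correlation
rank, _ = spearmanr(in_sample_alpha, oos_return)    # rank correlation
print("average of plain and rank correlation: %.3f" % ((plain + rank) / 2))
```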

This brings me to Jim's suggestion of a paired t-test. I may be wrong, but from what I've read, the number that I'd be looking for with a paired t-test is Cohen's d, which is basically the mean excess return divided by the standard deviation of the excess returns, because that's the measure of how powerful the effect is. Now whenever I divide anything by the standard deviation, I end up in trouble. My correlation coefficient drops by at least 0.05, and sometimes as much as 0.2. For a while I was really taken with the alpha-sigma ratio (partially because I invented it myself), but its correlation just isn't as high as plain old alpha's. And you've probably already heard about my gripes with the Sharpe ratio.

Now if I’m wrong about the t-test, I’d be happy to give it another try.

I should add that one of the reasons my correlations are so close is that the numbers themselves are so close. In my 40-very-different-systems correlation study there was a 0.99 correlation between mean monthly returns and CAGR and between OLS alpha and CAGR, and a 0.98 correlation between OLS alpha and mean monthly returns; in my 30-similar-systems study, the same is true, but there's also a 0.98 correlation between median excess returns and LAD alpha. The least correlated of the measures are LAD alpha and mean excess returns, FWIW, with a correlation of 0.93.

I should also add that all the strategies I tested were primarily fundamentals-based, long only, with no hedging or market timing. In other words, they were all relatively high-beta strategies, all well correlated with the market. No doubt the results would be quite different for low or negative beta (long-short, hedged, and/or market-timing) strategies.