From Stanford to Miami: Teaching Finance with Portfolio123 — My CAPM Test Study

Thanks.

Do you still use elliptical trimming (or Cook’s distance) at times? There might be a case for considering it here as well, given the high leverage of the outliers in this data.

Even if someone wanted to keep those outliers (for whatever reason), the leverage they exert can give a misleading impression of the slope for almost all of the other data points. These extreme values on the x-axis produce a slope that looks quite different from what you’d get if they sat less far out, even while still being extreme.
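To make the leverage point concrete, here is a minimal sketch in Python with NumPy. It uses synthetic data, not the study’s, to compute each point’s leverage and Cook’s distance for a simple beta-vs-return regression, then compares the slope with and without the flagged points. The function names are hypothetical, and the 4/n cutoff is a common rule of thumb rather than a fixed standard. Elliptical trimming would be a different, multivariate approach.

```python
import numpy as np

def ols_intercept_slope(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    return y_mean - b * x_mean, b

def leverage_and_cooks(x, y):
    """Leverage (hat-matrix diagonal) and Cook's distance for simple OLS."""
    n, p = len(x), 2                     # p = fitted parameters (intercept + slope)
    a, b = ols_intercept_slope(x, y)
    resid = y - (a + b * x)
    s2 = np.sum(resid ** 2) / (n - p)    # residual variance estimate
    x_mean = x.mean()
    h = 1.0 / n + (x - x_mean) ** 2 / np.sum((x - x_mean) ** 2)
    cooks = resid ** 2 / (p * s2) * h / (1.0 - h) ** 2
    return h, cooks

# Synthetic illustration only: 200 "stocks" with a modest beta-return slope,
# plus two far-out, high-beta points with very poor returns.
rng = np.random.default_rng(0)
beta = rng.normal(1.0, 0.3, 200)
ret = 0.02 + 0.05 * beta + rng.normal(0.0, 0.05, 200)
beta = np.append(beta, [4.0, 4.5])
ret = np.append(ret, [-0.60, -0.80])

h, cooks = leverage_and_cooks(beta, ret)
flagged = cooks > 4 / len(beta)          # rule-of-thumb cutoff
_, slope_all = ols_intercept_slope(beta, ret)
_, slope_kept = ols_intercept_slope(beta[~flagged], ret[~flagged])
print(f"flagged {flagged.sum()} points")
print(f"slope with all points:  {slope_all:.3f}")
print(f"slope without flagged:  {slope_kept:.3f}")
```

Two points out of 202 are enough to drag the fitted slope well away from the one the other 200 points would give on their own.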

It’s kind of a random effect on the slope, depending on where they are on the x-axis. Hmmmm… random? Quite the assumption.

Maybe not so random? Probably not, in fact. One might also consider whether the data is non-linear or heteroskedastic. Either would undermine the assumptions behind standard linear regression and could explain why the slope appears so sensitive to extreme values along the x-axis.

I think it’s fine to remove (trim) or Winsorize points, but it’s worth pausing to consider why those points are there in the first place:

  • Is it noise trading? If so, that might offer insight into behavioral finance.
  • Is it non-linearity that could be exploited with a different model — especially if other quants are overlooking it?
  • Is it heteroskedasticity, in which case an inverse-variance (weighted least squares) regression might be the right tool? That’s a standard approach for handling this issue, and by giving less weight to the noisiest points it would likely stabilize the regression line without discarding any data.
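Here is a minimal sketch of what an inverse-variance (weighted least squares) regression could look like in Python with NumPy. Because the true variance of each point is unknown, it uses a common two-step "feasible" approach: fit OLS, model how the squared residuals vary with beta, then refit with weights equal to 1 / estimated variance. The log-linear variance model, the function names, and the synthetic data are assumptions for illustration only. Once the weights are chosen, the `WLS` class in statsmodels would be the off-the-shelf route.

```python
import numpy as np

def wls(x, y, w):
    """Weighted least squares fit of y = a + b*x with weights w; returns (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    # Scaling each row by sqrt(w) turns WLS into an ordinary least-squares problem.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]

def inverse_variance_fit(x, y):
    """Two-step (feasible) inverse-variance regression of y on x.

    Hypothetical assumption for illustration: the residual variance follows
    exp(c0 + c1*x), i.e. it changes smoothly and log-linearly with x.
    """
    n = len(x)
    a, b = wls(x, y, np.ones(n))                  # step 1: plain OLS
    resid = y - (a + b * x)
    X = np.column_stack([np.ones(n), x])
    # Step 2: regress log squared residuals on x to estimate the variance pattern.
    # A constant bias in c0 rescales all weights equally and leaves step 3 unchanged.
    c, *_ = np.linalg.lstsq(X, np.log(resid ** 2 + 1e-12), rcond=None)
    var_hat = np.exp(X @ c)
    return wls(x, y, 1.0 / var_hat)               # step 3: weights = 1 / variance

# Synthetic data (not the study's): noise standard deviation rises with beta.
rng = np.random.default_rng(1)
beta = rng.uniform(0.2, 3.0, 500)
ret = 0.02 + 0.05 * beta + rng.normal(0.0, 0.01 * np.exp(0.8 * beta))

_, slope_ols = wls(beta, ret, np.ones(len(beta)))
_, slope_ivw = inverse_variance_fit(beta, ret)
print(f"OLS slope: {slope_ols:.4f}  inverse-variance slope: {slope_ivw:.4f}")
```

Both estimates target the same true slope (0.05 here). The benefit of weighting is a less noisy estimate, because the highest-variance points no longer dominate the fit.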

I suspect that an inverse-variance regression would make the relationship look much more linear — and might render the rest of this discussion moot.

It might be worth trying before throwing those data points out.

Lots to think about, and I don’t think there’s an easy answer. In an academic context, it seems reasonable to expect that researchers explicitly state which data points they remove — and why.
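In the spirit of disclosing what gets removed, here is a small sketch, again in Python with NumPy, of percentile-based Winsorizing and trimming. Both helpers also return the affected indices so they can be reported. The function names, the 10th/90th percentile cutoffs in the example, and the beta values are made up for illustration. In practice, tighter cutoffs such as 1%/99% are common.

```python
import numpy as np

def winsorize_with_log(values, lower_pct=1.0, upper_pct=99.0):
    """Clip values to the given percentiles and report which entries changed.

    Returns (winsorized_values, changed_indices, (low, high)).
    """
    low, high = np.percentile(values, [lower_pct, upper_pct])
    clipped = np.clip(values, low, high)
    changed = np.flatnonzero(clipped != values)
    return clipped, changed, (low, high)

def trim_with_log(values, lower_pct=1.0, upper_pct=99.0):
    """Drop values outside the percentiles; return kept values and dropped indices."""
    low, high = np.percentile(values, [lower_pct, upper_pct])
    keep = (values >= low) & (values <= high)
    return values[keep], np.flatnonzero(~keep), (low, high)

# Hypothetical betas, including one very high and one negative value.
beta = np.array([0.8, 1.0, 1.1, 0.9, 1.2, 4.5, 0.7, 1.05, -0.5, 1.0])
w, changed, bounds = winsorize_with_log(beta, 10, 90)
t, dropped, _ = trim_with_log(beta, 10, 90)
print("winsorized indices:", changed.tolist(), "bounds:", bounds)
print("trimmed indices:", dropped.tolist())
```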

For our models, it would be ideal if a more advanced model could have trained on data that includes Robinhood, told me to buy it, and done so without distorting the slope for all the other data points.

I don’t claim that would be easy to achieve in practice, but I’d probably start with something relatively simple — like an inverse-variance weighted regression.

Addendum: Because you’re using beta on the x-axis, and beta is tied to return variance (a stock’s systematic variance is beta squared times the market’s variance), your plot is a strong indication that heteroskedasticity is present in the data.
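One way to check this before reweighting anything is a Breusch-Pagan-style test. The idea is to regress the squared OLS residuals on beta and see whether they trend with it. The sketch below implements Koenker’s n·R² version using only NumPy and the standard library, on synthetic data. The function name is hypothetical. `het_breuschpagan` in `statsmodels.stats.diagnostic` is a library alternative.

```python
import math
import numpy as np

def breusch_pagan_lm(x, y):
    """Koenker's (studentized) Breusch-Pagan test with a single regressor.

    Fits y = a + b*x by OLS, regresses the squared residuals on x, and returns
    (LM, p_value), where LM = n * R^2 is asymptotically chi-square with 1 df
    under the null hypothesis of constant residual variance.
    """
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ coef) ** 2                       # squared OLS residuals
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)     # auxiliary regression
    fitted = X @ g
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = max(n * r2, 0.0)                          # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(lm / 2.0))       # upper tail of chi-square(1)
    return lm, p_value

# Synthetic data (not the study's): one series whose noise grows with beta,
# one with constant noise for comparison.
rng = np.random.default_rng(2)
beta = rng.uniform(0.2, 3.0, 500)
ret_hetero = 0.02 + 0.05 * beta + rng.normal(0.0, 0.01 * np.exp(0.8 * beta))
ret_homo = 0.02 + 0.05 * beta + rng.normal(0.0, 0.05, 500)

lm_h, p_h = breusch_pagan_lm(beta, ret_hetero)
lm_c, p_c = breusch_pagan_lm(beta, ret_homo)
print(f"rising noise:   LM={lm_h:.1f}, p={p_h:.3g}")
print(f"constant noise: LM={lm_c:.1f}, p={p_c:.3g}")
```

The chi-square(1) tail uses the identity P(X > t) = erfc(√(t/2)), which avoids needing SciPy for the p-value.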

And considering that CAPM is designed to account for risk, including the greater variability of returns that comes with higher beta (which shows up here as heteroskedasticity), your plot may actually serve as strong confirmation of the CAPM once this heteroskedasticity is corrected for.