Very nice! A couple of observations:
The outliers have a HUGE amount of leverage in this linear regression, to the point that, if you really wanted to, you could probably remove the 3 outliers on the right and confirm the CAPM!
Not sure you’d want to do that. But if you do keep them, then clearly there’s a real point here about the effect of extreme outliers on predictions — especially in our AI models.
Theoretically, what basis would you have for removing them in a study? I could imagine some academics doing so by labeling them as the result of noise traders — attributing the deviation to behavioral effects, retail investor overreaction, or short-term mispricing. And they might even be right about that… perhaps?
But I might be inclined to keep the outliers — to show that markets aren’t efficient because of noise traders (and their effect on stocks like Robinhood).
Some clarification of leverage:
High-leverage points sit far out on the x-axis (e.g., very high or low beta) and can strongly pull the regression line toward themselves when their actual y-value (e.g., return) doesn’t follow the trend of the other points.
They have the potential to distort the slope of the line, not because they’re wrong, but because they are rare and sit in a position to influence the fit disproportionately.
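If it helps to see this in numbers, here is a minimal sketch using made-up data (not the data from the post). It assumes a simple one-variable regression with an intercept, where the leverage of point i is 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The helper names `leverage` and `ols_slope`, and the `beta` and `ret` arrays, are hypothetical and only there for illustration.

```python
import numpy as np

def leverage(x):
    """Leverage (hat-matrix diagonal) for a simple regression with intercept."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return 1.0 / x.size + dev**2 / np.sum(dev**2)

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (intercept included)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dev = x - x.mean()
    return float(np.sum(dev * (y - y.mean())) / np.sum(dev**2))

# Made-up example: seven "ordinary" stocks that sit exactly on an
# upward-sloping line, plus three high-beta stocks with poor returns.
beta = np.array([0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 2.8, 3.0, 3.2])
ret = np.concatenate([0.02 + 0.05 * beta[:7], [-0.10, -0.15, -0.20]])

h = leverage(beta)
slope_all = ols_slope(beta, ret)              # all ten points
slope_trimmed = ols_slope(beta[:7], ret[:7])  # three right-hand points removed

print("leverage:", np.round(h, 3))
print("slope with outliers:   ", round(slope_all, 4))
print("slope without outliers:", round(slope_trimmed, 4))
```

Note that leverage is computed from beta alone; the returns never enter it. In this toy data the three right-hand points have the highest leverage, and because their returns are also off-trend, they are enough to flip the sign of the fitted slope.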