This is not a new problem and not a problem just for stock-pickers. Personally, I get a better perspective looking at baseball. Baseball scouts have this same problem. Here too, it can be a million-dollar question. Baseball has ALWAYS had statistics and people making decisions based on those statistics.
Which of these players is the better player? The player who got on base (from hits) 4 times in 10 at-bats, or the player who got on base 300 times in 1000 at-bats? One player has a 400 batting average (4/10) and the other has a 300 batting average (300/1000). There is more data on the player batting 300, which makes this basically the same type of question. But the details matter as far as how much data Whycliffes actually has (and how cherry-picked it is).
Me, with the baseball example? I will take the player batting 300 (300/1000) for his hitting skills. Maybe the other player can pitch. I don’t think there is any other rational answer.
John Paciorek has a perfect batting average of 1000 (3 for 3). Is he the best player ever? I mean OMG! Who even cares if he owns a glove if he can bat 1000!!! He must be in the Major League Baseball Hall of Fame, right?
Here is a book that will tell you who the best hitter ever is using Empirical Bayesian statistics (and how to handle this general problem in the process). Spoiler alert: it is probably not John Paciorek, although you can never be 100% sure. Introduction to Empirical Bayes: Examples from Baseball Statistics
The key to this is using “shrinkage.” Test user applies a different type of shrinkage using MOD(). It is different from Empirical Bayes and represents a type of “regularization,” I believe. This is a serious problem, and people are always finding new solutions to it.
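To make the Empirical Bayes type of shrinkage concrete, here is a minimal Python sketch of the beta-binomial approach the book uses. The prior numbers (alpha0, beta0) are made up just for illustration; in practice you would fit them to the whole population of players:

```python
# Sketch of Empirical Bayes shrinkage for batting averages
# (beta-binomial model, as in the Empirical Bayes book).
# The prior parameters here are made-up for illustration only;
# normally you would estimate them from all players' records.

def eb_shrink(hits, at_bats, alpha0=80.0, beta0=220.0):
    """Shrink a raw batting average toward the league-wide prior.

    alpha0/beta0 encode a Beta prior centered near
    alpha0 / (alpha0 + beta0), about .267 with these numbers.
    Small samples get pulled hard toward the prior; large
    samples mostly speak for themselves.
    """
    return (hits + alpha0) / (at_bats + alpha0 + beta0)

# The 4-for-10 player gets shrunk hard toward the prior (~.271)...
print(round(eb_shrink(4, 10), 3))
# ...while 1000 at-bats barely move (~.292).
print(round(eb_shrink(300, 1000), 3))
```

With this made-up prior, the 4-for-10 player shrinks to about .271 while the 300-for-1000 player stays near .292, so the player with more at-bats comes out ahead after shrinkage.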
Whycliffes, you asked a similar question not too long ago: Previous Post
Yuval, thank you for your excellent answer to Whycliffes’ question about why real-world results tend to “shrink” out-of-sample. Here is the (re)link to the article from your post. The paper uses Bayesian statistics, as you know: Yuval’s Link
Yuval got it right here! There are other ways to use and/or predict shrinkage, although Bayesian statistics is probably the best.
Sometimes I will get a confidence interval and use the lower bound as my estimate of real-world results going forward. That is probably what I would do here if I did not actually use a Bayesian solution. Confidence intervals work because more data (like more at-bats) narrows the interval. The paper Yuval linked to also suggests using confidence intervals. MOD() can work too. It is not a new problem, and people are always coming up with good ideas on how to solve it.
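Here is a minimal sketch of the lower-confidence-bound idea, using the Wilson score interval (one common choice for a binomial proportion; standard library only). It shows why more at-bats narrow the interval and raise the lower bound:

```python
import math

def wilson_lower(hits, at_bats, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.
    z=1.96 gives roughly a 95% two-sided interval."""
    n = at_bats
    p = hits / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / denom

# The 4-for-10 player has a higher raw average, but with so few
# at-bats his lower bound is only about .168...
print(round(wilson_lower(4, 10), 3))
# ...while 1000 at-bats give the 300 hitter a lower bound near .272.
print(round(wilson_lower(300, 1000), 3))
```

So if you rank players (or strategies) by the lower confidence bound instead of the raw average, the 300/1000 player wins, which matches the intuition about preferring more data.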
The tongue-in-cheek “proof” that the lower confidence bound is probably better than the upper bound (or even the mean) uses Murphy’s Law, which states that you can always expect the worst. I find it works in practice. More formally, the multiple comparisons problem (lots of trials without much data in each) can be used in a rigorous argument.
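A quick simulation (illustrative numbers only, nothing from any real backtest) shows the multiple comparisons problem in baseball terms: give 100 players identical .300 true skill and only 10 at-bats each, and the best observed average will look like a star out of pure luck:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Illustrative multiple-comparisons simulation: every "player"
# has the same true .300 skill, but with only 10 at-bats each,
# the best *observed* average among 100 players will be far
# above .300 from chance alone.
TRUE_AVG = 0.300
observed = []
for _ in range(100):
    hits = sum(random.random() < TRUE_AVG for _ in range(10))
    observed.append(hits / 10)

print(max(observed))                   # the "winner" looks great...
print(sum(observed) / len(observed))   # ...but true skill is .300
```

That gap between the best observed result and the true skill is exactly what shrinkage (Bayesian or otherwise) is correcting for when you pick the best-looking strategy out of many trials.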
Thank you, Yuval, for your answer giving what is probably the best mathematical solution (Bayesian statistics, or confidence intervals as a “back-of-the-envelope” approximation) to this problem with your previous link. And again, regularization in all of its forms works very well in practice.
Jim