Like many of his designer models that did not perform well before, Andreas has removed this AI model from the P123 DM Library (I just checked - selection bias?).
Regards
James
I believe there were other reasons for the removal - the model's performance was actually close to the benchmark, so not necessarily poor.
That said, I do agree that removing Designer Models after launch shouldn't be allowed, as it can introduce selection bias. It would also encourage creators to be more thoughtful before publishing.
If you'd like to support this suggestion, feel free to vote here: Designer Models enhancements
I agree. It's inherently difficult to draw strong conclusions - positive or negative - about any single model's performance.
One major issue is the multiple comparisons problem, especially when models are evaluated in the context of many others. If we're selecting the best-performing models (or even just a few top ones), we run into the issue addressed by the Bonferroni correction: the more comparisons we make, the higher the likelihood that any apparent outperformance is due to chance.
That said, I have a practical suggestion related to this - not just a theoretical point:
It would be helpful if P123 simply displayed the number of models each designer has submitted, along with the total number of designer models submitted overall. That basic context alone would allow users to make more informed assessments about whether a model's performance is likely meaningful - or just the result of random variation.
Importantly, this wouldn't require revealing how the other models performed. To apply a Bonferroni correction, you only need the number of comparisons - not the individual results.
Even if P123 isn't ready to publish complete data on all designer models, sharing just these submission counts would still represent a big step forward in transparency and statistical integrity.
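To make that point concrete, here is a minimal sketch of the correction. The function names and the numbers (a raw p-value of 0.01, 50 submitted models) are hypothetical examples, not P123 data:

```python
# Sketch: a Bonferroni correction needs only the number of comparisons (m),
# not the results of the other models. All numbers below are hypothetical.

def bonferroni_alpha(alpha, m):
    """Per-test significance threshold after correcting for m comparisons."""
    return alpha / m

def bonferroni_p(p, m):
    """Adjusted p-value for one model tested alongside m - 1 others (capped at 1)."""
    return min(1.0, p * m)

# A model with a raw p-value of 0.01 looks significant at alpha = 0.05 in
# isolation, but if it was one of 50 submitted models, the corrected
# threshold is 0.05 / 50 = 0.001, and the adjusted p-value is 0.01 * 50 = 0.5,
# so the apparent outperformance is no longer significant.
threshold = bonferroni_alpha(0.05, 50)
adjusted = bonferroni_p(0.01, 50)
```

This is exactly why publishing the submission counts alone would be enough: `m` is the only extra input a user needs.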
It is pretty straightforward to see whether a particular designer model's performance is positive or negative.
There is a sort function, and if the designer model underperforms the 3M, 1Y, and 2Y benchmarks by a significant margin, it probably means performance is not great.
Here is a screenshot below.
Regards
James
James,
I thought I was actually agreeing with your broader (previous) point, which is that you cannot just look at whether the returns are positive or negative for a small amount of data; rather, as you said, survivorship bias (and other considerations) comes into play.
Maybe I missed the point of one of your posts?