Good idea
We've been discussing this. There really is no use for a backtest that includes any data used to train the predictor (debugging, maybe?). We're leaning towards raising an error whenever a predictor is backtested on data it was trained on. A rough sketch of that kind of guard is below.
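Here is a minimal sketch of the overlap check we're describing. The function name, arguments, and the idea of tracking training dates explicitly are all illustrative assumptions, not the actual P123 API:

```python
import pandas as pd

def check_no_training_overlap(training_dates, backtest_start, backtest_end):
    """Raise if any date the predictor was trained on falls inside the backtest window."""
    # Hypothetical guard: assumes the list of training dates is available to the backtester.
    training_dates = pd.to_datetime(pd.Index(training_dates))
    overlap = training_dates[(training_dates >= pd.Timestamp(backtest_start)) &
                             (training_dates <= pd.Timestamp(backtest_end))]
    if len(overlap) > 0:
        raise ValueError(
            f"Backtest window {backtest_start}..{backtest_end} overlaps "
            f"{len(overlap)} training dates (earliest: {overlap.min().date()})."
        )
```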
It would be interesting to create a predictor that lets you recreate the validation folds. In other words, a special predictor that automatically switches predictor instances based on the as-of date of the backtest. So, for example, a 4-fold validation would have 4 different predictor instances (see the sketch below).
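A minimal sketch of how such a fold-switching predictor could work, assuming each fold carries its validation window and a model trained on everything outside that window. The class and field names are hypothetical, not anything that exists in P123 today:

```python
import bisect
from dataclasses import dataclass
from datetime import date

@dataclass
class Fold:
    val_start: date   # start of this fold's validation window
    val_end: date     # end of this fold's validation window
    model: object     # predictor trained only on data outside the window

class FoldSwitchingPredictor:
    """For each as-of date, use the fold whose validation window contains it,
    so the prediction always comes from a model that never trained on that date."""

    def __init__(self, folds):
        # Sort folds by validation start so we can look them up by date.
        self.folds = sorted(folds, key=lambda f: f.val_start)
        self._starts = [f.val_start for f in self.folds]

    def predict(self, as_of, features):
        # Find the fold whose validation window covers the as-of date.
        i = bisect.bisect_right(self._starts, as_of) - 1
        if i < 0 or as_of > self.folds[i].val_end:
            raise ValueError(f"{as_of} is outside every validation fold")
        return self.folds[i].model.predict(features)
```

For a 4-fold validation you would construct it with four `Fold` entries, one per validation window, and the backtest would effectively replay the out-of-sample predictions from cross-validation.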
Re. Designer Models... We no longer calculate any stats based on backtest data, only out-of-sample data. This has been in place for several years, and I think curve-fitting is no longer an issue. Curve-fitted models fail out of sample, and it's pretty obvious. You can still see the backtest data if you want, but it comes with disclaimers. Furthermore, we have a 3-month incubation period before a model can be opened.
Also, please note that the "Classic P123" models presently suffer from using training data: ranking systems are optimized on the same data that is then used for the backtest. However, it's not as obvious as with ML, since ML is much better at curve-fitting.