Regarding look-ahead bias, do we throw the baby out with the bathwater? As I argued earlier, some look-ahead bias is absolutely inevitable simply because FactSet didn’t have the data it now has and Portfolio123 wasn’t always in existence. But we have gone to tremendous lengths to eliminate it as much as possible. Show me another backtesting engine that has less look-ahead bias than we have.
Information about the FactSet estimate data that we’re using is here: https://www.factset.com/marketplace/catalog/product/factset-estimates-consensus
All estimate and fundamental data is updated daily except for EPS revisions, which are updated weekly. By EPS revisions I mean the CurQUpRev, CurQDnRev, CurFYUpRev, CurFYDnRev, NextQUpRev, NextQDnRev, NextFYUpRev, NextFYDnRev, and TotRevisions factors.
Philip writes, “the fact that the sims always outperform the ports OOS suggests there is look-ahead bias and there is something quite problematic.” This is absolutely not the case. I took 20 live portfolios, none of which had been updated in the last two years, ran each one as a simulation over those same two years, and compared each live portfolio's 2021 performance with that of its corresponding sim. In twelve cases the simulation did better, and in eight the portfolio did better. Statistically, that split is indistinguishable from a coin flip.
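To put a number on "no better than random": if sims and live ports were equally likely to come out ahead, how surprising would 12 sim wins out of 20 be? Below is a minimal sketch of an exact sign test (a binomial test against a fair coin). It assumes the 20 comparisons are independent, which is only an approximation, since portfolios can share holdings. The function names are illustrative, not part of any Portfolio123 tool.

```python
from math import comb


def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5), i.e. at least k wins in n fair coin flips."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def two_sided_p(wins: int, n: int) -> float:
    """Exact two-sided sign-test p-value under a fair-coin null hypothesis."""
    k = max(wins, n - wins)  # the more lopsided side of the split
    return min(1.0, 2 * upper_tail(k, n))


sim_wins, port_wins = 12, 8  # counts from the comparison above
n = sim_wins + port_wins
print(f"P(sims win >= {sim_wins} of {n}) = {upper_tail(sim_wins, n):.3f}")  # 0.252
print(f"two-sided p = {two_sided_p(sim_wins, n):.3f}")  # 0.503
```

A two-sided p-value of about 0.5 means a split at least this lopsided, in one direction or the other, happens roughly half the time by pure chance. (SciPy's `scipy.stats.binomtest` gives the same result if you would rather not hand-roll it.)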
Philip also wrote, “I just want to confirm that the fundamentals / pricing data is perfect PIT? Ie. sim vs. port would be the same if I only use fundamental / pricing data.” Certainly not! This was never the case with Compustat, and it isn't the case with FactSet either. Perfect PIT exists only in your imagination. Live strategies have never matched their corresponding simulations perfectly, whether they rely on estimates or not. There will always be differences in the stocks chosen, simply because ranking systems are mathematically extremely sensitive to tiny changes.
I’d like to sum up this issue as follows. Financial data is extremely imprecise. What do we measure when we use financial data? We measure what companies decide their data is. We measure subjective things. As I put it in my webinar, “In most sciences—and even a lot of sports—you can work with completely objective statistics such as height, weight, and distance—or field goal percentage, rebound percentage, and number of points scored per game. But not in finance. Almost every single number on a financial statement is an interpretation by the company’s accountants and officers.”
These subjective numbers are then fed into ranking systems, which depend on extremely small differences to calculate ranks. Many nodes, for instance, depend on industry composition: if one stock disappears from or is added to an industry, the node ranks of the other stocks in that industry will change. Even more common is a value for one stock changing from N/A to an actual number, which affects the ranks of every other stock in the industry. I'm sure you can come up with dozens of other examples where even the flap of a butterfly's wing will change a rank.
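Here is a toy illustration of the N/A case. It is a minimal sketch, not Portfolio123's actual ranking code: the tickers and factor values are made up, and the rule that N/A stocks get rank 0 while the rest are spread evenly from 0 to 100 is a hypothetical simplification. It shows a single node only; a real ranking system combines many such nodes.

```python
from typing import Optional


def industry_ranks(values: dict[str, Optional[float]]) -> dict[str, float]:
    """Percentile-rank each stock's factor value within its industry (0 to 100).

    Hypothetical rule for illustration: stocks with N/A (None) get rank 0; the
    rest are spread evenly from 0 (lowest value) to 100 (highest). Ties are not
    handled specially.
    """
    known = sorted((v, s) for s, v in values.items() if v is not None)
    m = len(known)
    ranks = {s: 0.0 for s in values}
    for pos, (_, s) in enumerate(known):
        ranks[s] = 100.0 * pos / (m - 1) if m > 1 else 100.0
    return ranks


# Made-up factor values for a four-stock industry.
before = {"AAA": 0.12, "BBB": 0.08, "CCC": 0.15, "DDD": None}
# The same industry after DDD's value changes from N/A to an actual number.
after = {**before, "DDD": 0.10}

rank_before = industry_ranks(before)
rank_after = industry_ranks(after)
for s in sorted(before):
    print(s, round(rank_before[s], 1), "->", round(rank_after[s], 1))
# AAA 50.0 -> 66.7
# BBB 0.0 -> 0.0
# CCC 100.0 -> 100.0
# DDD 0.0 -> 33.3
```

AAA's own data never changed, yet its node rank jumps from 50 to about 67. Multiply that across dozens of nodes and thousands of stocks, and small data revisions are enough to reshuffle which stocks land at the top.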
I think it’s a huge mistake to expect precision in anything to do with finance. “It is better to be vaguely right than exactly wrong,” said Carveth Read. That is a fundamental rule of logic, but it applies especially to finance, which is as inexact as any system of data can possibly be. (If there is a more inexact system of data, I’d like to hear about it.)
I am a firm believer in backtesting. But I know that my backtested results are mutable, approximate, and chimerical, and that the correlation between in-sample and out-of-sample performance is always going to be very low (best-case scenario? About 0.1, IMHO). Why? Not because of any flaws in the backtesting system; backtesting with Portfolio123's simulation and ranking tools is by far the best available. It's simply the nature of markets and of financial data.
Sorry to go on at length, but this discussion seems to have lost sight of the fundamental facts about how Portfolio123 works. I have never once seen exact replication of results, and I don't expect to. I have seen good approximations of results, and I think users should be happy with that.