FINRA publishes (disseminates) short interest data for free.
However, it is published about a week after the exchanges submit it to them. Source. Presumably, Marco knows this and is therefore negotiating with the exchanges to provide up-to-date short interest data. Thanks.
FactSet is semi-PIT. It is better than most data providers but not as good as P123’s current provider (who wants such a large price increase that it is not possible to continue using their PIT data). Some have said the difference between semi-PIT and complete PIT is minimal.
These differences only affect backtesting of ranking systems and simulations. I don’t expect any significant difference in how live systems behave.
MY PLAN
As soon as possible in 2020 (when we have access to both the current PIT data and the new FactSet data), I plan to do backtests comparing the results for both sets of data.
Hopefully the results will be nearly identical (as some have suggested). Then I can use the new data from FactSet to backtest new ideas in a leisurely manner.
If the results are significantly different, then I plan to do as much backtesting as possible in the first 6 months of 2020 while we have access to the current PIT data.
Even in the worst case (significant differences), I’m not worried because I think I can easily do all the testing I need in the first 6 months of 2020. This confidence is based on the results of a recent review of my current live models. These models have ranking systems that were developed in 2013 and revised in 2015. In my fall 2019 review, I found there was almost nothing that could profitably be added to them. Sure, I tried out several new ideas (new to me vs 2013 and 2015), but I ended up changing very little, very very little. So in the worst case, I’ll devote more time than I’d planned in the first half of 2020 to testing the new ideas. I’ve already tested the most promising, so it’s not a long list. And then I’ll stop testing and just use what I’ve got. I realize this approach is easier for longtime P123 subscribers than for newcomers, but it is my reality.
Thanks for posting the link to FactSet’s efforts to create a true PIT database for Estimate data.
One important takeaway for me is that FactSet’s new PIT database for estimates will go back a full 10 years (Dec 2009 on). With that we can run our backtests for the most recent 10 years using their full PIT data and re-run the tests using their regular (not fully PIT) estimates database. How closely the results of the two 2009-present tests match would give us a good idea of how much (or little) trust we could put in FactSet’s not fully PIT estimate data for 2000-2009 tests.
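For anyone who wants to put a number on "how close", here is a minimal sketch of that comparison. It assumes you have exported the per-period returns of the same model from both runs as two plain lists of decimals; the function name `compare_backtests` and the three metrics are my own choices for illustration, not anything P123 or FactSet provides.

```python
from math import sqrt

def compare_backtests(returns_a, returns_b):
    """Compare two equal-length lists of per-period returns (decimals, e.g. 0.02 = 2%)."""
    if len(returns_a) != len(returns_b) or len(returns_a) < 2:
        raise ValueError("need two equal-length series with at least 2 periods")

    n = len(returns_a)
    mean_a = sum(returns_a) / n
    mean_b = sum(returns_b) / n

    # Pearson correlation of the two return series.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(returns_a, returns_b))
    var_a = sum((a - mean_a) ** 2 for a in returns_a)
    var_b = sum((b - mean_b) ** 2 for b in returns_b)
    if var_a == 0 or var_b == 0:
        correlation = float("nan")  # undefined when one series never varies
    else:
        correlation = cov / sqrt(var_a * var_b)

    def total_return(returns):
        # Compound the per-period returns into one cumulative return.
        growth = 1.0
        for r in returns:
            growth *= 1.0 + r
        return growth - 1.0

    return {
        "correlation": correlation,
        "total_return_gap": total_return(returns_a) - total_return(returns_b),
        "max_abs_period_gap": max(abs(a - b) for a, b in zip(returns_a, returns_b)),
    }
```

A correlation near 1 with a small total return gap would suggest the not fully PIT data is a reasonable stand-in for the earlier years; what counts as "small" is a judgment call for each model.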
Estimates data has to be PIT. Otherwise one could never track revisions. Fundamental data is PIT-ish, as Brian rightly concluded. We will be putting systems in place to adjust for this as best we can.
Looking forward to trying the new data in existing models.
Someone mentioned the possibility of daily rebalancing. Actually, I would be interested in monthly rebalancing for some models. Currently I use the first Monday of the month as a substitute for monthly (but also check every Monday for significant changes in hold rationale).
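The first-Monday substitute is easy to compute outside the site if you want to plan rebalance dates ahead of time. This is just a sketch using Python's standard `datetime` module; the function names are mine, and it ignores market holidays (the first Monday of September, for example, is Labor Day in the US).

```python
from datetime import date, timedelta

def first_monday(year, month):
    """Return the date of the first Monday in the given month."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the number of days
    # from the 1st forward to the next Monday (0 if the 1st is a Monday).
    offset = (7 - first.weekday()) % 7
    return first + timedelta(days=offset)

def is_monthly_rebalance_day(day):
    """True only on the first Monday of the month."""
    return day == first_monday(day.year, day.month)

# Rebalance dates for the first half of 2020.
schedule = [first_monday(2020, month) for month in range(1, 7)]
```

Every other Monday would then be a "check only" day for changes in hold rationale.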
In addition, can we get more information about how p123 is going to map existing sector/subsector/industry/subindustry classifications to FactSet and how this might be customized by subscribers?
Since we are being forced to abandon GICS, it sounds like this is one area where there could be a lot of dislocation.
Indeed. RBICS is quite different, and I will be providing a lot of documentation for you. A lot of companies are going to shift industries and sectors, and the groupings are going to take a little getting used to. But we’re going to make sure you’re well prepared for the shift. Like GICS, FactSet’s RBICS is a four-level classification system (actually it’s a six-level system but we’re ignoring the fifth and sixth levels). While RBICS does not use the terms sector, subsector, industry, and subindustry in the same way as GICS, we’ll be continuing to use those terms. Some of the sectors (energy, utilities, health care) are very similar to GICS; some are quite different. RBICS has almost three times the number of subindustries as GICS does. We’ll be mapping almost every GICS sector, subsector, industry, and subindustry to its closest RBICS equivalent. But this won’t be a 1-to-1 mapping, and there will be a little bit of minor overlapping in certain places. And there are a few GICS industries and subindustries that simply can’t be mapped. If you’re comparing a stock’s value or performance to that of other companies in its industry, the companies that constitute that industry will usually be quite different once the switch is made. Users will have a chance to compare FactSet and Compustat data soon, but before that I will be introducing RBICS to the community with plenty of documentation.
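To make the "not 1-to-1" point concrete, here is a rough sketch of what such a mapping looks like as a data structure. Every label below is a made-up placeholder, not a real GICS or RBICS name, and this is not the actual mapping P123 will publish; it only illustrates the three cases described above (a clean match, overlap, and no match).

```python
from collections import defaultdict

# Hypothetical placeholder labels only; not real GICS or RBICS categories.
GICS_TO_RBICS = {
    "gics_industry_a": ["rbics_industry_1"],                      # clean match
    "gics_industry_b": ["rbics_industry_2", "rbics_industry_3"],  # closest first
    "gics_industry_c": ["rbics_industry_3"],                      # overlaps with b
    "gics_industry_d": [],                                        # cannot be mapped
}

def map_industry(gics_label, mapping=GICS_TO_RBICS):
    """Return (closest_equivalent, other_candidates); closest is None if unmapped."""
    candidates = mapping.get(gics_label, [])
    if not candidates:
        return None, []
    return candidates[0], list(candidates[1:])

def unmapped_labels(mapping=GICS_TO_RBICS):
    """Old labels with no equivalent, i.e. rules that would need rewriting."""
    return sorted(label for label, targets in mapping.items() if not targets)

def shared_targets(mapping=GICS_TO_RBICS):
    """New labels that more than one old label maps into (the overlap)."""
    sources = defaultdict(list)
    for label, targets in mapping.items():
        for target in targets:
            sources[target].append(label)
    return {t: sorted(s) for t, s in sources.items() if len(s) > 1}
```

Running your own industry-based rules through something like `unmapped_labels` and `shared_targets` would show which ones need a closer look after the switch.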
Thank you very much, Yuval. I look forward to seeing your work on this.
Are there any other areas you can think of where changes from S&P to FactSet may be as dramatic as with the sector/subsector/industry/subindustry mapping?
Will there be a period of time in which both data providers are available, so that we have an opportunity to compare our models (backtests and current selections) and tweak as needed so as to minimize the potential impact on performance?
One area is that of short interest and institutional ownership data. We’re not exactly sure what the changes will be yet. We’ll keep you informed.
The other is how FactSet deals with the period between the press release and the SEC filing, which is quite different from what Compustat does. Again, we’ll keep you informed.