On another thread someone raised the issue of the Point in Time data. I’m curious: just how accurately does the data here reflect what was available at the time it was initially published? In other words, how accurate are our backtests? Also, is S&P data more historically accurate than Compustat, meaning backtests for S&P 500 businesses are more reliable? I’d welcome anything that sheds light on this, thanks.
The Compustat database is widely considered to be the best in the industry. They aim to make sure that the historical data accurately reflects publicly available information, and they do an incredibly good job of it.
That said, they will retroactively fix their own database errors. Depending on your needs, that may be a feature or a bug.
If you rebalance on a Monday and an analyst has recently increased his or her earnings estimate on stock ABC, will that revision be in the database when you buy your stocks on Monday?
Well, maybe not, as Compustat has some delay in its database. When there is a delay, you buy XYZ instead. By the next Monday the revision is in the live database for rebalancing, and you may buy ABC then (one week after the revision). But some investors had that information a week before you did. At least in theory, the stock price may already have jumped on the new revision, leaving you a week late compared to those other investors. We all wish we could subscribe to one service and have instant, perfect information, but that is not real life, right?
The problem comes when Compustat later records the first Monday as the date the revision became available. In that case the sim does not reflect what you could actually have done running a live port. This could be misleading if we decide to turn that sim into a live port with real money. To be clear, though, it has no effect on ports we are already running that have been shown to work out-of-sample, so there is no immediate impact for some of us.
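To make that concrete, here is a toy Python sketch of the look-ahead effect I am describing. The ticker and dates are invented, and this is only my reading of how a backdated availability date would play out, not a description of anyone’s actual system:

[code]
from datetime import date

# Hypothetical revision: it hit the live feed on Monday 2023-01-09, but the
# vendor later backdated its availability to the prior Monday, 2023-01-02.
revision = {
    "ticker": "ABC",
    "live_feed_date": date(2023, 1, 9),   # when subscribers actually saw it
    "backdated_date": date(2023, 1, 2),   # availability date after the retro-fix
}

def visible(rebalance_date, availability_date):
    """A sim (or port) can only act on data whose recorded availability
    date is on or before the rebalance date."""
    return availability_date <= rebalance_date

rebalance = date(2023, 1, 2)  # the first Monday

# The sim, using the backdated availability date, sees the revision...
print(visible(rebalance, revision["backdated_date"]))  # True
# ...but a live port rebalancing that same Monday did not see it.
print(visible(rebalance, revision["live_feed_date"]))  # False
[/code]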
Marco can correct me if I am wrong on this, and perhaps update us if this is no longer the case with Compustat. I would not be surprised if Compustat now timestamps the data and it is completely PIT (perhaps with the exception of a few true error corrections). If so, I would like to know so that I can delete this post: I would love to be corrected with information that is new to me.
mm123,
So P123 is near perfect now, I think. SnapShots should guarantee this, and a sim of one of my auto ports (over the same period) that I ran this weekend confirms near perfection, at least in my small sample.
I like perfection. But I am not claiming that moving to the new way (without SnapShots) is going to be a problem. I made copies of my live ports and put them on auto today. I will run them as sims in a few months and see whether the ports and the sims differ significantly. My guess is that there will be some difference in the stock picks but that the returns may not differ by a significant amount. But I do not know what Compustat is doing now, and there may not even be a difference in the stock picks.
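For what it is worth, the comparison I have in mind is roughly the following. The holdings and return figures are placeholders, not real results:

[code]
# Overlap of stock picks and gap in returns between the live port and the
# re-run sim over the same period (all values below are placeholders).
port_picks = {"ABC", "DEF", "GHI", "JKL", "MNO"}  # live port holdings
sim_picks = {"ABC", "DEF", "GHI", "JKL", "XYZ"}   # sim over the same period

# Jaccard similarity: shared picks divided by all distinct picks.
overlap = len(port_picks & sim_picks) / len(port_picks | sim_picks)
print(f"Overlap of picks: {overlap:.0%}")  # 67% in this toy case

port_return, sim_return = 0.25, 0.27  # hypothetical annualized returns
print(f"Return gap (sim minus port): {sim_return - port_return:+.1%}")  # +2.0%
[/code]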
Not only do I have an open mind on this, but we have trusted Marco to make the best use of the resources he has available (it’s called capitalism). This has been working for me: Marco has managed those scarce resources well. And if there is a problem that we are all unaware of now, then Marco will probably address it in a rational way.
[quote]
I made copies of my live ports and put them on auto today. I will run them as sims in a few months and see whether the ports and the sims differ significantly. My guess is that there will be some difference in the stock picks but that the returns may not differ by a significant amount.
[/quote]Please keep us updated. Just don’t make the same mistake I did: make sure to label the ranking system, the universe, the custom formulas, the lists, and all that stuff clearly, so that you don’t inadvertently make modifications that would invalidate the entire experiment.
I have been doing this for a few years and for the most part the results are pretty similar. However, I have one family of portfolios that always seems to show returns about 10% higher when re-run as simulations. That 10% is often the difference between 25% returns (which are excellent) and 15% returns (which are nothing to write home about).
(This portfolio is not based on earnings estimates at all but uses a bunch of financial statement line items.)
It’s frustrating because after years of running this portfolio live, I still don’t know if it’s any good.
[quote]
What happens when a company’s earnings are restated?
When the update is made in P123… would it change all values historically or only after PIT?
[/quote]It’s PIT.
[quote]
GE’s restated earnings for 2016 and 2017 (April 13) are not in P123 yet - not sure how the process works with Compustat
[/quote]GE changed just the accounting. Since Compustat standardizes the accounting, I wonder whether they plan on restating it at all.
Are you asking whether data for companies in the S&P 500 is more accurate within Compustat than data for other companies? If so, I strongly suspect the answer is yes. While I am sure the data acquisition team makes every effort to be timely and accurate, it prioritizes its finite resources on the largest and most broadly relevant companies for U.S. investors. Fundamentals coverage is generally 100% for S&P 500-1500 companies, and usually very accurate. It is only when you get below the small/mid-cap boundary that data accuracy, consistency, and frequency become a larger issue, IMO.
If you are asking whether Compustat or S&P data is more accurate, the answer is neither: Compustat is S&P. S&P Capital IQ bought Compustat several years ago and now operates two separate data acquisition teams; Compustat does most company fundamentals and S&P does the rest. To the end user, though, the process is seamless.
Primus, my question was not about accuracy per se, but accuracy at the time of publication. We know that Compustat has quality data, far superior to others such as Thomson Reuters. I think the original question is pretty clear: is the data accurate at the point in time it’s published? This is hugely important if we are to rely on the predictive power of our backtests. On some other recent threads there was a suggestion that non-S&P 500 businesses were slightly more prone to subsequent data revisions, which raised some concern on my end. As I read the answers here I am reassured; it seems that suggestion about data revisions may have been misstated or overblown.
Related: one major source of revisions is probably companies’ restatements of prior operating data. Compustat maintains dual data models for original and restated data. While I would want to double-check with P123 staff, I believe P123’s PIT relies on as-stated data until the revised data would have become available. For example, say company Z revised its quarterly revenues two weeks ago to $90 after reporting $100 four weeks ago, and assume the database is up to date and accurate. If I were to run a simulation going back one month, the investment criteria would have been based on the original as-stated quarterly revenues of $100 for the first two weeks, and then switched to $90 thereafter. So a single line item can have multiple data points in a single reporting period. This does not, however, mean that Compustat reported the data inaccurately.
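If my understanding is right, the lookup amounts to something like this minimal Python sketch. The dates and figures are invented for illustration; this is my reading of the behavior, not P123’s actual implementation:

[code]
from datetime import date

# Each record is (availability_date, value), sorted by availability date.
# The PIT value on a given date is the latest record available by that date.
revenue_history = [
    (date(2023, 3, 1), 100.0),   # revenue as originally stated
    (date(2023, 3, 15), 90.0),   # restated value, available two weeks later
]

def pit_value(history, as_of):
    """Return the most recently available value as of the given date."""
    known = [value for avail, value in history if avail <= as_of]
    return known[-1] if known else None

print(pit_value(revenue_history, date(2023, 3, 10)))  # 100.0 (as-stated)
print(pit_value(revenue_history, date(2023, 3, 20)))  # 90.0 (restated)
[/code]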
As for why S&P 500 companies’ results seem to be restated more often: those companies a) are more visible; b) have more thorough and timely Compustat coverage; c) have more and larger operating segments subject to accounting revisions; and d) face greater regulatory scrutiny and audit risk, hence more frequent restatements.
That question isn’t clear to me since I don’t know what ‘accurate’ means in this context.
If Capital IQ sticks to the process described in the white paper listed below, data from several sources is captured before a data field is considered finalized. Is press-release data accurate, or do you need the data to come from the 10-Q filing? That process also adds latency. Is late data still considered accurate?
When it comes to data revisions, Capital IQ seems to accept and handle revised reported data in a manner that prevents look-ahead bias (i.e., via a point-in-time database). It’s still unclear to me what happens when Capital IQ needs to revise its own transcription errors.
“It should be noted that in the Capital IQ database, the filing date, or data availability date, is the date the document from which the data was collected was filed. This is in contrast to the Compustat PIT data, where the data availability date corresponds to the date it was published on the CD-ROM.”
WOW! Important to know, at least. And SnapShots has the potential to make the Capital IQ data PIT in the same sense that the Compustat data is PIT.
There you have it: Capital IQ data is PIT as of when the data was published, not as of when it was made available for you to use in your rebalance. Very simply put, straight from Capital IQ.
NOTE: The following post is not accurate. Thanks to Walter for pointing out my error. Fixing the error shows that WeeksIntoQ is indeed the number of weeks since the most recent quarter, as per Capital IQ; it is not the date at which Capital IQ last updated the data. As far as I know, there is no way to recover the historical lag times between when a company reported and when Capital IQ updated the data.
This question has been raised on these forums before. I think the closest approximation of the lag between when the most recent quarterly report was released and when Compustat was updated with that information goes something like this: compare the report date of the most recent quarter with the date the data first shows up in the database.
According to current data, this approximation indicates that companies in the S&P 500 experience a typical lag of 3-4 weeks (about the same for the average and the median) and a maximum lag of 8-9 weeks. If this approximation is even roughly correct, lag times can indeed be quite significant.
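To be explicit about what I mean by typical and maximum lag: the summary statistics are just the mean, median, and max of the per-stock lags, along these lines (the lag values below are made up to match the ballpark above):

[code]
import statistics

# Illustrative only: lag, in weeks, between each company's report date and
# the date the database was updated, for a handful of hypothetical stocks.
lag_weeks = [3, 4, 3, 5, 4, 3, 8, 4, 3, 4]

print(f"mean lag  : {statistics.mean(lag_weeks):.1f} weeks")   # 4.1
print(f"median lag: {statistics.median(lag_weeks):.1f} weeks") # 4.0
print(f"max lag   : {max(lag_weeks)} weeks")                   # 8
[/code]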
Paul’s research on the 2-4 week update schedule is not inconsistent with my findings. And while the lack of clarity about the meaning of WeeksIntoQ is somewhat unsatisfying, Yuval states that “WeekIntoQ currently measures the weeks since Compustat updated the data, not since the earnings announcement.” If, indeed, Yuval is incorrect, then the above approximation of quarterly lag should be pretty much spot on. The fact that @ReportingLag_Q is strictly positive for all stocks in the “All Fundamentals” universe lends strong credence to Yuval’s definition.
Thanks! The Capital IQ data is of interest too; if anything, it may be of more interest, since it handles “PIT” (or not-quite-PIT) data differently than Compustat does. Whatever one calls it, a significant discrepancy between the data available to the sims and to the ports would put any port using this data into question.
Indeed, I have wondered whether this was the cause of the decline in out-of-sample performance of the Designer Ports. And I still have that question: is it?
That having been said, I previously tried to look at other sources of earnings data. Clearly Capital IQ is better than Zacks (live or in backtests). Better than Thomson Reuters? I am not sure; I could not find convincing proof either way, and keep in mind that my access to Thomson Reuters data was limited to what I could get on Yahoo (current data only). But if anything, anecdotally, stocks that had a new upward revision in earnings estimates with Capital IQ (but not Thomson Reuters) tended to see the Thomson Reuters estimates revised upward later, rather than the opposite; it was less common to see the Capital IQ estimate revised downward to match the Thomson Reuters estimate. Again, too anecdotal for any conclusions. Furthermore, it is not always possible to tell whether a discrepancy was a delay in one service or the services following different analysts. In short, my analysis was a mess, which may be why I have no conclusions.
That’s good info on Zacks and TR. Pedigree matters.
I will reread the PIT document, but I believe it is outdated. From what I understand, there have been a few changes since 2010 that have brought Compustat and Capital IQ closer together.
Notably, I don’t think there’s a clear distinction between Compustat and Capital IQ anymore; at least, I do not make such a distinction. From my conversations with Capital IQ, I understand that S&P kept the Compustat data acquisition team around for pedigree reasons while migrating its internal acquisition team to Compustat’s data model. The data is then merged in a process that is seamless to the end user. When you access S&P Capital IQ via Research Insight and/or the XpressFeed data feed, I don’t believe there is any way to distinguish how the data was collected.
You do not think the quote below from Capital IQ makes them different? In theory, it could make a huge difference in how well a sim predicts how a port using the Capital IQ data will do, depending on how different the data is between the sim and the port. For Compustat, it is the same data for the sim and the port.