Compustat vs Factset Factor Values Don't Match

sthorson · June 6, 2020, 9:01pm

My primary ranking system returns very different rank values between Compustat and Factset.

Finally had some time to dig into why. I ran a screen using Current>Use Prelim for both providers, pulling data for a number of factors on the S&P 500 universe.

The results are very concerning. The percentage of factor values that match, or even come close, is very low. This makes me wary of trusting one data provider over the other.

Attached is a spreadsheet comparing the values and the percent difference between them. It’s unnerving to say the least.

Across 25 factors, on average only 43% of values match. Almost anything having to do with earnings isn’t even close.

Any thoughts would be appreciated.

Compustat_vs_Factset.xlsx (578 KB)

yuvaltaylor · June 6, 2020, 9:57pm

I think it’s worthwhile looking at these factors in some detail, because there are different reasons for the differences and similarities.

First off, some of the “FactSet” numbers are coming from Compustat (because we haven’t yet made the switch for these factors) or from ICE (technical factors like price and volume), so they’re going to match more or less precisely. This is the case with Inst%Own, Vol10DayAvg, Float, and $AvgDollarVolume. Inst%Own will not match so well in the near future.

Now let’s look at one item in particular, NetIncBXorQ. In the S&P 500, 129 out of the 500 companies have slight differences in NetIncBXorQ. Almost all of these differences are tiny: 72 of those are less than 1%. Now what happens when you look at NetIncBXorTTM? Well, if 25% of companies are different in one quarter, then when you add up four quarters (assuming the differences fall independently in each quarter), you’ll get about 68%. To get to EPS, you need to divide NetIncBXorTTM by SharesQ, and there are going to be a few differences there too. When you calculate CashFl, you need to subtract preferred dividends from NetIncBXor (these shouldn’t be different) and add back depreciation and amortization (which are very different between Compustat and FactSet). Now let’s compound those differences even further by introducing EPS%Chg. You’re now looking at numbers arising from eight different quarters, so you should see about 90% of companies be different. This alone explains almost the entirety of your spreadsheet. Remember that ROE%, ROA%, and NPMgn% are all derived from NetIncBXor.
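The 68% and 90% figures above follow from a simple compounding argument. Here is a minimal sketch of that arithmetic, assuming each quarter has the same 25% chance of a mismatch and that quarters are independent of each other (both are simplifying assumptions; the function name is just for illustration):

```python
def share_differing(p_quarter: float, n_quarters: int) -> float:
    """Chance that at least one of n quarters differs between providers.

    Assumes each quarter differs with probability p_quarter,
    independently of the other quarters (a simplifying assumption).
    """
    return 1.0 - (1.0 - p_quarter) ** n_quarters


p = 0.25  # roughly 129 of 500 companies, rounded as in the post

ttm = share_differing(p, 4)      # trailing twelve months = 4 quarters
eps_chg = share_differing(p, 8)  # year-over-year change = 8 quarters

print(f"TTM figures differing:     {ttm:.0%}")
print(f"EPS%Chg figures differing: {eps_chg:.0%}")
```

In practice the mismatches are probably clustered in the same companies quarter after quarter, so the real percentages could be lower than this estimate.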

Then we get to the estimates. Most of the S&P 500 companies have well over a dozen analysts. The number of analysts that FactSet relies on is a little higher than the number that Compustat relies on, and they’re often quite different analysts. So you’re going to get very little agreement there.

There’s a fluke in the spreadsheet: ProjPENextFY is using Compustat numbers rather than FactSet’s. That needs to be fixed.

As for Sales, that seems at first a lot more puzzling. SalesQ should match, more or less. But if you look at the companies that are different, they’re almost all banks, insurance companies, and biotechs. Banks and insurance companies report sales quite differently from other companies (and I honestly don’t know why biotechs should have different numbers between FactSet and Compustat). It’s worthwhile digging into a few companies’ financial statements to see where these numbers are coming from (Jerome has done so already in another thread).

I think that covers most of the discrepancies you found in the spreadsheet, and I hope this explanation helps.

sthorson · June 6, 2020, 10:20pm

Thank you Yuval, your response is appreciated.

Which raises the question: which provider has the more trustworthy data?

My ranking system is thrown way off by Factset data, which I believe should not be the case as ranks are relative. So it led me to the experiment.

Many of the factor values are decidedly different between the two providers. If the values come from SEC filings, should they not be the same?

I’ll run the experiment without using prelim. My guess is the values should be closer to matching. We’ll see.

Thanks again,
Steve

yuvaltaylor · June 7, 2020, 4:05pm

Compustat and FactSet’s data are both trusted worldwide. If not, they wouldn’t be so successful. They have different strengths and different approaches to data. FactSet covers more stocks and aims to get the best present-day data; Compustat places more emphasis on point-in-time and standardizes data for banks and insurance companies so they look like other companies (while FactSet thinks those companies should be very different).

Before you compare Compustat to FactSet, try running your Compustat ranking system on subsets of your Compustat universe. Create five different universes using the commands mod(stockid,5) = 1, mod(stockid,5) = 2, mod(stockid,5) = 3, mod(stockid,5) = 4, and mod(stockid,5) = 0. Does the ranking system work equally well on all five? If so, it’s solid and should work just as well on the FactSet universe. If not, it may be over-optimized.
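Outside the platform, the same five-way split can be sketched in a few lines. This is only an illustration of the mod(stockid,5) idea: the stock ids below are made up, and `split_universe` is a hypothetical helper, not a platform function.

```python
from collections import defaultdict


def split_universe(stock_ids, n_buckets=5):
    """Group stock ids into n_buckets disjoint subsets by id modulo n_buckets."""
    buckets = defaultdict(list)
    for stock_id in stock_ids:
        buckets[stock_id % n_buckets].append(stock_id)
    return dict(buckets)


# Hypothetical ids standing in for a real universe.
stock_ids = list(range(1000, 1020))
subsets = split_universe(stock_ids)

for remainder in sorted(subsets):
    print(f"mod(stockid,5) = {remainder}: {subsets[remainder]}")
```

Because the remainder has nothing to do with a company's fundamentals, each subset is an essentially arbitrary sample of the universe, which is what makes it a reasonable robustness check.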

SEC filings have to be standardized by the data provider because companies use a lot of discretion. All data providers standardize their data to some degree, and FactSet and Compustat take different approaches to different items.

sthorson · June 7, 2020, 4:54pm

The ranking system simply takes the 3 month percent change in a factor (node) value, equally weighted. I use 35 different factors.

The problem I’m seeing is that the factor values are very different between Compustat and Factset. As I stated above, on average only about 43% of the values match up.

As a result the stocks rank differently with Factset than with Compustat. This throws the rank > 60 buy rule into chaos. My strategies have done well with Compustat, but I’m getting a largely different set of stocks passing this buy rule with Factset.

Logic tells me there will probably be some difference in how stocks rank, but not as much as I’m experiencing.

Thanks again Yuval. I guess in this case, it is what it is.

-Steve

EDIT:

To better illustrate what I am experiencing I ran the ranking system on Compustat/prelim and Factset/prelim, same date:

The average difference in rank position in Factset vs Compustat for the top 100 is -305
75% Factset matched Compustat rank > 60
63% Factset rank positions are found in Compustat top 300
46% Factset rank positions are found in Compustat top 50
** Only 2 Factset rank positions are found in the Compustat top 10 - this one is alarming as it will directly impact rebalance selections.

So you can see, with Factset I am working with a very different pool of final picks for the portfolio buy and sell rules.

I don’t think it should be this different. All I’m doing is ranking on the 3 month % change of factor value(s).
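For anyone wanting to reproduce this kind of comparison, here is a rough sketch of one way to measure top-N overlap between two rankings. It is one interpretation of the overlap figures above, not necessarily how they were computed; the tickers and rank values are invented, and `top_n_overlap` is a hypothetical helper.

```python
def top_n_overlap(ranks_a, ranks_b, n):
    """Share of the top-n names under ranking A that are also top-n under ranking B.

    ranks_a, ranks_b: dicts mapping ticker -> rank score (higher is better).
    """
    top_a = set(sorted(ranks_a, key=ranks_a.get, reverse=True)[:n])
    top_b = set(sorted(ranks_b, key=ranks_b.get, reverse=True)[:n])
    return len(top_a & top_b) / n


# Invented rank scores for five placeholder tickers.
compustat_ranks = {"AAA": 99, "BBB": 95, "CCC": 90, "DDD": 80, "EEE": 70}
factset_ranks = {"AAA": 98, "BBB": 60, "CCC": 92, "DDD": 85, "EEE": 91}

overlap = top_n_overlap(compustat_ranks, factset_ranks, n=3)
print(f"Top-3 overlap: {overlap:.0%}")
```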

dnevin123 · June 7, 2020, 6:40pm

sthorson,

Out of curiosity, what happens when you remove financials from the mix and compare the providers?

-Daniel

sthorson · June 7, 2020, 7:20pm

Daniel, just did a run. For 25 factors:

w/ Financials Factset matches Compustat on average 41% (500 stocks)
w/o Financials the match stays close at 43% (434 stocks)

sthorson · June 7, 2020, 8:59pm

After further research I have determined that if I allow a +/- 2.5% difference in factor values between Compustat and Factset, the match percentage rises significantly. It’s a 71% match rate, good enough for me.
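A minimal sketch of that tolerance check, for reference. It assumes the percent difference is measured relative to the Compustat value (the choice of base is an assumption), and the numbers are invented; `match_rate` is a hypothetical helper.

```python
def match_rate(base_values, other_values, tolerance=0.025):
    """Fraction of value pairs that agree within +/- tolerance of the base value.

    Pairs where either value is missing (None) are skipped.
    A base value of zero only matches another zero.
    """
    compared = 0
    matched = 0
    for base, other in zip(base_values, other_values):
        if base is None or other is None:
            continue
        compared += 1
        if base == 0:
            if other == 0:
                matched += 1
        elif abs(other - base) <= tolerance * abs(base):
            matched += 1
    return matched / compared if compared else 0.0


# Invented factor values for four stocks.
compustat = [100.0, 50.0, 10.0, 0.0]
factset = [101.0, 55.0, 10.2, 0.0]

exact = match_rate(compustat, factset, tolerance=0.0)
loose = match_rate(compustat, factset, tolerance=0.025)
print(f"Exact match rate:   {exact:.0%}")
print(f"Within 2.5% of base: {loose:.0%}")
```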

Also, I ran a sim on both using the ranking system mentioned above, with buy rank > 90 and sell rank < 90. For the preceding two years the results were off by less than 2 percentage points between the two data engines.

Plus, I ran rank performance on both and the results were also similar.

However, as mentioned before, the pools of higher-ranking stocks from the two engines are decidedly different.

I’ll see how it looks when I run my full research process at the end of June.