New data provider! Global stock data!

We should have the filings since we’re getting global data. Pretty sure that’s the reason we don’t have any data from S&P now. Would have to be converted to USD though as the only data appears to be in Chinese Yuan.

Have there been any updates on this process? Is it still on track for Q2 2020?

Q2 2020 is for the first phase: switching North America to FactSet. We should be able to start Europe in the middle of Q2, then AsiaPac in Q3. From what I’ve seen so far, these are conservative estimates.

Thanks for the update, Marco. Will the FactSet historical data be point-in-time, i.e., no look-ahead bias?

PIT depends on the dataset. Dead companies are there. Estimates should be ok. Their taxonomy is ok too I think.

The financial statements are where we will have to make some assumptions. The main issue is that they only have “two buckets” for statements: the as-reported values go in one, the restated values (if any) in the other. This means: 1) preliminary reports are overwritten when final numbers come out; 2) if a company restates a period multiple times, only the last restatement survives. There are also some lesser issues with the effective dates, but the key ones are there: when a company first reported and the filing date (also easily attainable from the SEC).

We’re watching the data live to learn how to make better assumptions. FactSet officially recommends simply lagging the quarterly reports by 30 days and the 10-Ks by 45 days. I believe there are much better assumptions, for example ones based on the company itself: companies are pretty consistent about which items they include in a preliminary report, so we could basically create a stencil for each company to re-create a prelim report.
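To make the simple lag rule concrete, here is a minimal sketch of how a backtest could decide when a statement becomes usable. It assumes the lag is counted from the fiscal period end (the post does not say which date the lag starts from), and the names `LAG_DAYS`, `assumed_available_date`, and `is_visible` are made up for illustration, not anything from FactSet or P123.

```python
from datetime import date, timedelta

# Lags quoted above: 30 days for quarterly reports, 45 days for 10-Ks.
# Assumption: the lag is measured from the fiscal period end date.
LAG_DAYS = {"Q": 30, "A": 45}


def assumed_available_date(period_end: date, period_type: str) -> date:
    """First date on which a backtest is allowed to use this period's statement."""
    return period_end + timedelta(days=LAG_DAYS[period_type])


def is_visible(period_end: date, period_type: str, as_of: date) -> bool:
    """True if the statement may be used on the backtest date `as_of`."""
    return as_of >= assumed_available_date(period_end, period_type)


# Example: a fiscal year ending 2019-12-31 is hidden until 45 days later.
fy_end = date(2019, 12, 31)
print(assumed_available_date(fy_end, "A"))
print(is_visible(fy_end, "A", date(2020, 1, 31)))
```

A company-specific rule would replace the fixed `LAG_DAYS` table with something learned per company, for example its typical delay between period end and first report.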

We’re also keeping track of changes that would be lost thereby creating our own PIT version starting now. Going forward the prelim data won’t be lost.
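One way to picture “keeping track of changes that would be lost” is an append-only log that stores every value together with the date it was first seen, instead of overwriting. This is only a sketch of the idea, not P123’s actual implementation; the `ChangeLog` class, its methods, and the key layout are hypothetical, and it assumes observations are recorded in chronological order.

```python
from datetime import date


class ChangeLog:
    """Append-only store: each new value is kept with the date it was observed."""

    def __init__(self):
        self._rows = []  # (key, seen_on, value), appended in chronological order

    def latest(self, key):
        """Most recently recorded value for `key`, or None if never seen."""
        result = None
        for k, _seen_on, value in self._rows:
            if k == key:
                result = value
        return result

    def record(self, key, seen_on, value):
        """Append the observation only if it differs from the last stored value."""
        if self.latest(key) != value:
            self._rows.append((key, seen_on, value))

    def as_of(self, key, when):
        """Value that was visible on `when`, or None if nothing had been seen yet."""
        result = None
        for k, seen_on, value in self._rows:
            if k == key and seen_on <= when:
                result = value
        return result


# Hypothetical example: a prelim EPS figure later replaced by the filed number.
log = ChangeLog()
key = ("XYZ", "2019Q4", "EPS")
log.record(key, date(2020, 1, 28), 1.10)  # preliminary report
log.record(key, date(2020, 2, 1), 1.10)   # unchanged, so nothing is appended
log.record(key, date(2020, 2, 20), 1.08)  # filing overwrites at the vendor
print(log.as_of(key, date(2020, 2, 10)))  # the prelim value is still recoverable
```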

This is what we have so far.

Ok, thanks, good to know you’re watching this closely. I wonder if it might be worth keeping a small sample of your existing preliminary data (assuming that’s ok) in order to compare it to FactSet’s data and manage the latter accordingly.

Hi Marco,

Sorry - can you clarify that, pls.

This seems to imply that it is not PIT?

Or are you saying that there are 2 buckets and the 1st one is never overwritten (the 2nd one can be if there are several restatements) so it is more a matter of knowing on which date you look into the 1st vs the 2nd bucket.

Thank you

Jerome

As Reported is point in time, Most Recent is not. I think what Marco is saying is that As Reported includes preliminary and incomplete data sets that need to be handled in a rational fashion. This is complicated by the restatement of results after the fact.

Both buckets are overwritten. The “as-reported” bucket is first filled with whatever is on the prelim report, it is then updated with the SEC filing, and that’s where it stops. The “restated” bucket is created/updated any time a period is restated. Both buckets have reasonable dates that can be used to expose/hide them during a backtest.
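As a rough illustration of using the bucket dates to expose or hide values during a backtest, the sketch below picks the restated value only once its date has passed and otherwise falls back to the as-reported value. The `PeriodRecord` fields and the `value_as_of` function are assumptions for the example, not FactSet’s actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PeriodRecord:
    """One fiscal period with the two buckets described above (field names assumed)."""
    as_reported: float
    as_reported_date: date
    restated: Optional[float] = None
    restated_date: Optional[date] = None


def value_as_of(rec: PeriodRecord, as_of: date) -> Optional[float]:
    """Return the value a backtest should see on `as_of`, or None if not yet public."""
    has_restatement = rec.restated is not None and rec.restated_date is not None
    if has_restatement and as_of >= rec.restated_date:
        return rec.restated
    if as_of >= rec.as_reported_date:
        return rec.as_reported
    return None


# Hypothetical period: reported in February 2019, restated a year later.
rec = PeriodRecord(
    as_reported=100.0,
    as_reported_date=date(2019, 2, 15),
    restated=95.0,
    restated_date=date(2020, 2, 20),
)
print(value_as_of(rec, date(2019, 6, 1)))  # before the restatement date
print(value_as_of(rec, date(2020, 3, 1)))  # after the restatement date
```

Note that this only hides or exposes what is currently in each bucket; a prelim value that was already overwritten inside the as-reported bucket cannot be recovered this way.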

FactSet fundamentals can’t be called PIT because the data is overwritten. But I feel it will be more than ok, and we’ll use all our expertise to make it as good as it can be.

BTW… FactSet provides some studies comparing PIT vs lagged data, showing that the two produce similar results. They might be self-serving, cherry-picked studies, but I’ll post them soon.

OK - Thank you Marco for this more detailed explanation.

  1. I guess we will only know the magnitude of the differences once we get to run side by side a few sims with factset data vs S&P data (large caps, reasonable trading freq at first, as small caps might show greater variations).

Bottom line: because of S&P’s behavior, we do not really have a choice about moving to FactSet; it is about making the best of what is available. I have full trust in you and the P123 team.

  2. Question → could you guys keep your S&P code infrastructure somewhere ready to use? I am thinking long term if some of us are willing to pay for S&P data but using the P123 engine rather than Clarify. Would that be possible?

Thank you,

Jerome

Here is the FactSet page describing the advantages of their data.

Highlights:

  • 86,000+ global companies (including discontinued); 28,000 in North America alone vs 7,988 in CompuStat. (They aim to cover any new listing with $25m or more market cap.)
  • 750+ data items. (Do they count each quarter separately for each line item?)
  • The most recent annual and interim periods are available on the listing day. Earlier annual and interim periods are collected and made available within five business days of listing (Restated?)
  • They collect data from a variety of different public sources so that they won’t miss anything.

Questions for Marco and co:

  • Will short interest be available (for US stocks)?
  • Insider transactions?
  • Starting year for backtests?
  • When do estimates get updated?
  • How often will P123 refresh the data?

Jerome, yeah it’s a sunk cost. We’re open to running a version of p123 front-end and the ratio engine that uses your S&P license (or any other data for that matter). The great thing about this setup is that we’re no longer limited by strict redistribution agreements. You will have access to all of the data & history, no download restrictions, etc. Probably would take around 10-20 users to make sense for us.

Chipper

Working on short interest. We need permission from the exchanges.
Yes.
20 years rolling.
Not sure, probably very fluid like S&P.
TBD. We’ll do updates every hour now to see just how much changes and when.

Hey guys, just wanted to check in and see if there are any updates on this?

Also, is there any chance we will get the ability to backtest with daily rebalancing as opposed to weekly?

Also also, has anyone expressed interest in p123 front end and using our own s&p license? I’d be interested in that.

It’s ongoing.

Top priority is to reproduce what we have now with Factset (North America with weekly ranks). We are making changes to the engine of course, and daily ranking feasibility is something we’ll look at.

There’s been some interest, not much. Obviously there’s still plenty of time till Jun 2020. The place to start is with them, telling them you want to keep the p123 front end with their data. Email these guys joseph.smith (at) spglobal.com, david.coluccio (at) spglobal.com, dcamillo (at) spglobal.com and express your interest. Hopefully they won’t just try to get you to switch over to their tools :)

Also wanted to make sure Short Interest data would still be available. Is this confirmed? Thanks!

+1 on short interest. It’s a powerful factor.

If this is a short interest vote then make that +1 from me as well.

+1 for short interest.
In case Factset doesn’t provide it (short interest is not in Factset data for Quantopian), it should be possible for P123 to get and integrate it for free here:
https://www.quandl.com/data/FINRA-Financial-Industry-Regulatory-Authority?keyword=aapl
(I haven’t tried it)

+1 for short interest, an excellent factor.