FactSet beta site v1.0, NOW LIVE

Hi Aaron and Yuval,

Suspecting there might still be issues with the data on what the Beta site now calls the Legacy (i.e., old) site vs. the Current (i.e., Beta) site with preliminary data, I screened 20 variables on 6 stocks for one day - the data available this morning - and manually checked for differences.

You will immediately recognize what an unsatisfactory method of investigation and insufficient sample size this is.

Nonetheless, even with a sample of just six tickers, 20 variables, and one day, 25% of the variables showed at least one difference in value, and ALL six stocks showed at least one difference as well.

For ticker AACG, MScoreSGAI is 0.15 in the Beta vs. 0.11 in the Legacy.
For ticker AABVF, WeeksIntoQ is 20 in the Beta vs. 16 in the Legacy.
For ticker AACG, WeeksIntoQ is 24 in the Beta vs. 23 in the Legacy.
For ticker AACTF, WeeksIntoQ is 9 in the Beta vs. 8 in the Legacy.
For ticker A, WeeksToQ is 2 in the Beta vs. 1 in the Legacy.
For ticker AA, WeeksToQ is 11 in the Beta vs. 10 in the Legacy.
For ticker AABVF, WeeksToQ is N/A in the Beta vs. 2 in the Legacy.
For ticker AACH, WeeksToQ is N/A in the Beta vs. 0 in the Legacy.
For ticker AACTF, WeeksToQ is 3 in the Beta vs. 9 in the Legacy.
For ticker A, WeeksToY is N/A in the Beta vs. 29 in the Legacy.
For ticker AA, WeeksToY is N/A in the Beta vs. 36 in the Legacy.
For ticker AABVF, WeeksToY is 0 in the Beta vs. 2 in the Legacy.
For ticker AACG, WeeksToY is 1 in the Beta vs. N/A in the Legacy.
For ticker AACH, WeeksToY is N/A in the Beta vs. 0 in the Legacy.
For ticker AACTF, WeeksToY is N/A in the Beta vs. 10 in the Legacy.
For ticker AA, Inst%OwnInd is 19.77 in the Beta vs. 19.82 in the Legacy.
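A manual spot-check like the one above could be automated once both sites' screen results are exported. A minimal sketch, assuming each export has been loaded into a dict keyed by (ticker, variable) - the loading step and the sample values below are just an illustration, not an actual P123 export format:

```python
# Sketch: automate the Legacy-vs-Beta spot check.
# Assumes each site's screen output has been exported to a dict
# keyed by (ticker, variable); "N/A" marks missing values.

def diff_sites(legacy, beta, tolerance=0.0):
    """Return (ticker, variable, legacy_val, beta_val) for every mismatch."""
    diffs = []
    for key in sorted(set(legacy) | set(beta)):
        lv = legacy.get(key, "N/A")
        bv = beta.get(key, "N/A")
        if lv == bv:
            continue
        # Treat two numbers within the tolerance as equal.
        if isinstance(lv, (int, float)) and isinstance(bv, (int, float)):
            if abs(lv - bv) <= tolerance:
                continue
        diffs.append((key[0], key[1], lv, bv))
    return diffs

# A few of the values reported above, as sample inputs.
legacy = {("AACG", "MScoreSGAI"): 0.11, ("A", "WeeksToQ"): 1, ("AA", "Inst%OwnInd"): 19.82}
beta   = {("AACG", "MScoreSGAI"): 0.15, ("A", "WeeksToQ"): 2, ("AA", "Inst%OwnInd"): 19.77}

for ticker, var, lv, bv in diff_sites(legacy, beta):
    print(f"{ticker}: {var} is {bv} in the Beta vs. {lv} in the Legacy")
```

A tolerance argument matters here because small rounding differences (like the Inst%OwnInd case) may be a different class of problem than a WeeksToQ value that is off by a whole week.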

I think every p123 subscriber should be very careful about drawing conclusions about the reliability of models built on Compustat data vs. FactSet data until the discrepancies between Compustat data on the Beta site and Compustat data on the Legacy (current) site are better understood and/or ironed out.

Hugh

General request for feedback: Has anyone noticed anything particularly wrong with utility companies?

This is probably just a personal worry. Reuters used to give a differing presentation with utility companies and I have feared that FactSet might as well. I have not noticed anything in either the documentation or results to warrant the fret, so I’m hoping to put this concern to rest, unless anyone has noticed anything concrete.

Simulations for my utilities system are almost exactly the same. Is there anything in particular that you are concerned about?

On the beta server, hedging does not work when the hedge entry and exit rules are custom formulas.

It returns "UnhandledExceptionHandler: EXCEPTION_ACCESS_VIOLATION".

It works fine with the Legacy server version.

Nothing. As I said, I just wanted to cross utilities off the list of things to worry about. It’s a pleasant surprise that they’re fine out of the box.

Hi Paul,
Both utilities and financial services are broken out as special cases in Compustat. I did some quick tests where I pulled some balance sheet factors for the stocks in the Util and Financial sectors.
The screen reports are limited to only 25 factors, so I could only do 25. I used TTM for all of them.

The results for utilities don’t look bad. I listed some things below that you should probably look into:
The 2 factors below had about 30 more N/As in FS vs. CS (UTIL only has 131 stocks, so that is a high %):
GrossPlantTTM, TxPayableTTM
See columns Q and AB in the attached Util spreadsheet.
The 5 factors below had a lot of cases where the CS and FS values were different by more than 100%.
AstCurOtherTTM, AstNonCurOtherTTM, IntanOtherTTM, InvstAdvOtherTTM, TxDfdICTTM
See columns BN, BX, BZ, CA, CH in the attached Util spreadsheet.

But the results for financials are very different in CS vs FS. I’m not going to try to list all the differences here. Take a look at the attached spreadsheet.
I split the screen so that you can always see the Sector/Ind data in columns A, B, and C, because the major diffs are related to certain industries.
Columns H to AG are the Compustat data; AJ to BK are FactSet.
Columns BN to CL show the difference between the CS and FS values as a % for each stock and each factor. Many factors have a lot of diffs between CS and FS, but the worst ones are highlighted in red.
Cells H4 to AG9 compare the number of NAs in CS to the number in FS for each factor. There are major differences. I highlighted the worst factors in red.
One common pattern is for CS to have a value of 0 where FS has N/A. 0 and N/A are not the same thing, so those should also be looked into.

If anything I mentioned above does end up being something that requires fixing, then you would want to run a lot more tests (mainly for financials), since I only ran 25 balance-sheet factors in my test. Items from the other statements could have issues too. Also, if I were going to pick another place to go fishing, I would recommend you take a look at ADRs if you have not tested them already.


CsvsFactSet_BalanceSheetSet_FIN.xlsx (749 KB)


CsvsFactSet_BalanceSheetSet_Util.xlsx (125 KB)

Just a little heads up: using ShowVar() with FactSet data, I am getting a lot of N/As for NetIncCFStmt, NetIncCFStmtQ, NetIncCFStmtTTM, and so on.

My utilities system has been very very simple; it may not have been affected by whatever it is that you are concerned about.

Hello,

“NetIncCFStmtPY<0” does not seem to work with FactSet.

I’m coming into this whole thing rather late, so excuse me if I’m repeating an earlier point.
I do quite a bit with ETFs, and one of my models is showing an incredible difference in performance (0% winners instead of 60%!!).
A preliminary comparison has brought two points to light:

  1. The stop-loss is hyperactive. Example: TQQQ, bought at $64.21 with a 15% stop-loss setting, actually triggered a sell at $60.64 - only about a 5.6% drop, roughly 9.45 percentage points above the 15% stop level.
  2. If a stop-loss is triggered for one ETF, the whole portfolio is sold!!
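For the record, here is the arithmetic behind point 1, using the prices from that post (reading "about 9.45%" as the gap between the actual fill and the 15% stop level - my interpretation):

```python
# Quick check of the stop-loss arithmetic from point 1.
buy_price = 64.21
sell_price = 60.64
stop_pct = 0.15

actual_drop = (buy_price - sell_price) / buy_price  # fraction lost before the sell fired
stop_level = buy_price * (1 - stop_pct)             # price at which a 15% stop should fire

print(f"actual drop: {actual_drop:.2%}")        # ~5.56%, well short of 15%
print(f"15% stop level: {stop_level:.2f}")      # ~54.58, vs. the actual fill at 60.64
```

In other words, the fill at $60.64 is about 9.4 points of the purchase price above the $54.58 level where a 15% stop should have triggered, which is what makes it look "hyperactive."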

Thank God for the Legacy setting!

I ran some simulations this morning with the new engine using “Previous Close” as the Price for Transactions. For entries it looks like it works OK (prices are the closing prices of the day when the Buy Rules are met). But for exits - using, for example, a down close as the exit rule, “Close(0)<Open(0)” - the backtest takes the price from the day before the Sell Rules were met (in this case, the down close). This seems to apply only to Sell Rules; Buy Rules look fine. As a result, the backtest is extremely profitable: with, for example, an EMA exit such as EMA(4)<EMA(4,1), the simulation assumes you exited a day before the condition was met (at the peak prior to the EMA turning down).
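If I've read that report correctly, the suspected bug is an off-by-one on the exit fill date. A toy sketch with hypothetical daily bars, just to make the mechanics concrete (not P123's actual engine code):

```python
# Sketch of the suspected off-by-one on exits with "Previous Close" pricing.
# Hypothetical daily bars: the down-close rule Close(0) < Open(0) is first
# true on day 3, so the exit should fill at day 3's close; the behavior
# described above would instead fill at day 2's close (the peak before the turn).

bars = [  # (open, close)
    (100.0, 102.0),  # day 0
    (102.0, 105.0),  # day 1
    (105.0, 110.0),  # day 2: peak
    (111.0, 104.0),  # day 3: first down close -> sell rule fires
]

def exit_fill(bars, use_buggy_engine):
    """Return the exit fill price for the rule Close(0) < Open(0)."""
    for day, (o, c) in enumerate(bars):
        if c < o:  # sell rule is met on this day
            # Buggy behavior: fill at the PRIOR day's close instead.
            return bars[day - 1][1] if use_buggy_engine else c
    return None  # rule never fired

print(exit_fill(bars, use_buggy_engine=False))  # 104.0: correct fill
print(exit_fill(bars, use_buggy_engine=True))   # 110.0: inflated fill at the prior peak
```

Filling every exit at the pre-signal peak would systematically inflate backtest returns, which matches the "extremely profitable" symptom described.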

I am sorry, I am late to the party here, but I notice that my simulations are showing different results now.

Is it because of the switch?

  • How do I go back and use the legacy database?

Click on the circular image in the top right of your screen. You should see the option to use the current or the legacy database.

I see, thanks. I am set to legacy, but it still seems that the new results for the same sim are different.

Furthermore, I had some Excel output from an old ranking system. When I rerun it set to that past date, I see slightly different results.

Sounds really strange.

Hi Marco/Yuval,
First of all, I wish you would answer my previous thread here: https://www.portfolio123.com/mvnforum/viewthread_thread,12266_offset,20 - such a difference in results due to the DB change seems too big to me.

As regards the thread FactSet beta site v1.0, NOW LIVE, I will try to explain what I have understood, for the benefit of other P123 users too, because I think I am not the only one who needs to understand this better …

  1. Compustat Legacy = the engine used until now (but if so, why has a simple re-run of some sims changed the annual return?)
  2. Compustat Current = I do not understand what this is or the reason for it. Isn’t Compustat, as a supplier, going to be dropped?
    Marco said:
    “For Compustat, “Use Prelim” reflects what we have now. Preliminary announcements are exposed when Compustat processes them. “Exclude Prelim” enables you to completely exclude preliminary announcements so that only data from complete statements from filing is exposed. NOTE: this is a brand new feature that did not exist before.”
    OK, so as far as I understand, until now we had Compustat Legacy + Use Prelim as the standard.
    Compustat Current + Exclude should be a new option, but if Compustat is going to be halted, what’s the point?
  3. Marco says: for FactSet, it’s a similar behavior from March 2020 onwards. Frankly, I did not see differences in sim behavior until last week… The choice between prelim and exclude is quite obvious…

When everything (all the bugs) is settled, which options will be left for P123 users? FactSet + Prelim and FactSet + Exclude? Frankly, I have not understood this yet.
Thanks for your kind answer.
Fabio

Fabio:
The reason we are offering “Compustat” and “Use Prelim” is to allow users to compare the “Legacy” server and the “Current” server.
The reason we are offering “Compustat” and “Exclude Prelim” is to allow users to better compare historical data between Compustat and FactSet on the same server while we are still able to offer Compustat data. This is the best option for comparing them because FactSet replaces period data as new data becomes available, so we’ll only be able to expose final data for historical periods. That is to say, changing between “Use Prelim” and “Exclude Prelim” will make no difference on historical data when using FactSet. (Correction: “Use Prelim” will expose final data on the press release date, and “Exclude Prelim” will hide the data until the final date.)

Note that we are capturing preliminary data for new periods as they are made available in the FactSet dataset; this will allow us to offer preliminary data for these periods moving forward.
After the Compustat license expires in June, we will no longer offer the option to use Compustat on the standard platform.
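As I read the correction above, the two FactSet options expose the same final figures but on different dates. A toy sketch of that visibility rule, with hypothetical dates and values:

```python
# Toy model of the two FactSet prelim settings as corrected above:
# "Use Prelim" exposes the (final) figure as of the press-release date,
# "Exclude Prelim" hides it until the final filing date.
# The dates and the 1.25 value are hypothetical.
from datetime import date

class Period:
    def __init__(self, value, press_release, final_filing):
        self.value = value
        self.press_release = press_release
        self.final_filing = final_filing

def visible_value(period, as_of, exclude_prelim):
    """Return the value visible on `as_of`, or None (i.e., N/A) if still hidden."""
    available_from = period.final_filing if exclude_prelim else period.press_release
    return period.value if as_of >= available_from else None

q = Period(value=1.25, press_release=date(2020, 4, 20), final_filing=date(2020, 5, 8))

print(visible_value(q, date(2020, 4, 25), exclude_prelim=False))  # 1.25
print(visible_value(q, date(2020, 4, 25), exclude_prelim=True))   # None (still hidden)
print(visible_value(q, date(2020, 5, 10), exclude_prelim=True))   # 1.25
```

This also explains why simply switching between the two settings changes simulation results: the same data enters the backtest on different dates.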

I want to quickly address concerns about simulations getting different results at different times and on different servers.

First of all, as long-time P123 users can attest, if you run a simulation in April and then run exactly the same simulation in May, you’ll probably get slightly different results, even if the beginning and end dates are the same. There are many reasons for this, but one is that data providers introduce new stocks into the database (or convert a stock from “All Stocks” to “All Fundamentals”), and that introduction will change the ranking of all other stocks. Another is that occasionally data providers make mistakes and then silently correct them (or make new ones) in historical data.

Second, in another thread I made a list of what is different between the legacy server and the current server using Compustat and “Use Prelims.” I’m pasting the list below. These changes will account for a large number of differences.

Third, we are still working on adjusting some FactSet line items to conform better to Compustat’s line items, and in addition there are still a number of bugs in the FactSet engine that have not yet been fixed.

If your sim chooses five or ten stocks at a time out of thousands, your results could be very significantly different from one run to another, from one engine to another, and from one database to another. There’s no way to figure out why, though, without analyzing all the differences one by one.

Here, again, is the list of differences between the Legacy engine and the current version of Compustat using prelims.

Shares now returns NA when “Shares Outstanding” is unavailable instead of returning a value from SharesBasic or SharesFD.

Skip preliminary fallback if numerator is present and denominator is invalid (0 or negative) in preliminary: DbtS2NIQ, DbtS2NITTM, GMgn%, NPMgn%, PMgn5YCGr%, TxRate%.

AnnounceDaysPYQ, AnnounceDaysQ, WeeksIntoQ, WeeksToQ, WeeksToY use additional dates.

Div%ChgA, Div3YCGr%, and Div5YCGr% are -100 instead of NA or 0 when going from positive to zero.

Preliminary fallbacks have been revised for these factors: EV2EBITDA, ValROETTM, Pr2BookPQ.
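The first change in that list (Shares) can be pictured as a dropped fallback chain. A minimal sketch - the function names and sample share counts are hypothetical, only the fallback order comes from the list above:

```python
# Sketch of the first change listed: Shares no longer falls back to
# SharesBasic / SharesFD when "Shares Outstanding" is missing.
# None stands in for NA; field names follow the factor names above.

def shares_legacy(shares_outstanding, shares_basic, shares_fd):
    # Old behavior: fall back through the basic and fully-diluted counts.
    for v in (shares_outstanding, shares_basic, shares_fd):
        if v is not None:
            return v
    return None

def shares_current(shares_outstanding, shares_basic, shares_fd):
    # New behavior: NA whenever "Shares Outstanding" itself is unavailable.
    return shares_outstanding

print(shares_legacy(None, 50_000_000, 52_000_000))   # falls back to SharesBasic
print(shares_current(None, 50_000_000, 52_000_000))  # None -> NA
```

Any factor built on Shares will therefore go to NA on the current engine for stocks where only the basic or fully-diluted count is populated, which alone can reshuffle rankings between the two servers.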

I’m not seeing that behavior now. Simply switching between FactSet/Use and FactSet/Exclude produced different simulation results. Is that expected w/ the current state of the server?

I read that as FactSet is not PIT w/r/t preliminary data. But my real concern is that capturing data like that will make the preliminary data pre vs post FactSet deployment look different. So in a year or two, if I develop with recent data held out, the validation simulations will see a somewhat different type of DB. Will there be a way to ignore that captured data?

Walter

I’m quoting from Marco here, who wrote on page one of this thread:

So there will be a difference in simulations because the two different FactSet options expose the SAME data on DIFFERENT dates.

No, I don’t see how that would be possible.

Yuval, thanks for clearing up the first issue for me. While I expect the capturing of preliminary data changes to be useful, there really won’t be any way to confirm that usefulness. I see that as a policy decision and not a technology issue.

Walter