Fundamental Data II

Yuval,

What makes my ranking system so susceptible?

I guess you have looked at it.

It has VERY NORMAL VALUE FACTORS, one being EBITDAQ/EV, and normal sentiment factors that I learned from Marc and that have been posted numerous times over the years by many members.

Ultimately you may be right. My ranking system has only 6 COMPLETELY normal factors that everyone would recognize!!!

Only one buy and sell rule that basically ensures that there have been no recent analyst downgrades.

I would argue that what I did worked to prevent overfitting. The sim continues to perform beyond my wildest dreams.

So I think a lot of noise factors may drown out the signal, making the sim and port more closely aligned.

So I agree completely actually.

Take home points:

  1. Is supposed to do this

  2. P123 does not take a snapshot of all of the data.

Did I miss anything?

Thank you for expanding my knowledge of P123 and FactSet data.

Your comments are helpful and much appreciated.

Jim

Yuval,

As far as correlation is concerned, I performed PCA and factor analysis to reduce correlation between my nodes.

My sentiment and value nodes are completely uncorrelated.

Doing PCA and factor analysis stops me from adding a lot of noise factors, which, I say again, did prevent overfitting.

And one more question. The sentiment data is misleading in the sim no matter how many other factors I add, right? The misleading data just gets diluted by adding more factors. I don’t have to hire an AI expert for that, right?

Extremely helpful. I did not fully understand FactSet’s earnings estimates data.

Honestly, I thought the lag recently added to that data had made it the most reliable data available at P123.

Jim

Yuval, please have a look at my question, thank you :-)))

Perhaps less is more? The difficulties we’re all having with prelims are why most research papers on factor analysis use only annual, final data that has been audited.

Preliminary data might just be completely useless for the type of analysis we do. During prelims, data is incomplete, unstandardized, and can change. Some factors in your systems will fall back, others will not. Add to this that markets have knee-jerk reactions creating value traps, and you have a complete mess during earnings. And having Compustat vs FactSet does not change the narrative much.

I’m not saying we’ll get rid of prelims. All I’m saying is that P123 should default to a much more stable mode. For example, when a user starts a new strategy it should default to avoiding buying or selling during preliminary data (it’s possible to do this now but it’s a bit convoluted).

Marco,

Thank you very much for taking this seriously.

I do not use preliminary data, at least to the extent that checking the box to exclude preliminary data completely eliminates it.

Yuval has just said earnings estimates are not at all reliable.

That being said, is FactSet’s PIT offering of earnings estimates data not a possibility? Is it really PIT?

As it is, I think Yuval is right about the earnings estimate data.

That is why I have been importing Zacks rank data into P123 using InList.

Again, I am extremely grateful for the information you have provided and for addressing this in a serious manner.

Jim

FactSet estimates are quite reliable. Not sure what Yuval is referring to.

I apologize for being unclear and using the word “backfilling.” What I meant is that clearly something must have changed if Jim’s live strategy had a rank of 90.4 and the simulation had a rank of 39.2. Certainly from the data we have now it looks like the rank should have been 39.2. So FactSet must have done something to the data, but we don’t know what. Everything else in my post was a wild guess.

Yuval could have expressed it better.

But ultimately, with regard to what he said, he is correct all around, isn’t he?

That was just 10/11/21.

Thank you Yuval for offering a plausible explanation for why the ranks are different.

I was investing over $250,000 on that port at one time when I thought it was credible (none now).

I did make money so no complaints.

Something is not right, and I feel like a fool for taking any of it seriously with that many changes in the holdings and ranks.

As short as Yuval’s analysis was (and I prefer to use Zacks), it was an analysis.

Jim

Jrinne,

What time did you run the rebalance when you got the suspect ranks for AMKR & CRC? Can you give me the unique ID or name of the live strategy?

I see you were doing rebalances on Saturday 10/13, Sunday 10/14 and Monday 10/15 starting at 4:30 AM.

Data is flowing in from Friday till Monday around 4AM. And after the fact it’s all lumped together as the “weekend update”.

Only rebalances after Monday 4 AM should be trusted not to change.

Thanks

Marco,

I never do rebalances on Saturday, ever, with a live real port. I have been playing with InList from Zacks (which you cannot do with a sim) to see how many holdings it would have. I probably rebalance at all sorts of days and times with that.

I always check to make sure the data is up to date. I am never up at 4 am to do ports. My alarm goes off at 5:30 am except on surgery day.

If I somehow made a weird error on some day, that is just one day. I might have gotten busy and rebalanced on a Tuesday, like once. I would not be shocked if it was twice.

The port is 1Monday because I also ran it on (2)Tuesday, Wed, Thursday, etc., and the numbers kept them in order. They do not do as well as the sim either, BTW.

It is in “Recent Archive” because I shut them all down (and put them into an archive). I do not know where to find the ID but would be happy to find it if you direct me.

Thank you for looking into this.

Jim

The lookahead bias that Jim pointed out to us a few months ago was corrected that week. We don’t know of any other lookahead bias besides the occasional cash flow numbers being filled in in prelims. This is not an extremely common occurrence, since the large majority of companies announce and file on the same day, and some of those who announce early announce their cash flow along with their earnings.

What makes it susceptible is the fact that it relies on only nine data points (I think). If one of them shifts, it makes a huge difference in rank.

If you were using a ranking system like Core Combination or a variation thereof, there would be so many data points that if one or two shifted it would barely make a difference.
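To put rough numbers on that point, here is a minimal Python sketch. It assumes a ranking system that simply averages equally weighted per-factor percentile ranks, which is a simplification and not P123's actual ranking calculation. The function names and the example percentiles (one factor moving from 90 to 10 while the rest sit at 70) are made up for illustration.

```python
def composite_rank(factor_ranks):
    """Equal-weighted average of per-factor percentile ranks (0-100).

    Assumption: a plain average stands in for the real ranking logic.
    """
    return sum(factor_ranks) / len(factor_ranks)


def shift_from_one_factor(n_factors, before=90.0, after=10.0, others=70.0):
    """Change in the composite when one factor moves from `before` to `after`."""
    base = [others] * (n_factors - 1) + [before]
    moved = [others] * (n_factors - 1) + [after]
    return composite_rank(moved) - composite_rank(base)


# Compare a small system with a large one (factor counts are illustrative).
for n in (6, 9, 60):
    print(n, "factors ->", round(shift_from_one_factor(n), 2), "points")
```

Under this assumption the shift is simply -80 / n points, so the same data change moves a 9-input composite about 8.9 points but a 60-input composite only about 1.3.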

Yuval,

Edit: so I get your point if the goal is to match the sim and port. The image below was on unrelated subjects: the ideal number of factors for a port, noise factors, overfitting and optimizing a port to maximize profit. There is no reason to get into that here. I am sorry I cannot delete the image.

Thank you. You and Marco have been very helpful.

Jim


You are basically looking for changes in EPS estimates by comparing, for example, CurFYEPSMean with CurFYEPS4WkAgo.
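As a rough illustration of that comparison, the Python sketch below computes a relative four-week revision. The function name and the exact formula are assumptions, not Jim's actual factor. The two input values are the AMKR estimate levels mentioned elsewhere in this thread (5.29 and 4.925), used only to show how much the result depends on which value sits in the four-weeks-ago slot.

```python
def eps_revision(cur_fy_eps_mean, cur_fy_eps_4wk_ago):
    """Relative change in the consensus estimate over four weeks.

    Hypothetical helper: returns None when either input is missing
    or when the older value is zero.
    """
    if cur_fy_eps_mean is None or not cur_fy_eps_4wk_ago:
        return None
    return (cur_fy_eps_mean - cur_fy_eps_4wk_ago) / abs(cur_fy_eps_4wk_ago)


# Same current estimate, two different "4 weeks ago" values.
unchanged = eps_revision(5.29, 5.29)     # no revision at all
revised_up = eps_revision(5.29, 4.925)   # looks like an upward revision
print(unchanged, revised_up)
```

A stock that shows no revision in one run and a sizeable upward revision in another would land in very different places on a sentiment node.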

I plotted these for AMKR in the fundamental chart and maybe there’s something fishy with CurFYEPS4WkAgo around the rebalance date. The problem is further complicated because it’s around the same time NextFY becomes CurFY, but I don’t think that’s the issue.

It’s just a hunch right now. We’ll take a look again on how we calculate CurFYEPS4WkAgo to see if there’s any chance of it changing once the live value becomes part of the history. We do the date math using days, but FactSet timestamps these down to the second. Maybe when we go look for the value 4 weeks ago we’re off by a few hours due to a lower precision.
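The hunch can be sketched in a few lines of Python. Everything here is hypothetical: the history, the timestamps, the values, and the function names are invented, and this is not how P123 actually stores or queries estimates. It only shows that a 28-day lookback truncated to the day can land on a different record than one that keeps the time of day.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical history: (timestamp the value became effective, consensus EPS).
HISTORY = [
    (datetime(2021, 9, 1, 8, 0, 0), 2.00),
    (datetime(2021, 9, 13, 15, 30, 0), 2.10),
    (datetime(2021, 9, 20, 9, 0, 0), 2.05),
]


def value_as_of(history, when):
    """Latest value whose timestamp is at or before `when`, else None."""
    stamps = [ts for ts, _ in history]
    i = bisect_right(stamps, when)
    return history[i - 1][1] if i else None


def four_weeks_ago_exact(history, now):
    """Look back exactly 28 days, keeping the time of day."""
    return value_as_of(history, now - timedelta(days=28))


def four_weeks_ago_by_day(history, now):
    """Look back 28 days, then truncate to midnight (day precision)."""
    target = now - timedelta(days=28)
    target = target.replace(hour=0, minute=0, second=0, microsecond=0)
    return value_as_of(history, target)


now = datetime(2021, 10, 11, 16, 0, 0)
print(four_weeks_ago_exact(HISTORY, now), four_weeks_ago_by_day(HISTORY, now))
```

With this made-up history the exact lookback sees the 15:30 update from 28 days earlier, while the day-precision lookback stops at midnight and still returns the older value.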

In any event this type of inconsistency should not invalidate your system in any way. The live portfolio may be looking at CurFYEPS4WkAgo, but the simulation resolves to CurFYEPS4WkAgo give or take a few hours in some cases. That’s a meaningless difference. A live strategy with 100 positions should be very close to a simulation with 100 positions over the same period. But of course, if you compare a 5-stock live portfolio with a 5-stock simulation it could be completely different.

Makes sense?

Marco,

Thank you. Obviously, that would be ideal for me. And I appreciate what you have done.

You think the out-of-sample results of the sim are real then? The port performing at about half of what the sim was doing (the excess return was roughly half) is just bad luck from random changes caused by an hour here or there 4 weeks ago?

Also I introduced some random changes like when I added or removed cash. I understand there will be some differences in the sim and port that are not a problem.

But long term just plug back in and I’ll be ready to start using summer as a verb in no time (e.g., I’ll be summering in the Hamptons).

I hope that is true. Absolutely hope that is true and I think it is possible although I would not bet on it here today. I will plug back in with some amount of money. Look at it again going forward.

I’ll show you the port and the sim in a few months no matter what happens with them, if you are interested.

Very, very much appreciated. I question my sim/port a little still, but you looked at this and that is the best information I have for now.

Best,

Jim

Jrinne,

I investigated AMKR & CRC a bit more. I think there are two different reasons why you can get different results between a live rebalance vs simulation.

Reason 1
FactSet updates the estimate on Monday morning, and all we know is that it happened on Monday (I was wrong before when I said FactSet gives us a timestamp). For the rebalance we expose that estimate, of course. There’s no reason to hide any data from a rebalance. But when you run a simulation the Monday data is not exposed because we do not know if the data was available before the market opened or after. So it’s safer to not include Monday data rather than potentially introduce look-ahead. I believe this is what happened with CRC. See the estimate in the image on Monday (start_dt=2021-10-11). This estimate is being eliminated from the backtest.
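Reason 1 can be pictured with a small Python sketch. The records, values, and function name are hypothetical, and the real filtering logic inside P123 is not shown in this thread. Only the start_dt of 2021-10-11 comes from the post above. The sketch just shows the same history returning different values depending on whether same-day records are visible.

```python
from datetime import date

# Hypothetical estimate records: (start_dt, consensus value), day precision only.
ESTIMATES = [
    (date(2021, 9, 27), 10.0),
    (date(2021, 10, 4), 10.5),
    (date(2021, 10, 11), 12.0),  # arrives on the Monday of the rebalance
]


def latest_estimate(estimates, as_of, include_same_day):
    """Most recent value visible on `as_of`, or None if nothing is visible.

    include_same_day=True  -> live rebalance: everything received so far is used.
    include_same_day=False -> backtest: same-day records are held back because
                              it is unknown whether they arrived before the open.
    """
    visible = [
        (start, value)
        for start, value in estimates
        if start < as_of or (include_same_day and start == as_of)
    ]
    return max(visible)[1] if visible else None


monday = date(2021, 10, 11)
live_value = latest_estimate(ESTIMATES, monday, include_same_day=True)
sim_value = latest_estimate(ESTIMATES, monday, include_same_day=False)
print(live_value, sim_value)
```

Here the live rebalance uses the new Monday estimate while the backtest still uses the prior week's value, which is enough to move a rank built on estimate changes.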

Reason 2
Corrections from FactSet. When you examine AMKR, the highlighted estimate that lasted from 2021-09-01 to 2021-09-09 is very fishy. The value drops to 4.925 and jumps right back to the previous value of 5.29 afterwards. The low of 3.8 in yellow is apparently from an analyst that was not included anymore in the aggregate (the # of analysts goes from 3 to 2). It all points to some kind of correction that happened that I believe affected CurFYEPSMean4WkAgo.

To “fix” these two sources of differences is not worth the effort. A robust backtest (with lots of positions) is still valid. For backtests with a small number of positions (5, for example), you need to introduce some randomness and run multiple times to get a better representation of what could have happened.
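One way to picture "introduce some randomness and run multiple times" is the Python sketch below. It is an assumption about what such a procedure could look like, not a P123 feature: each run picks 5 holdings at random from the top 15 ranked names, and the spread of results across runs stands in for what could have happened. Tickers, returns, and function names are all invented.

```python
import random
import statistics


def pick_holdings(ranked_tickers, n_positions, pool_size, rng):
    """Pick n_positions at random from the top pool_size ranked tickers."""
    pool = ranked_tickers[:pool_size]
    return sorted(rng.sample(pool, n_positions))


def run_many(ranked_tickers, returns, n_positions=5, pool_size=15, runs=200, seed=0):
    """Average holding return for each randomized run."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    results = []
    for _ in range(runs):
        holdings = pick_holdings(ranked_tickers, n_positions, pool_size, rng)
        results.append(statistics.mean(returns[t] for t in holdings))
    return results


# Hypothetical universe: 30 tickers already sorted best rank first,
# with made-up period returns that decline with rank.
tickers = [f"T{i:02d}" for i in range(30)]
period_returns = {t: 0.30 - 0.01 * i for i, t in enumerate(tickers)}

results = run_many(tickers, period_returns)
print(min(results), statistics.median(results), max(results))
```

Looking at the range and median of the runs, rather than a single 5-stock path, gives a fairer picture of how much one substitution in the holdings can matter.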

Hope this helps


Marco,

Excellent! This is very thorough and extremely well reasoned.

I also note that Yuval was quick to notice what was going on with AMKR. Thank you Yuval.

This is absolutely true, I believe. Random changes do not matter. Making the sim identical to the port is not the goal.

I (anyone) should only be concerned with systematic biases that favor the sim. Look-ahead bias is most often talked about but other biases are possible: constantly underestimating the slippage would be another example of a systematic bias. This underestimate would continuously favor the sim, as you know.
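The distinction between random differences and a systematic bias can be sketched in Python. This is a toy model with made-up numbers, not a measurement of any real port: each trade's sim-minus-port gap is a fixed bias (for example, slippage the sim underestimates) plus zero-mean noise (for example, timing differences). The function name and parameters are hypothetical.

```python
import random


def cumulative_gap(n_trades, bias_per_trade, noise_sd, seed=0):
    """Sum of (sim return - port return) over n_trades.

    bias_per_trade: constant amount by which the sim is flattered each trade.
    noise_sd: standard deviation of zero-mean random differences.
    """
    rng = random.Random(seed)
    return sum(bias_per_trade + rng.gauss(0.0, noise_sd) for _ in range(n_trades))


# Same random noise in both cases (same seed); only the bias differs.
noise_only = cumulative_gap(1000, bias_per_trade=0.0, noise_sd=0.01)
with_bias = cumulative_gap(1000, bias_per_trade=0.001, noise_sd=0.01)
print(noise_only, with_bias, with_bias - noise_only)
```

Because both runs share the same noise, the difference between them is just the bias times the number of trades. The noise partly cancels as trades accumulate, while even a small per-trade bias keeps adding up in the sim's favor.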

No problematic biases were identified here with AMKR and CRC which is what we were hoping to find.

With regard to your suggestion of adding randomness, I ran 5 ports: Monday, Tuesday, Wednesday, Thursday and Friday.

Running 5 ports is one way to introduce the randomness that you suggest. A rolling live port if you will.

So I guess I agree completely with everything you have said.

Thank you for looking for any systematic biases. I could not be happier that you did not find any.

Best,

Jim

Yuval, what is the cost to purchase the PIT Compustat data through P123?

You have to get a license from S&P, which used to cost around $12K to $15K. I’m not sure what they’re charging now.

Has there been any advancement in FactSet doing a better job with their data?
Future plans you can share?
Their backdating of data has killed the reliability of back testing when rules include fundamental data.