Survivorship bias in our data?

I ran a drift analysis on some AI features and came across some concerning data.

This is a drift analysis of the feature #Age. You can see a very clear drift in the data: companies were young on average back in 2003, and over the years both the average age and the spread of ages increase. Should we be concerned?

The universe I used for this analysis is 2000 companies from North America and Western Europe with high FCF/EV. I have not tried other universes yet. If no one has a simple explanation for this drift, I will dig deeper into different markets/universes.

Would reduced IPOs not increase the age of existing firms over time? I know there has been an increase in private firm trading in the secondary market for us accredited investors, which lets firms skip an IPO altogether and avoid the costs and frivolous lawsuits. I have been able to invest in SpaceX, for example, but it has never been listed.

There are now two markets. One for accredited investors and one for everyone else thanks to frivolous lawsuits and regulations. It is not uncommon for me to look up news for a ticker and see a number of lawsuits in the feed. It is also why many firms are moving to Texas to escape the onslaught.

If company age is determined by #APeriods, then this is a chart of the Easy to Trade North Atlantic universe, using UnivMedian:


Why is this happening? It's not because the IPOs dropped, as Santiago suggested. It's because FactSet's data only goes back to 1980. The oldest companies in 1999 were only 19 years old.
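
To make the mechanism concrete, here is a minimal sketch, assuming the reported age is simply capped at the number of years since the data starts in 1980. The universe and its ages are made up for illustration; the true age distribution is held constant, so any change in the median comes from the cap alone.

```python
from statistics import median

DATA_START = 1980  # first year of coverage, per the explanation above


def observed_age(true_age, year, data_start=DATA_START):
    """Age as the data would report it: history cannot predate data_start."""
    return min(true_age, year - data_start)


# Hypothetical universe whose TRUE age distribution never changes (years).
true_ages = [5, 10, 20, 40, 60, 80, 100]


def median_observed_age(year):
    """Median of the capped ages for the given as-of year."""
    return median(observed_age(age, year) for age in true_ages)


# The true median is 40 in every year; the observed median climbs toward it.
for year in (1999, 2005, 2015, 2025):
    print(year, median_observed_age(year))
```

In this toy universe the spread of observed ages also widens as the cap rises, which would be consistent with the widening spread in the original chart.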

Thanks for sharing, makes sense. I think that if one counts age from company inception, the actual median age of listed firms is also up, independent of the data limitation, so that would presumably be an extra (smaller) driver of the effect here.

Ok, that would explain it. So to test this theory I would need to filter out all companies that were 19 years old in 1999, because these companies would be the ones creating the drift.

Do you have a method to filter out these companies?

FRank("#APeriods")<95 should do it. It's kind of crude . . . And eliminating those companies wouldn't eliminate the drift at all. The only way to eliminate the drift would be to include all those companies but assign them their actual age.
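
A rough sketch of why filtering does not help, under simplifying assumptions: instead of a percentile cut like the FRank rule, it drops every company whose reported age sits exactly at the cap (years since 1980). The universe is hypothetical and its true ages are held constant.

```python
from statistics import median

DATA_START = 1980  # coverage start year, as stated earlier in the thread

# Hypothetical universe with a constant TRUE age distribution (years).
TRUE_AGES = [5, 10, 20, 40, 60, 80, 100]


def median_age_without_capped(year):
    """Median observed age after dropping every company sitting at the cap."""
    cap = year - DATA_START
    # Companies with true age >= cap would all report the cap; drop them.
    kept = [age for age in TRUE_AGES if age < cap]
    return median(kept)


# The median of the survivors still rises, because a higher cap lets
# progressively older companies through the filter.
for year in (1999, 2010, 2025):
    print(year, median_age_without_capped(year))
```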

Since you're working with a much smaller universe, you could run a screen with an as-of-date in 1999, take the companies that have #APeriods = 19, and assign each of them an imported stock factor with their actual age. You could then use a data point like IsNA(##ActualAge, #APeriods), if you call your imported stock factor ##ActualAge. It might not be worth the trouble, though.
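
The fallback logic can be sketched outside the platform like this, assuming IsNA returns its second argument whenever the first is missing. The tickers, ages, and the helper name age_with_fallback are all made up for illustration; this is not the platform's implementation.

```python
from typing import Optional


def age_with_fallback(actual_age: Optional[float], a_periods: float) -> float:
    """Use the imported actual age if present, otherwise fall back to #APeriods."""
    return a_periods if actual_age is None else actual_age


# Hypothetical imported stock factor: ticker -> actual age in 1999.
imported_actual_age = {"AAA": 87, "BBB": 42}

# Hypothetical screen output with a 1999 as-of-date: ticker -> #APeriods.
a_periods_1999 = {"AAA": 19, "BBB": 19, "CCC": 7}

# Capped companies get their imported age; everyone else keeps #APeriods.
corrected = {
    ticker: age_with_fallback(imported_actual_age.get(ticker), periods)
    for ticker, periods in a_periods_1999.items()
}
print(corrected)
```

Only companies that were at the 19-year cap need an imported value; the rest fall through to #APeriods unchanged.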