Data Miner: Is it possible to retrieve factor ranks using the Eval function without a data license?

Michael7 · February 13, 2022, 11:52am

Hi everyone,
The DataUniverse operation of the Data Miner allows you to retrieve PIT factor ranks for a given date range without a vendor data license. Now I have tried to use the following custom rank formula with an Eval function designed to return different ranks when NAs occur: Eval($EV2NTMSales!=NA, FRank(“$EV2NTMSales”,#Industry,#ASC), FRank(“$EV2NTMSales”,#ALL,#ASC))

My rank formula, just like any other rank formula that does not use the Eval function, returns only a rank and no raw data. However, when I try to run DataMiner with the above ranking formula, I get the following message: API request failed: Invalid formula Eval($EV2NTMSales!=NA,FRank(“$EV2NTMSales”,#Industry,#ASC),FRank(“$EV2NTMSales”,#ALL,#ASC)): A data license is required for this operation.

Does this mean that only ranks without an Eval function can be used without a data license? Or does anyone know a workaround for this kind of problem?

yuvaltaylor · February 14, 2022, 1:45am

Hi everyone,
The DataUniverse operation of the Data Miner allows you to retrieve PIT factor ranks for a given date range without a vendor data license. Now I have tried to use the following custom rank formula with an Eval function designed to return different ranks when NAs occur: Eval($EV2NTMSales!=NA, FRank(“$EV2NTMSales”,#Industry,#ASC), FRank(“$EV2NTMSales”,#ALL,#ASC))

My rank formula, just like any other rank formula that does not use the Eval function, returns only a rank and no raw data. However, when I try to run DataMiner with the above ranking formula, I get the following message: API request failed: Invalid formula Eval($EV2NTMSales!=NA,FRank(“$EV2NTMSales”,#Industry,#ASC),FRank(“$EV2NTMSales”,#ALL,#ASC)): A data license is required for this operation.

Does this mean that only ranks without an Eval function can be used without a data license? Or does anyone know a workaround for this kind of problem?

The Eval function in your example is calling up actual data with the NA specification.

Have you considered using the #ExclNA parameter? FRank(“$EV2NTMSales”,#Industry,#ASC,#ExclNA) might be a better solution. You’ll get an NA instead of a rank for companies with NA.

Michael7 · February 14, 2022, 2:49pm

Yuval, I am very grateful for your support. The reason I used the Eval function for my factor ranks was that I want to assign the same rank to all NAs in my stock universe, regardless of which industry the stock belongs to.

I’ve never used the EXclNA function before, so it`s good to know that there is such a parameter. However, in my particular case, I need a rank assigned to each NA, as I’m trying to get training data aiming to figure out the optimal weights of a whole set of factor ranks & their assigned composite ranks. But if this is not possible without a data license, I have no choice but to use the rank formulas without Eval function.

yuvaltaylor · February 14, 2022, 3:52pm

I understood your aim from your first post–to assign the same rank to all NAs in your stock universe.

Using the #ExclNA parameter should give you an NA rather than a rank assigned to it, but it should be very easy to search and replace your output and change all NAs to 0–or to 50.

At the moment, IsNA requires a data license. We’ll look into possibly changing that. But Eval will likely still require a data license.

Michael7 · February 15, 2022, 12:46pm

Yuval, thanks for clarifying that the Eval function cannot be applied without a data license. I appreciate your idea of replacing NAs with a fixed rank value. Now I think I will either, as you said, just assign a fixed rank value like 50 to each NA or just use all my industry-related rank formulas without the Eval function in the DataMiner and thus assign an industry based rank to each NA.

Handling NAs the right way is often quite tricky for me. So I would be very much interested in your opinion on how best to handle NAs especially for all those rank formulas where it makes sense to rank a factor within its industry rather than the universe as a whole. Or would you say that when creating a ranking system that includes a lot of different rank formulas, the entire NA issue and its impact on portfolio alpha is negligible because any potential distortions would be mostly diversified away?

yuvaltaylor · February 15, 2022, 9:20pm

Yuval, thanks for clarifying that the Eval function cannot be applied without a data license. I appreciate your idea of replacing NAs with a fixed rank value. Now I think I will either, as you said, just assign a fixed rank value like 50 to each NA or just use all my industry-related rank formulas without the Eval function in the DataMiner and thus assign an industry based rank to each NA.

Handling NAs the right way is often quite tricky for me. So I would be very much interested in your opinion on how best to handle NAs especially for all those rank formulas where it makes sense to rank a factor within its industry rather than the universe as a whole. Or would you say that when creating a ranking system that includes a lot of different rank formulas, the entire NA issue and its impact on portfolio alpha is negligible because any potential distortions would be mostly diversified away?

I have a few thoughts on this. But let’s take a concrete example: R&D to market cap.

Let’s say you want to look at RandDTTM/MktCap. Now Procter & Gamble (PG) has NA for RandDTTM. So does Philip Morris, AT&T, and Lockheed Martin. All of these companies spend a LOT of money on R&D, as evidenced by their numbers for RandDA. So right there is problem #1.

Here’s problem #2. Advance Auto Parts spends $0 on R&D. So does AutoZone. On the other hand, O’Reilly and CarMax spend N/A on R&D. Do you really want to make a distinction between companies with $0 and N/A? It’s pretty random which companies get assigned a $0 and which get assigned an N/A by our data providers. If you’re going to treat them differently, then your results aren’t going to make much sense.

Here’s problem #3. Trupanion, SelectQuote, and Waterdrop are the only companies in the insurance industry with any R&D spending. All the rest are N/A. Should N/As be neutral or negative in the insurance industry? Should those three companies be ranked higher than all other insurance companies? On the other side of the coin, there are ten companies (out of 74) in the semiconductor manufacturing industry with N/A for R&D spending. Should those ten companies be ranked lower than all the others? Or somewhere in the middle?

So you have to come up with creative solutions for N/As. Here’s one way to deal with N/As for RandDTTM/MktCap in a ranking node:

Eval (FCount (“RandDTTM = NA or RandDTTM = 0”, #industry) > 0.5 * FCount (“1”, #industry), NA, IsNA (RandDTTM, IsNA (RandDA, 0)) / MktCap).

Here’s what this formula does. It first looks to see whether more than half the companies in the industry have zero or NA R&D. If so, then all companies in that industry get N/As. If not, we try to find the TTM R&D number and divide that by market cap. If the TTM R&D is N/A, then we look at the annual R&D, and if that’s also N/A, we guess that it’s actually zero, not N/A. As a result, all companies in industries with very little R&D spending get N/As and are ranked in the middle and all companies in industries with quite a bit of R&D spending DON’T get N/As, and are ranked according to R&D-to-market cap.

In sum, N/As are very tricky beasts, and it’s worthwhile to spend some time looking at each N/A and what it actually might mean and how best to deal with it. It really varies from factor to factor. N/A for P/E, for example, usually means the company has negative net income, not that the earnings aren’t available.

Michael7 · February 16, 2022, 3:03pm

Yuval, thank you very much for your efforts to showcase these 3 important NA issues. All your examples help me to better understand this important topic. Most of all, I want to thank you for your very well thought out Eval formula, which I admittedly would never have been able to come up with on my own.

I agree that the handling of NA varies from factor to factor, but nevertheless I think your Eval formula is awesome and might be a gamechanger in mitigating at least some of the distortions caused by a wide range of NA issues. But as you also briefly pointed out, for another important NA problem area where factors with negative values play a major role (e.g. P/E ratios), a different solution is needed to deal with NAs. I’m convinced that handling NAs would be of great interest to many folks here at P123, so this might well be worthy of an article on your P123 blog at some point. Just saying.

Getting back to DataMiner, I’d be grateful if the P123 team would look into this matter further to see if there might not be a way to allow operations using such eval functions embedded in rank formulas without a data license. In my eyes, this would be a big step towards improving the data quality of the downloadable ranks, making DataMiner much more suitable for AI training.

Chipper6 · February 16, 2022, 11:53pm

[quote]
Eval (FCount (“RandDTTM = NA or RandDTTM = 0”, #industry) > 0.5 * FCount (“1”, #industry), NA, IsNA (RandDTTM, IsNA (RandDA, 0)) / MktCap).

Here’s what this formula does. It first looks to see whether more than half the companies in the industry have zero or NA R&D. If so, then all companies in that industry get N/As.
[/quote]Why not 0’s?

yuvaltaylor · February 17, 2022, 2:02am

[quote]

Good question. If I’m ranking all companies, no matter their industry, on a metric like R&D to market cap, I want the companies with the lowest ranks to be not insurance companies or banks, but companies whose ratio of R&D to market cap is relatively low. So I want the insurance companies and banks to be ranked in the middle and tech companies to be ranked near the top and bottom. Personally, I’ve found the R&D to market cap metric, when conceived of this way, to be even more useful for short systems than for long.

Michael7 · February 18, 2022, 11:06am

I just ran into a new problem with DataMiner. I have a custom rank-formula whose output I then try to rank in DataMiner using the FRank function (e.g. FRank(“$ROETTM_middle”,#ALL,#ASC). In the Screener, these types of formulas where two FRank functions are nested work absolutely fine, but in DataMiner I get the following error: API request failed: Could not find uid 8228 in function FRank – map size is 0.

Any ideas how to use such formulas with two nested rank functions in DataMiner?

danparquette · February 18, 2022, 5:46pm

Hi Michael,
Unfortunately Dataminer does not currently support nested FRank statements. I will create an enhancement request for this.

Michael7 · February 19, 2022, 10:10am

Hi Dan,
That would be great. I try to use the DataMiner to retrieve rank data for machine learning. In particular I would like to use it for reinforcement training to figure out the weights for a lot of different rank factors in terms of optimizing a portfolio’s alpha over different time periods. And in this context, many of the formulas I want to test make use of the Eval function or nested FRank functions.

So thank you very much in advance for your efforts to create an enhancement request for the nested FRank issue. Perhaps as part of such fixes it may be possible to also allow rank formulas with an embedded Eval function.

Finally, how can I know when changes have been made to DataMiner?

Have a nice weekend!

Michael7 · February 22, 2022, 2:32pm

Does anyone know where I can check if there are any changes or new enhancements to DataMiner (a kind of changelog or bugtracker for example)?

danparquette · February 22, 2022, 2:55pm

Hi Michael,
We announce new releases on the forum, but we usually don’t announce individual enhancements like this. I made a note in the enhancement request to notify you when this is complete.

Michael7 · February 22, 2022, 3:25pm

Oh, that sounds great. Thank you so much for your support. Improvements like this would make things much easier.

Have a great day!