Add new ranking method option for NA's: NA midpoint

Currently there are two choices in the ranker for NAs:

  • NA negative: NA’s are assigned a middle score
  • NA neutral: NA’s are assigned a score below the worst value

The NA midpoint addresses a problem with NA neutral when many values are NA. For example, if 40% of the values are NA, the NAs would get a score slightly less than 40, which would be too high. NA midpoint would instead assign them a value of 20, halfway between 0 and 40.
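
Here is a minimal Python sketch of the arithmetic in that example. It assumes the valid stocks are spread evenly over the ranks above the NA block, so an NA share of 40% puts the worst valid stock at about 40. The function names are hypothetical, not P123 settings.

```python
def na_score_below_worst(na_share: float) -> float:
    """Upper bound of the NA block when NAs sit just below the worst valid value.

    Valid stocks occupy roughly the ranks from 100 * na_share up to 100, so the
    NA score lands slightly under 100 * na_share; this returns that bound.
    """
    return 100.0 * na_share


def na_score_midpoint(na_share: float) -> float:
    """Proposed 'NA midpoint': halfway between 0 and the top of the NA block."""
    return na_score_below_worst(na_share) / 2.0


for share in (0.10, 0.40, 0.66):
    print(f"{share:.0%} NA: below worst ~{na_score_below_worst(share):.0f}, "
          f"midpoint {na_score_midpoint(share):.0f}")
```

With about 66% NAs the same arithmetic gives roughly 66 versus 33, the two figures debated in the next reply.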

I don’t think this is right. Let’s say you’re ranking the Easy to Trade US Universe on RandDTTM/SalesTTM. There are a lot of NAs there, so the rank for the NA stocks is around 66. Is this “too high”? If so, then just adjust it to 50. If you put it at 33, then it’s no longer neutral at all: it’s somewhat negative, with a lot more stocks ranking higher than 33 than lower than 33.

Never mind, I now see this feature does not exist yet. Roadmap

So I just have a question. I am not sure how it should be. But I actually thought NAs would be assigned the 50th percentile now for “NAs neutral.”

And if it isn’t doing that now, would you want to consider or discuss that as an option, since “50th percentile” might meet some definitions of “neutral” for NAs neutral?

Logic: if something is neutral you have no information. Anything above or below 50 would be introducing a type of bias for something you have no information about.

This bias could be a problem, I think, particularly when using “lower is better.” That may be why switching to lower is better for factors that are inversely correlated with returns has never worked for me. Perhaps, at times, you are just selecting NAs, for which you have no information, when you select lower is better.

I think it would be nice to make “lower is better” work without concern for NAs. That would effectively add a large number of potential factors for use in a ranking system. This could be done by putting NAs right in the middle.

TL;DR: Probably I am not understanding something, but it does seem like “NAs neutral” does not mean an NA is assigned the 50th percentile, and I wonder if it should mean that or at least have that as an option.
And I wonder if that might increase the number of factors we can effectively use in ranking systems (dramatically). A question for now.

I do see the logic of what, I think, we are doing now too: i.e., just move all of the NAs out of the way so that as few as possible are used and then just address how they are ranked when you do have to use one. I do get that and would not want to change that option.

The rank performance test would look a lot prettier (with improved functionality for some situations) if every NA were just given a value of 50. You would just have one messed-up bucket, and it might end up having a moderate value at that. Assuming I understand what you are doing now (which I probably do not).

Jim

Again, I probably am getting this wrong. I thought neutral meant neutral.

But if the NAs, as they are handled now, are messing up the ability to effectively use “lower is better” for factors that are inversely correlated, then you might consider allowing negative weights. For normalization, just normalize by the absolute values of the weights.
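
Here is a minimal Python sketch of the negative-weight idea. It assumes each node already produces a 0 to 100 rank and that the composite is a plain weighted average. The function name and the two-node system are hypothetical, and P123 does not necessarily combine nodes this way.

```python
from typing import Sequence


def composite_score(ranks: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of 0-100 factor ranks where weights may be negative.

    Weights are normalized by the sum of their absolute values, so a system
    with weights 0.6 and -0.4 is treated the same as one with 60 and -40.
    """
    if len(ranks) != len(weights):
        raise ValueError("ranks and weights must have the same length")
    scale = sum(abs(w) for w in weights)
    if scale == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(r * w for r, w in zip(ranks, weights)) / scale


# Hypothetical two-node system: a value rank (weight 0.6) and a
# SalesGr%TTM rank given a negative weight (-0.4).
stock_a = composite_score([80.0, 90.0], [0.6, -0.4])  # high sales growth rank
stock_b = composite_score([80.0, 20.0], [0.6, -0.4])  # low sales growth rank
print(stock_a, stock_b)
```

A negative weight -w on a rank r contributes -w·r, which differs from w·(100 - r) only by the constant 100·w, so the stock ordering matches flipping that node's direction. NA handling does not flip in the same way. With NAs neutral, the NAs stay near the middle either way. With NAs negative (rank near 0), a negative weight makes the NA block the best-scoring group on that node.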

You would nearly double the useful factors. As an example SalesGr%TTM is inversely correlated with returns.

That was a quick look, and this factor may not be the best example, but allowing negative weights might be the easiest way to make use of a lot of new factors. Maybe we could code it ourselves with Eval() or by using negative numbers, however. Maybe I just needed to know how NAs are handled, but I still think putting an NA at the 50th percentile has multiple advantages at once, and using negatives and Eval() all the time would get old. Still, I can do that, and possibly it’s just me. Just an idea and not a feature request.

Jim

Jim, see “How Ranking Works” here for a detailed description.

Marco,

Thank you. P123 works great for me now, so I’m not highly committed. In the example, NAs were 33.3, I believe.

But if you ask me, setting all NAs equal to 50 is better for soooooooo many reasons. And it should be an option for a lot of good reasons, I believe, no matter which a person prefers. For example, lower is better often does not work now, and now I know why.

And a stock could get a low rank just for NA even if it generally has good value factors (the ones not NA). I.e., a bias that is just randomly assigned.

Thanks for the clarification!!! I never would have thought it would be done that way. I did learn something and appreciate it. And like I said, it works just the way it is, as long as I do not want to use lower is better for inversely correlated variables, which has limited value and which I do not use now.

Also I understand why you did it the way you did. That has advantages and I get it.

Jim

Again, not committed. No change necessary. But sometimes ChatGPT gets it right, and in this case it thinks like I do:

Q: What is the best way to handle NAs in data?

A: …… 1. Mean/median imputation: In this method, missing values are replaced with the mean or median value of the corresponding variable. Imputation can help retain the sample size and maintain the overall distribution of the variable. However, it can also introduce bias and reduce variability in the data, especially if there are substantial missing values……
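
Here is a minimal Python sketch of median imputation, using made-up numbers. It shows both halves of that answer. Every observation is kept, but all the filled values sit at one point, so the spread shrinks. In ranking terms, filling with the median value ties every NA with the median stock.

```python
import statistics
from typing import Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> list[float]:
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


raw = [1.0, 2.0, None, 4.0, None, 10.0]  # illustrative data, not real factor values
filled = impute_median(raw)
observed = [v for v in raw if v is not None]
print(filled)
print(statistics.pstdev(observed), statistics.pstdev(filled))
```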

FWIW.

And honestly, I am just happy for now to have a correct understanding of what is meant by “neutral” in this context. I did think it meant mean or median, incorrectly.

Jim

Jim -

As it says in the document,

all NA values will, when the computation process begins, be put to the side. Percentile rankings from zero to 100 will be computed for all firms that have the necessary data. Then, at the end of the process, all NA companies will be assigned a rank in the middle of the valid ranks, a perfectly neutral score. The rank assigned to NA’s will usually be around 50, except when there are very few ranked stocks (such as in a small universe or industry).

That means that if there are 1000 stocks and 200 of them are NA, all the NAs will be ranked at 50. But if there are only 5 stocks and 2 of them are NA, all the NAs will be ranked at 33.33, since that will be the rank of the middle of the 3 rankable stocks. One stock will be ranked higher, one lower, and the middle one will rank precisely as high as all the NAs.

Another way to put this is that NAs are ranked the same as the median rankable stock.
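
The rule quoted above can be sketched in a few lines of Python. The per-stock formula used here, 100 × (valid stocks ranked below) / (number of valid stocks), is an assumption. It was chosen because it reproduces both numbers in the example, but P123’s actual formula and tie handling may differ. For instance, in this sketch the best stock in a tiny universe tops out below 100.

```python
from typing import Optional, Sequence


def neutral_ranks(values: Sequence[Optional[float]]) -> list[float]:
    """Rank values 0-100 (higher value ranks higher); NAs (None) tie with the median stock.

    Assumed formula: rank = 100 * (valid stocks below) / (number of valid stocks).
    Ties among valid values are broken by input order, not averaged.
    """
    valid = sorted((v, i) for i, v in enumerate(values) if v is not None)
    n = len(valid)
    if n == 0:
        raise ValueError("no rankable stocks")
    ranks = [0.0] * len(values)
    for below, (_, idx) in enumerate(valid):
        ranks[idx] = 100.0 * below / n
    # Rank of the middle rankable stock (for even n, the upper of the two middles).
    median_rank = 100.0 * (n // 2) / n
    for i, v in enumerate(values):
        if v is None:
            ranks[i] = median_rank
    return ranks


# The two cases above: 5 stocks with 2 NAs, and 1000 stocks with 200 NAs.
small = neutral_ranks([3.0, None, 1.0, None, 2.0])
large = neutral_ranks([float(v) for v in range(800)] + [None] * 200)
print([round(r, 2) for r in small])
print(large[-1])
```

Because every NA shares the median stock’s rank, an NA stock outranks every valid stock below the median. This is the effect raised further down the thread.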

Yuval,

Perfect! Much appreciated. I think the example of 33.3 for neutral confused me. But my fault for skimming as the document is clear when not skimmed.

Thank you Marco and Yuval for the clarification.

Jim

NA’s in ranking have always confounded me…

If NA’s are neutral that means a stock with an NA will rank higher than another stock below the median.

If NA’s are negative, then the stock is assigned a rank at the bottom, when in fact it should have no rank. This logic doesn’t make sense… It means that when ranking on multiple nodes, this bottom ranking is given a rank where there should be none.

Seems to me that no rank when NA makes the most sense when ranking on multiple nodes.
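
Here is a minimal Python sketch of the “no rank when NA” idea for multi-node systems. NA nodes are skipped, and the remaining weights are renormalized. This is a hypothetical aggregation for comparison only, not how P123 combines nodes. The node ranks and weights are made up, and weights are assumed positive. The fixed values 50 and 0 only approximate the neutral and negative settings.

```python
from typing import Optional, Sequence

NA_NEUTRAL = 50.0   # rough stand-in for the "middle" NA rank
NA_NEGATIVE = 0.0   # rough stand-in for the "bottom" NA rank


def composite_skip_na(node_ranks: Sequence[Optional[float]],
                      weights: Sequence[float]) -> Optional[float]:
    """Weighted average over the nodes that have a rank; NA (None) nodes are dropped.

    The weights of the remaining nodes are renormalized to sum to one.
    Returns None when every node is NA.
    """
    if len(node_ranks) != len(weights):
        raise ValueError("node_ranks and weights must have the same length")
    pairs = [(r, w) for r, w in zip(node_ranks, weights) if r is not None]
    total = sum(w for _, w in pairs)
    if total == 0:
        return None
    return sum(r * w for r, w in pairs) / total


def composite_fill_na(node_ranks: Sequence[Optional[float]],
                      weights: Sequence[float], na_rank: float) -> float:
    """Weighted average where every NA node gets the same fixed rank."""
    filled = [na_rank if r is None else r for r in node_ranks]
    return sum(r * w for r, w in zip(filled, weights)) / sum(weights)


ranks = [90.0, None, 70.0]
weights = [0.5, 0.3, 0.2]
print(composite_skip_na(ranks, weights))              # NA node ignored
print(composite_fill_na(ranks, weights, NA_NEUTRAL))  # NA node at ~50
print(composite_fill_na(ranks, weights, NA_NEGATIVE)) # NA node at the bottom
```

Skipping the NA node gives this stock its highest composite of the three. The trade-off is that a stock with only one or two valid nodes can score highly on very little information.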

I feel that, if anything, using NA = neutral or negative in its current form is already fairly punitive to stocks with a lot of NA’s. I have often wondered how backtests would perform if NA’s received a value higher than the median ranking. When you think about it, if you are only buying stocks with rank > 98 or something similar, the chances of a stock with a lot of NA’s getting bought are usually low. I could be wrong, but I believe the NA’s are mainly a result of the time delay before FactSet loads the complete data after earnings reports. In the cases where a stock shows NA’s because it has no earnings, chances are its ranking won’t be high enough to be bought by most systems.

Is there any way to have an option to use “last available” non-NA ranking data in the NA rank case?
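
Carrying the last non-NA value forward is a standard forward fill. Here is a minimal Python sketch for one stock’s per-period history of a single node’s rank, ordered oldest to newest. The `max_staleness` cap is a hypothetical addition, not an existing P123 option. It keeps very old data from being reused indefinitely.

```python
from typing import Optional, Sequence


def last_available(history: Sequence[Optional[float]],
                   max_staleness: Optional[int] = None) -> list[Optional[float]]:
    """Carry the most recent non-NA rank forward through later NA periods.

    max_staleness (hypothetical) limits how many periods an old rank may be
    reused; None means no limit. Leading NAs stay NA.
    """
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    age = 0
    for value in history:
        if value is not None:
            last, age = value, 0
            filled.append(value)
        else:
            age += 1
            usable = last is not None and (max_staleness is None or age <= max_staleness)
            filled.append(last if usable else None)
    return filled


history = [72.0, None, None, 65.0, None]
print(last_available(history))
print(last_available(history, max_staleness=1))
```

A rank is relative to the universe at the time it was computed, so reusing an old rank ignores how the other stocks have moved since then.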

What might be interesting is to have NA criteria customizable via a function or setting. Or even more advanced, have it customizable depending on other criteria within a factor. Sounds complicated.

Doesn’t the built-in fall-back mechanism already do that?

Not for the aggregate rank?

I’m not sure I follow.

Aggregate rank is constructed from individual factors and those factors have a built-in fall-back mechanism. No?

The question of how to handle NAs is interesting.

P123 doesn’t try to impute missing values (outside of fall-back); instead, for a given factor, it collects them all into one bucket and ranks them as either neutral or negative. So one refinement could be imputing values. I’ve tried that by substituting the industry median value for NAs; e.g.

isNA(RandD(0,QTR,KeepNA),FMedian("RandD(0,QTR,KeepNA)",#Industry),RandD(0,QTR,KeepNA))/Sales(0,QTR)

However, I didn’t try that universally, and my ranking systems are large enough that it’s difficult to determine whether that strategy is effective.
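
The same substitution can be sketched outside P123 in Python, for example to check how many NAs it fills. This mirrors the formula above under two assumptions: FMedian ignores NA values, and sales is never NA or zero. The data layout (parallel lists) and the function name are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def rd_to_sales_filled(rd: Sequence[Optional[float]], sales: Sequence[float],
                       industry: Sequence[str]) -> list[Optional[float]]:
    """R&D / Sales, with missing R&D replaced by the industry median of reported R&D."""
    reported: dict[str, list[float]] = {}
    for value, ind in zip(rd, industry):
        if value is not None:
            reported.setdefault(ind, []).append(value)
    medians = {ind: statistics.median(vals) for ind, vals in reported.items()}

    ratios: list[Optional[float]] = []
    for value, s, ind in zip(rd, sales, industry):
        filled = value if value is not None else medians.get(ind)
        # Still NA when no company in the industry reports R&D.
        ratios.append(None if filled is None else filled / s)
    return ratios


rd = [10.0, None, 60.0, None]
sales = [100.0, 200.0, 300.0, 50.0]
industry = ["Software", "Software", "Software", "Retail"]
print(rd_to_sales_filled(rd, sales, industry))
```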


There are a lot of excellent ideas for “fallbacks.” As I read these great ideas, it occurs to me that you can use anything that is unlikely to be NA and is correlated with the factor you are interested in as a fallback. The more correlated the better, and that is perhaps much of what is being discussed or considered.

I would come up with an example to try to support my point, but I think there are good examples of what I am talking about above!
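
One way to express that idea is a fallback chain: try the preferred factor first, then fall back to a correlated factor that is less often NA. Here is a minimal Python sketch, assuming each stock is a dict of hypothetical fields. Both factors are ratios, so the fallback stays on roughly the same scale as the primary. That matters because both end up ranked together.

```python
from typing import Callable, Optional, Sequence

Stock = dict[str, Optional[float]]
Factor = Callable[[Stock], Optional[float]]


def ratio(num_key: str, den_key: str) -> Factor:
    """Factor num/den that returns None (NA) if either field is missing or den is 0."""
    def factor(stock: Stock) -> Optional[float]:
        num, den = stock.get(num_key), stock.get(den_key)
        if num is None or not den:
            return None
        return num / den
    return factor


def with_fallbacks(chain: Sequence[Factor]) -> Factor:
    """Return a factor that uses the first non-NA value in the chain."""
    def combined(stock: Stock) -> Optional[float]:
        for factor in chain:
            value = factor(stock)
            if value is not None:
                return value
        return None
    return combined


# Hypothetical fields: quarterly R&D/Sales, falling back to the TTM ratio.
rd_intensity = with_fallbacks([ratio("rd_qtr", "sales_qtr"), ratio("rd_ttm", "sales_ttm")])

both = {"rd_qtr": 12.0, "sales_qtr": 50.0, "rd_ttm": 40.0, "sales_ttm": 200.0}
qtr_missing = {"rd_qtr": None, "sales_qtr": 50.0, "rd_ttm": 40.0, "sales_ttm": 200.0}
print(rd_intensity(both), rd_intensity(qtr_missing))
```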

Yes, that would work; you can take care of all the rank components individually.
I was thinking in terms of a global fallback on all rank components. Maybe too hard to implement…