'Clustered' ranking output

Greetings all,

I’ve been playing around with Portfolio123 the last few days and noticed that for some ranking systems I get a ‘clustered’ output, where the median bars are either low or nonexistent and the top and bottom bars have high returns (see attachment). Currently I disregard these types of factors and move on to create other ones, but I was wondering if someone else has another way of approaching things.

How would I interpret these results and is there a way to benefit from these types of factors that I am missing?

Best,

Victor


This happens if you use the “Percentile NAs Neutral” ranking method and you use Universe as your comparison. The NA stocks are all bunched together in the middle so there’s no way to separate their returns into bars. If you change the ranking method you’ll notice the same thing happens on the left rather than in the middle.
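A minimal Python sketch of why a block of tied NA stocks leaves bars empty. This is only an illustration under stated assumptions, not Portfolio123's actual ranking code: the function name `bucket_counts_with_nas`, the ten-bucket split, and the rule that all NA stocks share one rank (the midpoint of the scale in the neutral case, the very bottom otherwise) are hypothetical.

```python
def bucket_counts_with_nas(values, na_position, n_buckets=10):
    """Count stocks per bucket when every NA (None) shares a single rank.

    Assumptions (hypothetical, for illustration only):
    - non-NA values are distinct, so each takes its own rank position;
    - na_position="middle": the NA block sits between the lower and upper
      halves of the ranked stocks, and all NAs share the midpoint rank;
    - na_position="bottom": the NA block sits below every ranked stock,
      and all NAs share the lowest rank.
    """
    n = len(values)
    n_na = sum(1 for v in values if v is None)
    n_known = n - n_na
    if na_position == "middle":
        n_below, na_rank_position = n_known // 2, n // 2
    else:
        n_below, na_rank_position = 0, 0

    counts = [0] * n_buckets
    # Ranked stocks below the NA block keep one position each.
    for pos in range(n_below):
        counts[pos * n_buckets // n] += 1
    # Every NA stock lands in the single bucket that holds the shared rank.
    if n_na:
        counts[na_rank_position * n_buckets // n] += n_na
    # Ranked stocks above the NA block keep one position each.
    for pos in range(n_below + n_na, n):
        counts[pos * n_buckets // n] += 1
    return counts


# Made-up universe: 40 stocks with a factor value, 60 with NA.
values = [float(i) for i in range(40)] + [None] * 60
neutral = bucket_counts_with_nas(values, "middle")
# neutral  -> [10, 10, 0, 0, 0, 60, 0, 0, 10, 10]
negative = bucket_counts_with_nas(values, "bottom")
# negative -> [60, 0, 0, 0, 0, 0, 10, 10, 10, 10]
```

Under these assumptions the neutral case empties the buckets around the middle, and the other case empties the buckets on the left.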

If you’re seeing these results without using the “Percentile NAs Neutral” ranking method and Universe as your comparison, please post below the factor and performance settings and I’ll do my best to explain.

Below is a good example of a clustered result without “Percentile NAs Neutral”.

It is the ranking output you get when you use the $CYDays formula in a ranking system (P123 misc formula) with a broad universe like Easy to Trade US.

Based on the output it seems it is best to invest in companies which have either recently ended their fiscal year or are about to end it, which doesn’t seem to make much sense. I’m interested to hear other explanations.

Yuval,

Thank you for the clear explanation. I believe I get this (and have known this for a while).

But I am not sure what you mean by “universe as comparison.” I only ask because I think there may be a feature or technique that I have missed.

Best,

Jim

Victor:

It is the ranking output you get when you use the $CYDays formula in a ranking system (P123 misc formula) with a broad universe like Easy to Trade US.

Based on the output it seems it is best to invest in companies which have either recently ended their fiscal year or are about to end it, which doesn’t seem to make much sense. I’m interested to hear other explanations.

Here’s a shot at an explanation.

The large majority of companies in the Easy to Trade US universe have fiscal years that end on December 31. All those companies will be equally ranked on this factor. When a majority of companies are equally ranked, their results are all in the same bucket, leaving other buckets empty. So the chart does not reflect an equal number of stocks in each bucket.
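The effect of ties can be sketched in a few lines of Python. This is a rough illustration, not Portfolio123's ranking algorithm: the function name `bucket_counts_with_ties`, the ten buckets, the made-up factor values, and the convention that tied stocks all share the rank given by the number of stocks strictly below them are assumptions.

```python
def bucket_counts_with_ties(values, n_buckets=10):
    """Count stocks per bucket when tied values share one rank.

    Assumption: a stock's rank is the number of stocks with a strictly
    lower value, so every stock in a tied group gets the same rank and
    therefore falls into the same bucket.
    """
    n = len(values)
    counts = [0] * n_buckets
    for v in values:
        below = sum(1 for other in values if other < v)
        counts[below * n_buckets // n] += 1
    return counts


# Made-up universe of 100 stocks: 70 of them share the same value,
# as when most companies have the same fiscal year end.
values = [5] * 10 + [50] * 70 + [95 + i for i in range(20)]
counts = bucket_counts_with_ties(values)
# counts -> [10, 70, 0, 0, 0, 0, 0, 0, 10, 10]
```

The 70 tied stocks fill a single bucket and the six buckets they would otherwise have spread across stay empty. Which bucket receives the tied group depends on the tie convention, so only the pattern matters here.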

I suggest you run this again on a discrete time period of about a year (not necessarily this year, but any year) and check “Yes” by “Save Log.” Then look at the log. You’ll see the crazy stuff that’s happening here. Most of the middle buckets will be empty most of the time.

Jim:

I meant using “Universe” instead of “Sector” or “Industry” as your comparison in a particular node in your ranking system.

Yuval, Thanks.-Jim

I have seen some machine learning methods where VERY SMALL RANDOM positive or negative numbers are added to each data point.

It is likely that, were this done, each bucket would have the same number of stocks.

It could be an option (that can be turned off or on) for each factor in a ranking system.

Just an idea that I have not fully explored. It assumes some factors are “crazy” enough to be considered a problem worth addressing.
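A quick Python sketch of the idea, assuming a simple bucketing rule rather than Portfolio123's real one. The function `bucket_counts`, the jitter size of 1e-6, the seed, and the factor values are all made up for illustration.

```python
import random


def bucket_counts(values, n_buckets=10):
    """Count stocks per bucket; tied values share a rank (assumed rule)."""
    n = len(values)
    counts = [0] * n_buckets
    for v in values:
        below = sum(1 for other in values if other < v)
        counts[below * n_buckets // n] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up factor values for 100 stocks, 70 of them identical.
raw = [5.0] * 10 + [50.0] * 70 + [95.0 + i for i in range(20)]

# Add a very small positive or negative random number to each data point.
jittered = [v + rng.uniform(-1e-6, 1e-6) for v in raw]

before = bucket_counts(raw)       # the tied stocks pile into one bucket
after = bucket_counts(jittered)   # the ties are broken at random
```

Because the jitter is far smaller than the gap between any two different raw values, it only decides the order inside a tied group. The tied stocks are then spread over the buckets they span, so each bucket ends up with the same number of stocks.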

Brilliant move, Jim!

You can do this with the Random command. So if you use $CYDays + Random as your factor, you won’t get any empty buckets, since every stock’s number of days since the last fiscal year will be augmented by a random number between 0 and 1. The end result is a bar chart that makes sense:

Yuval,

Thank you for providing an example. I had not actually tried it myself.

It does work in this example with Random. Most people have probably thought about this already, but sometimes you need a smaller interval than [0,1), which may be the interval for Random (or perhaps [0,1]).

An example where a smaller number might be needed is a sentiment factor like this: (NextFYEPSMean-NextFYEPS4WkAgo)/Abs(NextFYEPS4WkAgo)

This factor will be less than 1 most of the time, and adding a large random number (which will be different for each stock and different each time) could switch a rank position randomly. Switching of rank positions is not desirable; it is actually a problem.

BTW, there are probably better ways to do it (e.g., ridge regression or lasso regression, in other words formal L2 and L1 regularization), but maybe this could be used for regularization in rare instances. This would actually be a controlled switching of rank positions and would be an exception to the above paragraph, which is the reason for putting this paragraph here. I am not going to use that anytime soon, but I do not discount the idea that it could be useful if someone were to suggest using it for that purpose.

People should routinely divide random() by a large number when using this. Something like random()/1000, I think. I have not tried this yet and I do not know how large of a number to use in the denominator but larger is probably better.
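Here is a small Python sketch of why the size of the random term matters. It is an illustration only: the helper names, the made-up revision values, and the rule of scaling the jitter to half the smallest gap between distinct values are my assumptions, not anything Portfolio123 provides. Inside a ranking formula you would have to pick a fixed divisor instead.

```python
import random


def min_positive_gap(values):
    """Smallest difference between two distinct values (needs at least two)."""
    distinct = sorted(set(values))
    return min(b - a for a, b in zip(distinct, distinct[1:]))


def add_jitter(values, scale, rng):
    """Add a random number in [0, scale) to every value."""
    return [v + rng.random() * scale for v in values]


def order_preserved(original, jittered):
    """True if every pair that was strictly ordered keeps its order."""
    n = len(original)
    return all(
        jittered[i] < jittered[j]
        for i in range(n)
        for j in range(n)
        if original[i] < original[j]
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up estimate-revision values: small numbers, with three ties at 0.0.
revisions = [-0.02, 0.0, 0.0, 0.0, 0.01, 0.03, 0.05]

# Jitter in [0, 1) is much larger than the factor itself,
# so it can reorder stocks that were not tied.
coarse = add_jitter(revisions, 1.0, rng)
coarse_ok = order_preserved(revisions, coarse)

# Jitter smaller than the smallest gap can only reorder the tied stocks.
fine = add_jitter(revisions, min_positive_gap(revisions) / 2, rng)
fine_ok = order_preserved(revisions, fine)
```

With the scaled jitter, `fine_ok` is always true, because no value can move far enough to pass the next distinct value. With jitter in [0, 1) there is no such guarantee for a factor this small. $CYDays is different: its values are whole days, so a random number below 1 cannot push one stock past another.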

An aside: Is this a big deal for people using slope to evaluate their factors? Just a question for now. But is it also a big deal for other ways of evaluating individual rank factors? I will need to go back through my ranking system. This will give me something to do at the start of the New Year, and maybe something that might yield a significant improvement in my ranking system, though that is yet to be determined.

In any case, thank you for the feedback Yuval.

Jim