I really wish factor momentum would work - help me show that it actually does

Edit: Just to put this into perspective, I used my KNN method on 31 factors and compared each to 100 randomized models. No randomized model beat my model with any of the target factors used, suggesting a p-value < 0.0003 for the KNN model I am using having some predictive power. Maybe it's just me, but that's no small thing, I think.

You could also assume there is a 50/50 chance that my method would work with any single factor. Then there would be a one in 2^31 chance that my model would work for every factor tried; you would expect that result to occur 1 in 2,147,483,648 times by chance alone (with no predictive power for the model). It's like getting heads 31 times in a row with a fair coin by chance alone, to use another non-parametric statistical framing, if you like statistics but question the first method.
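The coin-flip arithmetic can be checked in a couple of lines (the 31 factors and the 50/50 assumption are from the paragraph above):

```python
# If the method had no edge, treat each of the 31 factor tests as an
# independent 50/50 coin flip. The chance of "winning" on all 31 factors
# by luck alone is then 0.5**31.
n_factors = 31
p_all_by_chance = 0.5 ** n_factors
print(p_all_by_chance)           # ~4.66e-10
print(int(1 / p_all_by_chance))  # 2147483648, i.e. 1 in 2^31
```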

For a visualization of this method, here are the permuted (randomized) results compared to the actual results for one factor. The permutations look roughly normally distributed; if so, the actual result is about a 3-sigma event for just this one factor, I would say, just eyeballing it:

TL;DR: I agree with everyone's posts above. But maybe who I agree with depends on the feature and the period. And I wonder why I would have to rely on just one piece of data (one predictor). Why would I do that?

My opinion, based on some data (see above), is that all of the above can be true depending on the feature and period used: you can have momentum or mean reversion, and it may or may not be a large effect, and may or may not be statistically significant.

Yuval adds that it is possible to see momentum over longer periods but mean reversion over shorter periods. True for sure, and this combination is commonly used in the literature, which supports his idea. An important contribution to this discussion, I believe.

Why assume a priori that it will be one or the other? And if there is momentum over the longer period but mean reversion over the shorter period, why not include both in your predictive model?

Do you assume all factors will behave the same over all periods? If not, what model do you use to begin to discriminate among features? Whatever you decide to do, will you want to do the same thing for every feature?

So whatever one believes about a particular factor, how do you use what you believe? And if you are going to use a model, which one will you use? I used linear regression above, and I think that works pretty well if you are going to use just one piece of data for predictions. But there are other ways to do it.
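For anyone who wants to see the single-predictor case in code, a minimal sketch looks something like this (synthetic data, not my actual factor returns; the 0.3 slope is just built in so there is something to recover):

```python
import numpy as np

# Minimal single-predictor setup (synthetic data): regress a factor's
# next-period return on its trailing return. A positive slope suggests
# momentum; a negative slope would suggest mean reversion.
rng = np.random.default_rng(0)
trailing = rng.normal(size=200)                            # trailing returns (stand-in)
future = 0.3 * trailing + rng.normal(scale=0.5, size=200)  # built-in momentum + noise
slope, intercept = np.polyfit(trailing, future, 1)
print(slope)  # should land roughly near the true 0.3
```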

What if you wanted to use the idea that there is a trend long-term but mean reversion short-term? Maybe you believe there is an interaction between these two features. And maybe some other data too? Maybe there is macro data that is correlated with your factor's performance (like interest-rate changes) and you want to incorporate that?

What model do you use for any of that? Here is how K-nearest neighbors (KNN) does with one target feature in the Core Ranking System and, importantly, multiple predictors:
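The shape of the multi-predictor setup is easy to sketch. This is synthetic data, not my actual features; the three inputs just stand in for the long-term momentum, short-term reversal, and macro ideas above:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hedged sketch (synthetic data): give KNN several predictors at once --
# a long-term momentum feature, a short-term reversal feature, and a macro
# series -- and let neighbor averaging pick up any interactions among them.
rng = np.random.default_rng(42)
n = 300
long_mom = rng.normal(size=n)   # stand-in for trailing long-term factor return
short_rev = rng.normal(size=n)  # stand-in for trailing short-term factor return
rate_chg = rng.normal(size=n)   # stand-in for interest-rate changes
# Synthetic target: momentum long-term, reversion short-term, plus noise
y = 0.4 * long_mom - 0.3 * short_rev + rng.normal(scale=0.5, size=n)

X = np.column_stack([long_mom, short_rev, rate_chg])
model = KNeighborsRegressor(n_neighbors=15).fit(X[:250], y[:250])
r2 = model.score(X[250:], y[250:])  # out-of-sample R^2
print(r2)
```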

The p-value is done in a unique way. The model is run with all of the factors I decided to use and an R^2 value obtained. Then the model is run multiple times with all of the features shuffled, as is done in Sklearn's permutation feature importance. In this example there were 100 runs with the features shuffled (reshuffled with each run), and none of them beat the model with real data, suggesting the data was helpful for making predictions 100 times out of 100 runs.
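In code, the permutation scheme looks roughly like this (synthetic data again; the `knn_r2` helper is just an illustrative fit-and-score, not my exact pipeline):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Sketch of the permutation p-value (synthetic data): fit KNN on the real
# features, record R^2, then refit many times with the feature rows shuffled
# and count how often a shuffled fit matches or beats the real one.
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.5, size=n)

def knn_r2(X, y):
    """Fit on the first 200 rows, score out of sample on the rest."""
    model = KNeighborsRegressor(n_neighbors=15).fit(X[:200], y[:200])
    return model.score(X[200:], y[200:])

real_r2 = knn_r2(X, y)
n_perms = 100  # 100 shuffled runs, as in the post
wins = sum(knn_r2(X[rng.permutation(n)], y) >= real_r2 for _ in range(n_perms))
p_value = (wins + 1) / (n_perms + 1)  # conservative permutation p-value
print(real_r2, wins, p_value)
```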

If you have more time you can use more permutations. And before you fund it, you could consider using permutation feature importance to help you decide which features have the most predictive power, and maybe remove a few features that seem to be just noise.
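Sklearn ships this directly as `sklearn.inspection.permutation_importance`. A sketch of the screening idea on synthetic data, where one feature is deliberately pure noise:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsRegressor

# Feature-screening sketch (synthetic data): permutation importance shuffles
# one feature at a time and measures the drop in score. A feature whose
# importance hovers near zero -- like x2 here, which the target ignores --
# is a candidate for removal before you fund the model.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 0.6 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.3, size=300)  # x2 unused

model = KNeighborsRegressor(n_neighbors=15).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=30, random_state=0)
for name, imp in zip(["x0", "x1", "x2 (noise)"], result.importances_mean):
    print(name, round(imp, 3))
```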

My grid search of a relatively small amount of data would confirm that 4 weeks has a strong effect if you are considering both mean reversion and trending. The above KNN model uses 4 weeks for the predictors for that reason. Confirmation of what you have said, I believe.
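The 4-week finding is from my data; the mechanics of a lookback grid search can be sketched with a synthetic autocorrelated series, where there is genuine short-horizon structure to find:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Lookback grid-search sketch (synthetic AR(1) series): for each candidate
# window, use the trailing window-sum of returns to predict the next return,
# and score KNN out of sample.
rng = np.random.default_rng(3)
rets = np.zeros(600)
for i in range(1, 600):
    rets[i] = 0.3 * rets[i - 1] + rng.normal(scale=0.01)

scores = {}
for window in (1, 2, 4, 8, 13):
    trailing = np.array([rets[i - window:i].sum() for i in range(window, 600)])
    future = rets[window:600]  # the return in the period after each window
    X = trailing.reshape(-1, 1)
    model = KNeighborsRegressor(n_neighbors=15).fit(X[:400], future[:400])
    scores[window] = model.score(X[400:], future[400:])
print(scores)  # compare out-of-sample R^2 across windows
```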

BTW, I am finding statistical significance (as determined by the feature-permutation method above) for every feature I have tested with KNN so far, suggesting everyone's ideas can be incorporated into one objective model using multiple features. It would not have to be KNN, but I have a fondness for it: it's arguably the simplest ML model, does not assume the data is normally distributed, uses interactions for predictions, and, when keeping the number of features small, is hard to beat.

Jim