I haven't read the book in its entirety, but I have researched the three approaches mentioned. You are correct: the feature importance shown in Portfolio123 is MDI. I want to highlight a few critical points regarding those results.
If you have several features providing the same type of information, the importance can be disguised or inflated in the MDI output. This usually happens in one of two ways:
- Dilution: The importance is split between the correlated features, making the underlying information appear less relevant to the model than it actually is.
- Inflation/Overfitting: Worse yet, the model may place correlated features in the same decision tree, focusing on the extreme differences between them. For example, if you use ROC(150) and ROC(170) as features, the model might fixate on the performance specifically between day 150 and day 170, information that is likely just noise.
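The dilution case can be reproduced on synthetic data. This is only a sketch: it assumes scikit-learn's RandomForestRegressor as a stand-in model (not a claim about how Portfolio123 computes its numbers), and roc_150 / roc_170 are made-up noisy copies of one hidden signal rather than real ROC values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def mdi_importances(X, y, names, seed=0):
    """Fit a random forest and return its MDI importances keyed by feature name."""
    model = RandomForestRegressor(
        n_estimators=100,
        max_features=1,       # one randomly chosen candidate feature per split
        min_samples_leaf=20,  # limits deep splits on pure noise
        random_state=seed,
    )
    model.fit(X, y)
    return dict(zip(names, model.feature_importances_))


rng = np.random.default_rng(42)
n = 3000
momentum = rng.normal(size=n)                  # hidden signal (synthetic assumption)
roc_150 = momentum + 0.2 * rng.normal(size=n)  # two noisy views of the same signal
roc_170 = momentum + 0.2 * rng.normal(size=n)
unrelated = rng.normal(size=n)                 # feature with no link to the target
target = momentum + 0.5 * rng.normal(size=n)   # synthetic forward return

# Same information, once carried by one feature and once by two correlated ones.
alone = mdi_importances(
    np.column_stack([roc_150, unrelated]), target, ["roc_150", "unrelated"]
)
both = mdi_importances(
    np.column_stack([roc_150, roc_170, unrelated]),
    target,
    ["roc_150", "roc_170", "unrelated"],
)

print("roc_150 alone:        ", round(alone["roc_150"], 3))
print("roc_150 with roc_170: ", round(both["roc_150"], 3))
print("roc_170 with roc_150: ", round(both["roc_170"], 3))
```

The momentum information is identical in both runs, yet roc_150 reports a smaller share once roc_170 is present, because the two features divide the same credit between them.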
Since we are discussing correlated features, it's worth noting that a simple MDA-style test (removing one feature at a time to measure OOS impact; strictly speaking, MDA shuffles the feature's values rather than removing it) can also be misleading. If you remove ROC(150), ROC(170) will likely "take over" the predictive work, and performance won't drop. While this correctly identifies the individual feature as redundant, you might mistakenly conclude that momentum as a style is not important to the model.
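Here is a rough sketch of that substitution effect, again on synthetic data. The feature names, the noise levels, the simple first-half/second-half split, and the choice of RandomForestRegressor are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n = 3000
momentum = rng.normal(size=n)                 # hidden signal (synthetic assumption)
names = ["roc_150", "roc_170", "unrelated"]
X = np.column_stack([
    momentum + 0.2 * rng.normal(size=n),      # roc_150: noisy view of the signal
    momentum + 0.2 * rng.normal(size=n),      # roc_170: second noisy view
    rng.normal(size=n),                       # unrelated: pure noise
])
target = momentum + 0.5 * rng.normal(size=n)  # synthetic forward return
split = n // 2                                # first half trains, second half is OOS


def oos_r2_without(dropped):
    """Refit without the named features and return R^2 on the held-out half."""
    keep = [i for i, name in enumerate(names) if name not in dropped]
    model = RandomForestRegressor(
        n_estimators=100, min_samples_leaf=20, random_state=0
    )
    model.fit(X[:split][:, keep], target[:split])
    return model.score(X[split:][:, keep], target[split:])


baseline = oos_r2_without([])
drop_one = {name: oos_r2_without([name]) for name in ["roc_150", "roc_170"]}

print("baseline OOS R^2:", round(baseline, 3))
for name, score in drop_one.items():
    print(f"without {name}: {round(score, 3)}")
```

Dropping either ROC feature on its own barely moves the OOS score, because the remaining one carries nearly the same information. Read feature by feature, both look expendable.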
To truly see the impact of a feature "style," you would need to identify all features within that same style, remove the entire group, and then run a performance test. I might actually add a dendrogram to my Python correlation program to help visualize these "Information Clusters."
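One possible way to do this is to cluster the features on their correlation matrix and then drop each cluster as a unit. The sketch below assumes SciPy's hierarchical clustering with a distance of 1 - |correlation|, an arbitrary cut at 0.5, synthetic data, and hypothetical feature names (value_a and value_b are invented to give a second style); it is not my actual correlation program.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 3000
momentum = rng.normal(size=n)  # two hidden styles (synthetic assumption)
value = rng.normal(size=n)
names = ["roc_150", "roc_170", "value_a", "value_b"]
X = np.column_stack([
    momentum + 0.2 * rng.normal(size=n),
    momentum + 0.2 * rng.normal(size=n),
    value + 0.2 * rng.normal(size=n),
    value + 0.2 * rng.normal(size=n),
])
target = momentum + 0.3 * value + 0.5 * rng.normal(size=n)
split = n // 2  # first half trains, second half is OOS

# Step 1: group features whose absolute correlation is high.
corr = np.corrcoef(X, rowvar=False)
distance = 1.0 - np.abs(corr)
np.fill_diagonal(distance, 0.0)  # remove floating-point residue on the diagonal
tree = linkage(squareform(distance, checks=False), method="average")
labels = fcluster(tree, t=0.5, criterion="distance")  # 0.5 is an arbitrary cut

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(int(label), []).append(name)


# Step 2: drop each cluster as a whole and measure the OOS damage.
def oos_r2_without(dropped):
    """Refit without the named features and return R^2 on the held-out half."""
    keep = [i for i, name in enumerate(names) if name not in dropped]
    model = RandomForestRegressor(
        n_estimators=100, min_samples_leaf=20, random_state=0
    )
    model.fit(X[:split][:, keep], target[:split])
    return model.score(X[split:][:, keep], target[split:])


baseline = oos_r2_without([])
group_drop = {
    tuple(group): baseline - oos_r2_without(group) for group in clusters.values()
}

for group, drop in group_drop.items():
    print(group, "-> OOS R^2 drop:", round(drop, 3))
```

Passing the same `tree` to `scipy.cluster.hierarchy.dendrogram` (with `labels=names`) draws the dendrogram if matplotlib is installed. The cut height decides how tight a cluster has to be, so it is worth eyeballing the plot before trusting any fixed threshold.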