I found this paper with almost the same title as Empirical Asset Pricing via Machine Learning, but for European data. Some of the salient differences from Gu’s paper are the use of only 22 relatively straightforward features, and a dataset from 1999. At first glance it seems to show much better results than its US counterpart.
We will be studying these papers for inspiration for our AI roadmap.
From the paper: "But machine learning models must be adequately trained and tuned to overcome the high dimensionality issue and to avoid overfitting"
What is called “the curse of dimensionality” is another reason that one has to be careful about adding too many variables to a ranking system. This is in addition to the “degrees of freedom” issue discussed in another thread…
I think it can be dealt with but it is a real issue.
As with degrees of freedom, there is not a direct one-to-one relationship between the number of factors and the “dimensions.” But for sure, more factors means more dimensions.
Not a coincidence that the authors use only 22 factors I think.
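To make the dimensionality point concrete, here is a minimal sketch (my own illustration, not something from the paper). It assumes uniformly random "factor" values and a fixed sample of 500 points, and measures how the nearest point compares to the farthest point as the number of dimensions grows. The function name `distance_contrast` and all parameters are hypothetical.

```python
import math
import random


def distance_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Ratio of nearest to farthest distance from one random query point.

    A ratio close to 1 means every point looks about equally far away,
    which is one common way the curse of dimensionality shows up.
    Assumption: each coordinate is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return min(dists) / max(dists)


# Same number of observations, increasing number of "factors".
for dims in (2, 22, 200):
    print(dims, round(distance_contrast(500, dims), 3))
```

With the sample size held fixed, the ratio moves toward 1 as dimensions are added, so "near" and "far" observations become harder to tell apart. Real factors are correlated and not uniform, so this only shows the direction of the effect, not its size for any actual ranking system.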
There is a significant difference in the last 1-year returns across countries:
max: Poland, +48.22%
min: Finland, -6.01%
source: Countries | Seeking Alpha
A ‘fundamental country rank’ could be a useful factor.
It can currently be computed in P123 using #GroupVar scope, but it only works in Screen. It actually worked quite well for me in 2023.
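Outside of P123, the idea of a country-level rank can be sketched roughly as follows. This is not P123 syntax and not how #GroupVar is implemented; it is just a plain-Python illustration that assumes each stock has one fundamental value (say, an earnings yield), aggregates it per country by median, and ranks the countries on a 0-100 scale. The function name `country_rank`, the tickers, the country codes, and the numbers are all made up.

```python
from collections import defaultdict
from statistics import median


def country_rank(stocks):
    """Map each country to a 0-100 rank of its median factor value.

    stocks: iterable of (ticker, country, factor_value) tuples.
    Assumption: a higher factor value is better, so the country with the
    highest median gets 100 and the lowest gets 0.
    """
    by_country = defaultdict(list)
    for _ticker, country, value in stocks:
        by_country[country].append(value)

    medians = {c: median(values) for c, values in by_country.items()}
    ordered = sorted(medians, key=medians.get)  # lowest median first

    n = len(ordered)
    if n == 1:
        return {ordered[0]: 100.0}
    return {c: 100.0 * i / (n - 1) for i, c in enumerate(ordered)}


# Hypothetical data: placeholder country codes and invented factor values.
sample = [
    ("T1", "AAA", 0.04),
    ("T2", "AAA", 0.06),
    ("T3", "BBB", 0.10),
    ("T4", "BBB", 0.12),
    ("T5", "CCC", 0.01),
]
ranks = country_rank(sample)
print(ranks)
```

Each stock would then inherit the rank of its country as one more factor. The median is used here only because it is less sensitive to a few extreme stocks; a mean or cap-weighted average would be an equally reasonable choice, and ties between countries are not handled specially in this sketch.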
I assume that we are separating "European" stocks from US stocks because the economies are very different, not just because of geography. I do find it very frustrating reading all these publications when they put, say, the Hungarian stock market in the same basket as the Norwegian one, as if there were any resemblance between the economies and their stock markets. Here at P123 we even put Turkey into the same basket.