https://onlinelibrary.wiley.com/doi/full/10.1111/joes.12532
Interesting:
Despite the recent noticeable efforts to apply new methods in Asset Pricing, a major insight that transpires from this emerging literature is that the factor zoo issue is still unresolved. Indeed, every method we review detects different groups of prominent factors for the cross-section and at this stage, it is complicated to tell which approach to prefer, especially considering that several well-known anomalies have become less relevant over time (Chordia et al., 2014) and that the predictive power of some characteristics varies after conditioning on the others (Freyberger et al., 2020). Notwithstanding, some factors tend to stick out more often than others, namely past returns (e.g., STR and momentum), liquidity factors (e.g., Pástor & Stambaugh, 2003), and trading frictions (e.g., SUV). Pinning down a small set of risk sources robust to various identification algorithms ultimately ameliorates the comprehension of what drives returns and provides an excellent starting point for further research. In particular, this information can assist researchers in the question about the sparsity/density of the SDF, which remains open. While the majority of the papers find redundancies in the factor zoo (e.g., Feng et al., 2020), others claim that one needs to consider all anomalies to achieve good performance (Kozak et al., 2020). The point is far-reaching as it guides researchers’ views in the process of building theoretical models to explain these empirical facts.
A second notable finding is that nonlinearities matter. Interactions among covariates play a big role in several studies, and matter more than nonlinearities in single characteristics (Bianchi et al., 2021), although not everybody agrees (Kozak et al., 2020).
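To make the distinction between interactions and single-characteristic nonlinearities concrete, here is a minimal sketch on synthetic data. It is not taken from any of the cited papers: the characteristic names (`size`, `mom`), the coefficient 0.5, and the data-generating process are all assumptions chosen purely for illustration. Returns are simulated so that only the product of the two characteristics matters, and three OLS specifications are then compared by in-sample R².

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on X (an intercept is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 5000
size = rng.standard_normal(n)  # hypothetical characteristic 1
mom = rng.standard_normal(n)   # hypothetical characteristic 2

# Assumption: synthetic returns depend only on the interaction, plus noise.
ret = 0.5 * size * mom + rng.standard_normal(n)

# Linear in each characteristic.
r2_linear = r_squared(np.column_stack([size, mom]), ret)
# Adds nonlinearity in each single characteristic (squares), no interaction.
r2_squares = r_squared(np.column_stack([size, mom, size**2, mom**2]), ret)
# Adds the cross-characteristic interaction term.
r2_interact = r_squared(np.column_stack([size, mom, size * mom]), ret)

print(f"linear:      {r2_linear:.3f}")
print(f"squares:     {r2_squares:.3f}")
print(f"interaction: {r2_interact:.3f}")
```

Under this made-up process, only the specification with the cross term should recover meaningful explanatory power. Real data need not behave this way, which is consistent with the disagreement noted above.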
Finally, new patterns have been discovered in specific applications thanks to ML. For example, high-frequency stock returns are largely driven by industry factors in contrast to traditional characteristics-based factors (Pelger, 2020). Returns of assets traditionally difficult to describe with factor models, like options, are easily captured by a few economically interpretable factors (Giglio et al., 2022). Firm characteristics and first moments of returns contain valuable information to build low-dimensional but highly efficient statistical factor representations (Kelly et al., 2019; Lettau & Pelger, 2020b).
From the conclusion:
Risk of overfitting the data and difficult interpretation of the procedures employed are the price to pay for the flexibility and the performance of ML methods. Common grounds for data sample, evaluation metrics, and tools to identify the contributions of characteristics to expected returns are vital for future research.