AI models on ensembling

This is what frustrates me about AI, which has become an incredibly powerful tool. I have a LightGBM model that I am pleased with, but I also wanted to test whether it could be combined with other models to strengthen it and, at the same time, cover its weaknesses. So I asked six different deep-thinking models, and what do I get?! Good answers, but different answers.

Often, I end up going with the answers from the most robust models, or with the answer that most of the models agree on.

Here are the answers I received from the various thinking models:

Detailed Schematic Overview of Each Analysis's Recommendation

Each entry below lists the analysis (source), its recommended partner model for LightGBM, the theoretical rationale for that choice, and its rationale for discarding the alternatives.

My Original Answer (recommended partner: linear)
Rationale for the choice:
- Maximum Structural Difference: Combines a complex, non-linear tree-based model (LightGBM) with a simple, global linear model. They operate in fundamentally different function spaces.
- Optimal Bias-Variance Tradeoff: Pairs LightGBM's low bias and high variance with the linear model's high bias and low variance. This creates a robust ensemble that is less prone to overfitting.
- Highest Decorrelation: Because the models make fundamentally different assumptions about the data, their errors will be the least correlated, providing the maximum diversification benefit.
Rationale for discarding others:
- XGBoost: Too similar. Both are gradient boosting models, which leads to high correlation.
- RandomForest: Better, but still tree-based. Partially correlated, since both capture similar non-linear patterns.

Analysis 1 (recommended partner: randomforest)
Rationale for the choice:
- Boosting vs. Bagging: LightGBM builds trees sequentially, each correcting the previous errors (boosting). RandomForest builds trees independently and in parallel on random subsets (bagging). This methodological difference is a core reason for decorrelation.
- Balanced Bias-Variance: LightGBM's low bias/high variance is stabilized by RandomForest's slightly higher bias/lower variance. RandomForest is robust to noise, which is a weakness of boosting.
- Practical Robustness: RandomForest handles non-stationary, high-dimensional financial data well, providing a stable partner to LightGBM's signal hunting.
Rationale for discarding others:
- XGBoost: Too similar (both are boosting).
- Linear Model: Too simple; fails to capture the critical non-linear patterns in the stock market.
- Neural Networks (NN): A "black box" that requires a lot of data and can also overfit to noise.

Analysis 2 (recommended partner: linear)
Rationale for the choice:
- Structural Orthogonality: LightGBM captures local, non-linear interactions. Linear regression captures only global, linear trends. The two approaches are "orthogonal."
- Complementarity in Bias-Variance: Perfectly complementary. LightGBM is a "high alpha, high volatility" strategy, while linear is a "stable beta, low volatility" component. Together, they lower total variance.
- Regime Robustness: The ensemble becomes robust across market regimes. Linear performs well in trending markets, while LightGBM can capture more complex patterns in other periods.
Rationale for discarding others:
- XGBoost/RandomForest: Correlation too high (ρ > 0.85 for XGBoost, ρ ≈ 0.65-0.75 for RF).
- NN: Unnecessary complexity when a simpler solution (linear) provides better decorrelation. Combining two high-variance models is risky.
- SVM: Scales poorly in practice for quantitative finance.

Analysis 3 (recommended partner: gam)
Rationale for the choice:
- Complementary Non-Linearity: A GAM (generalized additive model) models non-linearity as a sum of smooth functions of individual variables (additive), whereas LightGBM specializes in finding complex interactions between variables. They therefore capture different types of non-linear patterns.
- Balanced Complexity: GAM is a low-variance, higher-bias model that is more flexible than a pure linear model but avoids the extreme complexity of LightGBM, providing a good middle ground.
- Decorrelated Errors: Because the learning mechanisms differ so much (additive vs. interactive), their errors often become decorrelated.
Rationale for discarding others:
- XGBoost/RF: Too similar in structure and error correlation.
- NN: Often high-variance and a "black box," which can increase correlation with LightGBM.
- Linear Model: Too simple. Loses too much signal by assuming pure linearity.

Analysis 4 (recommended partner: deeptablesnn)
Rationale for the choice:
- Fundamentally Different Paradigm: LightGBM is tree-based (discrete, rule-based). DeepTablesNN is a neural network (a continuous function approximator trained by gradient descent). This is the largest conceptual difference among all the alternatives.
- Different Optimization Dynamics: Boosting fits residuals sequentially, one tree at a time, while gradient descent trains the entire network simultaneously. This allows the two to capture entirely different types of patterns.
- Complementary Error Types: The difference in "inductive bias" (the model's inherent assumptions) leads to very low error correlation.
Rationale for discarding others:
- XGBoost/RF/ExtraTrees: All are tree-based and share the same basic structure, leading to moderate to high correlation.
- Linear/GAM: Considered too low-capacity (too high bias) to be a full-fledged partner capable of capturing the remaining complex signals.
- SVM: Scales poorly and is less practical.

Analysis 5 (recommended partner: linear)
Rationale for the choice:
- Lowest Covariance: Argues that the empirical correlation between the predictions of a GBM and a linear model is extremely low (often ρ < 0.2), which provides the greatest reduction in the ensemble's total variance.
- Portfolio Theory: Uses the formula for the Information Ratio (IR) of an ensemble to show that even a model with a lower individual IR (linear) contributes massively to the total risk-adjusted return because of the low correlation (ρ).
- Robust Factor Exposure: A linear model is best suited to capturing pure, stable factor premia, which complements LightGBM's hunt for more idiosyncratic alpha signals.
Rationale for discarding others:
- XGBoost/RF: Correlation too high (estimates ρ ≈ 0.7-0.9 for XGBoost and ρ ≈ 0.5-0.6 for RF).
- NN: Also a low-bias/high-variance model that can be sensitive to the same non-linear patterns.
- GAM: A step in the right direction, but its correlation (estimated at ~0.4) is still higher than a linear model's.

Analysis 6 (recommended partner: extratrees)
Rationale for the choice:
- Maximum Randomization in Bagging: ExtraTrees is a bagging model (like RandomForest), but it adds an extra level of randomization by drawing candidate split points at random instead of searching for the optimal ones.
- Even Lower Variance: This extra randomness reduces the model's variance even more than RandomForest does, at the cost of a slight increase in bias. This makes it an even more stable partner for the high-variance LightGBM.
- Higher Decorrelation than RF: Because the trees in ExtraTrees are less correlated with each other, they are also less correlated with LightGBM's trees, providing a better diversification effect than a standard RandomForest.
Rationale for discarding others:
- XGBoost: Too similar (both are boosting).
- RandomForest: Good, but ExtraTrees is better because the extra randomization provides a greater decorrelation effect.
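
Every answer above rests on the same quantity: how correlated a partner's errors are with LightGBM's. The ρ values quoted in the answers are the models' own estimates, not measurements, so it is worth checking them on your own held-out data. Below is a minimal sketch in plain Python, assuming you already have out-of-sample targets and predictions from LightGBM and one candidate partner as lists. The helper names (`pearson`, `mse`, `blend_report`) are hypothetical, and the numbers at the bottom are made-up toy values that only demonstrate the call.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def mse(y: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error of predictions against targets."""
    return sum((t - p) ** 2 for t, p in zip(y, pred)) / len(y)


def blend_report(y, pred_a, pred_b, weights=(0.0, 0.25, 0.5, 0.75, 1.0)) -> Dict:
    """Prediction/error correlations plus the MSE of each linear blend.

    The blend is w * pred_a + (1 - w) * pred_b, so w = 1.0 is model A alone
    and w = 0.0 is model B alone.
    """
    err_a = [p - t for p, t in zip(pred_a, y)]
    err_b = [p - t for p, t in zip(pred_b, y)]
    blend_mse = {}
    for w in weights:
        blend = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        blend_mse[w] = mse(y, blend)
    return {
        "pred_corr": pearson(pred_a, pred_b),
        "error_corr": pearson(err_a, err_b),
        "blend_mse": blend_mse,
    }


# Toy values for illustration only (not real results).
y = [0.10, -0.20, 0.30, 0.00, -0.10, 0.20]
lgbm_pred = [0.15, -0.10, 0.20, 0.05, -0.20, 0.25]
linear_pred = [0.05, -0.15, 0.10, -0.05, 0.00, 0.10]

report = blend_report(y, lgbm_pred, linear_pred)
print(report["pred_corr"], report["error_corr"])
for w, value in report["blend_mse"].items():
    print(f"w_lgbm={w:.2f}  mse={value:.5f}")
```

Prediction correlation and error correlation can differ: two models may predict similarly yet err in different places, and it is the error correlation that drives the diversification argument. Pick the blend weight on a validation fold separate from the one you report results on, otherwise the improvement will look better than it is.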

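Analysis 5's Information Ratio argument can be made concrete, with one caveat: the answer does not say which formula it used. One standard version assumes two signals whose active returns have information ratios IR1 and IR2 and correlation ρ, combined with mean-variance optimal weights; the best achievable ratio is then IR² = (IR1² + IR2² - 2ρ·IR1·IR2) / (1 - ρ²). A minimal sketch, where the function names are hypothetical and the inputs are illustrative:

```python
import math


def combined_ir(ir1: float, ir2: float, rho: float) -> float:
    """Best IR from mean-variance optimal weights on two signals.

    ir1, ir2: information ratios of the two signals' active returns.
    rho: correlation between those active returns, strictly inside (-1, 1).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return math.sqrt((ir1**2 + ir2**2 - 2 * rho * ir1 * ir2) / (1 - rho**2))


def gets_positive_weight(ir1: float, ir2: float, rho: float) -> bool:
    """True if the optimal weight on signal 2 is positive (IR2 > rho * IR1)."""
    return ir2 > rho * ir1


# Illustrative inputs: a stronger signal (IR 1.0) and a weaker one (IR 0.5).
for rho in (0.1, 0.5, 0.9):
    print(rho, round(combined_ir(1.0, 0.5, rho), 3), gets_positive_weight(1.0, 0.5, rho))
```

Under this formula a weaker, low-correlation signal helps only while IR2 > ρ·IR1. With IR1 = 1.0 and IR2 = 0.5, ρ = 0.5 leaves the combined IR at exactly 1.0 (the weaker signal gets zero weight), and at ρ = 0.9 the combined IR rises again only because the optimizer shorts the weaker signal as a hedge, which is not the diversification effect the analysis describes.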

I think it also depends on what you are trying to do with the ensemble. You could have three identically (or very similarly) designed models, and they would probably perform better as a group, because randomness in training means they would not all learn the same rare faulty patterns. At the same time, they would be highly correlated.
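
The trade-off in the paragraph above has a standard formula: if n models have equal error variance σ² and pairwise error correlation ρ, their simple average has error variance ρσ² + (1 - ρ)σ²/n, so adding more near-identical models can never push the variance below ρσ². With σ² = 1 and n = 3, ρ = 0.9 gives about 0.93, while ρ = 0.2 gives about 0.47. The sketch below checks the formula with a small Monte Carlo simulation using only the standard library; the one-shared-factor error model is an assumption made purely to produce the desired correlation, and the function names are hypothetical.

```python
import math
import random


def averaged_variance(sigma2: float, rho: float, n: int) -> float:
    """Variance of the mean of n errors with equal variance and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n


def simulate(rho: float, n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Empirical variance of the averaged error under a one-factor model.

    Each model's error is sqrt(rho) * shared + sqrt(1 - rho) * own noise,
    which gives unit variance and pairwise correlation rho (needs 0 <= rho <= 1).
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    total = total_sq = 0.0
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)
        avg = sum(a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n)) / n
        total += avg
        total_sq += avg * avg
    mean = total / trials
    return total_sq / trials - mean * mean


for rho in (0.9, 0.2):
    print(rho, averaged_variance(1.0, rho, 3), simulate(rho, 3))
```

The shared term is what averaging cannot remove, which is why three near-identical models still help (the (1 - ρ)/n part shrinks) but only up to a floor set by their correlation.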

If they are too different, the signals could definitely clash, but the picks they agree on, if any, may be more likely to do well. The evidence favors ensembles, and it is a very interesting topic.
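
One way to act on the picks the models agree on is to keep only the names that enough models rank near the top. A minimal sketch, assuming each model produces a score per stock ID; `top_k` and `agreed_picks` are hypothetical helpers rather than part of any library, the tickers and scores are made-up placeholders, and requiring a minimum number of votes is just one possible agreement rule.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """IDs of the k highest-scoring items (ties broken by ID for determinism)."""
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return set(ranked[:k])


def agreed_picks(model_scores: List[Dict[str, float]], k: int, min_votes: int) -> List[str]:
    """Items that appear in at least min_votes of the models' top-k lists."""
    votes: Dict[str, int] = {}
    for scores in model_scores:
        for item in top_k(scores, k):
            votes[item] = votes.get(item, 0) + 1
    return sorted(item for item, count in votes.items() if count >= min_votes)


# Placeholder scores for illustration only.
lgbm = {"AAA": 0.9, "BBB": 0.7, "CCC": 0.2, "DDD": 0.1}
linear = {"AAA": 0.6, "BBB": 0.1, "CCC": 0.5, "DDD": 0.3}
print(agreed_picks([lgbm, linear], k=2, min_votes=2))  # ['AAA']
```

The stricter the agreement rule, the shorter the pick list, so this trades breadth for conviction; whether that trade pays off is an empirical question for your own data.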
