Two quick observations:
- Subsampling will speed things up significantly. Using max_samples=0.2 means each tree is trained on just 20% of the data, which can cut runtime by roughly a factor of 5 compared to using the full dataset. (Note that in scikit-learn, max_samples only takes effect when bootstrap=True.)
- You can likely reduce max_features even further. For regression trees, max_features=0.3 (about a third of the features) is a common recommendation and works well in Random Forests. For classification trees, max_features="sqrt" is standard, and in my experience "sqrt" also works well for regression in Extra Trees.
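A minimal sketch of those two settings together in scikit-learn (the data here is synthetic, and the exact values are simply the ones discussed above, not tuned results):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in for the real dataset (300 features, as in the question).
X, y = make_regression(n_samples=2000, n_features=300, random_state=0)

model = ExtraTreesRegressor(
    n_estimators=100,
    max_features="sqrt",  # ~17 of 300 features considered per split
    bootstrap=True,       # required: max_samples is ignored when bootstrap=False
    max_samples=0.2,      # each tree trains on 20% of the rows
    n_jobs=-1,
    random_state=0,
)
model.fit(X, y)
```

The bootstrap=True line is easy to miss: without it, scikit-learn silently trains every tree on the full dataset.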
You can think of it this way:
Even though "sqrt" evaluates to just 17 features (out of 300), you’ll be training at least 100 trees. Across the full ensemble, you’ll still be drawing on a large portion of the feature set, and if you increase the number of trees to 300 or more, you’ll likely end up using more features overall than with your current approach.
Also, keep in mind that max_features only limits the number of features per split, not per tree. So you’re still building trees that explore a wide range of features, just more efficiently. You’re almost certainly using more features more often than with a static manual subset.
So:
- Considering 17 features per split instead of 300 cuts the work at each split by a factor of roughly 18.
- Combined with subsampling, you could see a huge boost in training speed — possibly 100× faster, depending on your settings.
Concretely, you could run this and come back in an hour instead of waiting 5 days!
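If you want to sanity-check the speedup on your own machine, a rough timing comparison might look like this (synthetic data; the actual factor depends heavily on your dataset, tree depth, and hardware):

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=2000, n_features=300, random_state=0)

def fit_seconds(**params):
    """Time a single fit with the given overrides (small n_estimators for demo)."""
    model = ExtraTreesRegressor(n_estimators=20, random_state=0, **params)
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start

slow = fit_seconds(max_features=1.0)  # every feature evaluated at every split
fast = fit_seconds(max_features="sqrt", bootstrap=True, max_samples=0.2)
print(f"all features: {slow:.2f}s | sqrt + 20% subsample: {fast:.2f}s")
```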
In practice, you could run a grid search to tune max_samples and max_features. But if that’s too time-consuming, I’d suggest starting with:
- max_features="sqrt"
- A small grid search on max_samples, such as max_samples=0.2, max_samples=0.5, and max_samples=0.7
This setup should run dramatically faster while performing nearly as well.
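That small grid could be wired up with GridSearchCV roughly like this (the data, cv, and scoring choices here are placeholder assumptions, not part of the original setup):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=2000, n_features=300, random_state=0)

search = GridSearchCV(
    ExtraTreesRegressor(
        n_estimators=50,
        max_features="sqrt",
        bootstrap=True,   # needed for max_samples to apply
        n_jobs=-1,
        random_state=0,
    ),
    param_grid={"max_samples": [0.2, 0.5, 0.7]},  # the grid suggested above
    cv=3,
    scoring="r2",
)
search.fit(X, y)
print(search.best_params_)
```

Because the base estimator is already cheap to fit, the whole 3×3 grid should finish quickly, and you can always widen the grid once you see which end of the range wins.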
And when using a large number of estimators (n_estimators=300+), these changes may be essential to finishing your runs in a reasonable time.