Recursive feature elimination

All,

“You are the weakest link.” —the game show Weakest Link

I will at least try recursive feature elimination with my models. I think everyone could use this with whatever methods they use now (including the optimizer).

Whycliffes was perhaps the first to notice the usefulness of this method when paired with the optimizer, and to post that observation here: Building a RS with one method and then reversing it

Basically, you start with all of the factors and run a rank performance test (or whatever metric you prefer) on the full set. Then you remove what looks like the weakest link by some metric and run the test again. Or, ideally, you try removing each factor in turn and drop the one whose removal hurts performance the least. If performance improves, you keep that factor out. Keep going, removing one factor at a time, until improvement stops. A minimal sketch of that loop is below.
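
For anyone who wants to try the loop by hand, here is a minimal Python sketch of the procedure just described. It assumes X is a pandas DataFrame of factors and y is the target (returns); the cross-validated random-forest score is just one illustrative way to judge each removal, not the only choice.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def score(cols):
    # Judge a factor subset by mean 5-fold cross-validated score
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    return cross_val_score(model, X[cols], y, cv=5).mean()

features = list(X.columns)
best_score = score(features)

while len(features) > 1:
    # Try removing each remaining factor; keep the best resulting subset
    trials = {f: score([g for g in features if g != f]) for f in features}
    weakest, new_score = max(trials.items(), key=lambda kv: kv[1])
    if new_score <= best_score:
        break  # no single removal improves performance, so stop
    features.remove(weakest)
    best_score = new_score

print('Kept factors:', features)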

P123 could automate this now in the optimizer, which, with early stopping, would make THE AUTOMATED OPTIMIZER A KILLER APPLICATION.

In Python with a random forest:

from sklearn.feature_selection import RFECV
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Split the data (X, y assumed already defined)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a base regressor
rf_model = RandomForestRegressor(n_estimators=100)

# Create the RFE model with cross-validation and fit
selector = RFECV(rf_model, step=1, cv=5)
selector = selector.fit(X_train, y_train)

# Keep only the selected features
X_train_rfecv = selector.transform(X_train)
X_test_rfecv = selector.transform(X_test)

# Fit the model using the selected features
rf_model.fit(X_train_rfecv, y_train)

# Evaluate the model
y_pred = rf_model.predict(X_test_rfecv)
print('MSE:', mean_squared_error(y_test, y_pred))

# Print the support mask and feature ranking
print('Selected features:', selector.support_)
print('Ranking of features:', selector.ranking_)
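
If X is a pandas DataFrame, the names of the kept factors can be recovered with X.columns[selector.support_].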

Jim
