Hi, I finally started using the AI Factor tools, and in my first couple of tests I am getting very high feature importance numbers. What are some common issues that could be causing this? I am using Z-score normalization.
Is it a linear model?
This is normal for lightgbm feature importance, since it's calculated differently from other models (at least by default). Nothing to be worried about.
Good to know. It must be displaying the raw number. Thanks!
lightgbm. I tried extratrees this AM and the numbers were in the format I expected. Looks like lightgbm must be reporting the raw amount. A percentage would be more elegant to display.
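For what it's worth, converting raw importances to percentages is a one-liner. A minimal sketch in plain Python (the factor names and importance values here are made up, not real output from the tool):

```python
# Hypothetical raw importances, as a tree model might report them by
# default (e.g. raw split counts rather than percentages).
raw_importances = {"eps_std_dev": 812, "momentum_6m": 344, "book_to_price": 201}

total = sum(raw_importances.values())
# Normalize so each factor's importance is a share of the total (sums to 100)
pct_importances = {name: 100.0 * v / total for name, v in raw_importances.items()}

for name, pct in sorted(pct_importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

With real lightgbm you would pull the raw numbers from the trained model's feature importances first, then normalize the same way.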
If you see one factor that absolutely dominates everything else, it may be a future leak. That's how I spotted the EPS Std Dev issue.
I don't think that's happening here, but what are some things to watch for in that regard if one is already leaving a large enough gap?
Well, if you have say 50 factors and there is a huge gap between the most important factor and the second, I would first check the model without that first factor (i.e. the remaining 49). If out-of-sample performance is drastically worse, I'd suspect there is another future leak in the data. No single feature is good enough to make or break a well-diversified model.
That's how I spotted the EPS std dev issue - just using that one raw data item improved my Sharpe ratio by 1 on a strategy with 500 holdings. Not possible unless it has a future leak.
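The check described above can be sketched like this. Everything here is a hypothetical placeholder (factor names, numbers, the dominance ratio, and the score-drop tolerance); a real check would retrain the model without the top factor and compare actual out-of-sample metrics:

```python
def top_factor_dominates(importances, ratio=3.0):
    """Return the top factor's name if its importance is ratio-x the runner-up, else None."""
    ranked = sorted(importances.items(), key=lambda kv: -kv[1])
    if len(ranked) < 2:
        return None
    (top_name, top_val), (_, second_val) = ranked[0], ranked[1]
    return top_name if second_val > 0 and top_val / second_val >= ratio else None

def leak_suspected(score_all, score_without_top, tolerance=0.05):
    """Flag a possible future leak if dropping one factor craters the out-of-sample score."""
    return (score_all - score_without_top) > tolerance

# Made-up numbers for illustration: one factor towers over the rest,
# and removing it drops the Sharpe ratio by 1.0.
imps = {"eps_std_dev": 900, "mom_6m": 120, "value": 95}
print(top_factor_dominates(imps))  # eps_std_dev
print(leak_suspected(1.8, 0.8))    # True
```

The ratio and tolerance thresholds are arbitrary here; the point is the shape of the check, not the specific cutoffs.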
Thanks