Hi, I finally started using the AI Factor tools, and in my first couple of tests I am getting very high feature importance numbers. What are some common issues that could be causing this? I am using Z-score normalization.
Is it a linear model?
This is normal for lightgbm feature importance, since it's calculated differently from other models (at least by default). Nothing to be worried about.
Good to know. It must be displaying the raw number. Thanks!
lightgbm. I tried extratrees this AM and the numbers were in the format I expected. Looks like lightgbm must be reporting the raw amount. A percentage would be more elegant to display.
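For what it's worth, converting raw importances to percentages is a one-liner. A minimal sketch in plain Python (the factor names and importance values here are made up, not real output from the tool):

```python
# Hypothetical raw importances, as a tree model might report them by
# default (e.g. raw split counts rather than percentages).
raw_importances = {"eps_std_dev": 812, "momentum_6m": 344, "book_to_price": 201}

total = sum(raw_importances.values())
# Normalize so each factor's importance is a share of the total (sums to 100)
pct_importances = {name: 100.0 * v / total for name, v in raw_importances.items()}

for name, pct in sorted(pct_importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

With real lightgbm you would pull the raw numbers from the trained model's feature importances first, then normalize the same way.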
If you see one factor that absolutely dominates everything else, it may be a future leak. That's how I spotted the EPS Std Dev issue.
I don't think that's happening here, but what are some things to watch for in that regard if one is already leaving a large enough gap?
Well, if you have say 50 factors and there is a huge gap between the most important factor and the second, I would first check the model without that first factor (i.e. the remaining 49). If out-of-sample performance is drastically worse, I'd suspect there is another future leak in the data. No single feature is good enough to make or break a well-diversified model.
That's how I spotted the EPS std dev issue - just using that one raw data item improved my Sharpe ratio by 1 on a strategy with 500 holdings. Not possible unless it has a future leak.
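The check described above can be sketched like this. Everything here is a hypothetical placeholder (factor names, numbers, the dominance ratio, and the score-drop tolerance); a real check would retrain the model without the top factor and compare actual out-of-sample metrics:

```python
def top_factor_dominates(importances, ratio=3.0):
    """Return the top factor's name if its importance is ratio-x the runner-up, else None."""
    ranked = sorted(importances.items(), key=lambda kv: -kv[1])
    if len(ranked) < 2:
        return None
    (top_name, top_val), (_, second_val) = ranked[0], ranked[1]
    return top_name if second_val > 0 and top_val / second_val >= ratio else None

def leak_suspected(score_all, score_without_top, tolerance=0.05):
    """Flag a possible future leak if dropping one factor craters the out-of-sample score."""
    return (score_all - score_without_top) > tolerance

# Made-up numbers for illustration: one factor towers over the rest,
# and removing it drops the Sharpe ratio by 1.0.
imps = {"eps_std_dev": 900, "mom_6m": 120, "value": 95}
print(top_factor_dominates(imps))  # eps_std_dev
print(leak_suspected(1.8, 0.8))    # True
```

The ratio and tolerance thresholds are arbitrary here; the point is the shape of the check, not the specific cutoffs.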
Thanks