Finding BLUE

Jim,

Please don’t spend $4,000 just to run a linear regression on 5,000 data points… there are much better ways to spend that $4,000. Excel will work fine. Otherwise, you can use me as a service: I will run any linear regression for you for just $100 each! :wink:

If you’re really going to do a linear regression, transforming the raw data into Z-scores will not make any difference at all (but make sure to include an intercept). The regression effectively absorbs the normalization: you’ll get different coefficients, but the goodness of fit and the predictive value will be the same.
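For anyone who wants to check this numerically, here is a rough sketch using only the Python standard library. The data is synthetic and the helper names (fit_ols, r_squared, zscore) are made up for illustration, so treat it as a sanity check under those assumptions rather than anything specific to P123 or Excel. It fits a one-predictor regression with an intercept on raw values and on Z-scored values and compares the results.

```python
import random
import statistics

def fit_ols(x, y):
    """Ordinary least squares with an intercept, single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Goodness of fit: 1 - (residual sum of squares / total sum of squares)."""
    my = statistics.fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def zscore(x):
    """Z-score using the population mean and standard deviation."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return [(xi - m) / s for xi in x]

# Synthetic data (an assumption for illustration): 5,000 points, noisy linear relation.
random.seed(0)
x_raw = [random.uniform(5, 50) for _ in range(5000)]
y = [0.3 * xi + random.gauss(0, 4) for xi in x_raw]
x_z = zscore(x_raw)

slope_raw, icpt_raw = fit_ols(x_raw, y)
slope_z, icpt_z = fit_ols(x_z, y)
r2_raw = r_squared(x_raw, y, slope_raw, icpt_raw)
r2_z = r_squared(x_z, y, slope_z, icpt_z)

# Predictions for the same observation, expressed in raw and Z-scored units.
pred_raw = icpt_raw + slope_raw * x_raw[0]
pred_z = icpt_z + slope_z * x_z[0]

print("slopes:", slope_raw, slope_z)   # different coefficients
print("R^2:   ", r2_raw, r2_z)         # same goodness of fit
print("pred:  ", pred_raw, pred_z)     # same prediction
```

The slope on the Z-scored data should come out as the raw slope multiplied by the standard deviation of the raw data, while R^2 and the predictions agree up to floating-point rounding.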

If you want to play around to find coefficients yourself, it’s probably better to use Z-scores, since the coefficients will then be on a more similar scale.

Regards,
Peter

Jim,

No, since ZScore is not available in ranking systems, what I do is much more complex. I test the factors & functions over 2 different market conditions using single-factor ranking systems in Sims, using the exposure list to define the periods in which the Sims buy stocks. The 2 market conditions are, first, the 2 recessions and, second, the 2 bull markets that followed the recessions. I look for the factors & functions that perform best during each market condition. Then I try various trim values to look for better performance. Next I combine the factors & functions that work best during the 2 market conditions into 2 ranking systems. I then set up a Sim using the bull market ranking system and use market timing rules to morph into (and back out of) the bear market system using the Rating() function.

Peter,

Your input is always helpful!

At $4,000, STATA might possibly be worthwhile for P123’s massive data, perhaps.

I’m using Excel as you recommended and hoping to get 5,000 data points. If that little test shows potential, I will probably want to do a few more tests on the data then.

Thank you!

Jim

Peter,

It is exactly as you say. I originally tried the ranking system for the above sim without the ZScore. I had no luck at all finding coefficients by trial and error.

Thanks for reminding me that I could get rid of ZScore for the linear regression at this point.

Regards,

Jim

Andrew (SUpirate1081),

I mistakenly called ZScore a linear transformation (already edited). Instead, I should have called ZScore a linear function of X. Meaning:

ZScore(X) = a*X + b

Where
a = 1/(standard deviation of X), and
b = -(mean of X)/(standard deviation of X)
This assumes a fixed data set (population) with a fixed (i.e., constant) mean and standard deviation.

This being the case, I think it remains true that if f(X) = y is linear, then ZScore(f(X)) = y is also linear, whereas Rank[f(X)] = y is decidedly nonlinear.
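A quick way to see the difference, sketched in plain Python with a made-up five-point data set (the values and the helper names ranks and slopes are assumptions for illustration only). Since ZScore(X) = (X - mean)/std, the constants are a = 1/std and b = -mean/std, with a negative sign on b. The sketch checks that the Z-score matches a*X + b exactly, and that the slope between neighboring points is constant for the Z-score but not for the rank.

```python
import statistics

# Hypothetical, unevenly spaced values for one factor.
x = [1.0, 2.0, 10.0, 11.0, 40.0]

mean, std = statistics.fmean(x), statistics.pstdev(x)
a = 1 / std
b = -mean / std   # negative sign: (x - mean)/std = x/std - mean/std

z = [(xi - mean) / std for xi in x]
affine = [a * xi + b for xi in x]

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for r, i in enumerate(order, start=1):
        out[i] = r
    return out

def slopes(xs, ys):
    """Slope between each pair of neighboring points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

z_slopes = slopes(x, z)            # every slope equals a
rank_slopes = slopes(x, ranks(x))  # slopes vary, so rank is not a straight line in x

print("z slopes:   ", z_slopes)
print("rank slopes:", rank_slopes)
```

The rank only keeps the ordering of the values and throws away the spacing between them, which is why its slope changes from one pair of points to the next.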

What you said about the definition of a linear transformation is definitely true and I appreciate the correction.

Regards,

Jim