Regression-Based Models: A Memoir and Guide

I built my first regression-based model 35 years ago, back when we called them ranking systems instead of models. My employer’s main ranking system worked reasonably well on a large equal-weighted universe, but added almost no value relative to the capitalization-weighted S&P 500, which was the benchmark for an important client portfolio. A talented programmer wrote a multiple regression program for the company’s minicomputer; it took almost 48 hours to run a pooled regression covering 20 years, with one six-month observation period per year. One reason for the slowness was our decision to cap-weight the data (simply by dividing the universe into size groups and entering the data once for the smallest stocks and up to 20 times with increasing market cap). We also broke with the firm’s past practice by choosing to model total return rather than price change (those were the days). The result was a model that was value-oriented, in sharp contrast to the firm’s established ranking system, which favored price momentum, earnings momentum, and earnings surprise. The timing was good, because the LBO boom of the 1980s was getting into full swing.

I moved on less than two years after developing my large-cap model; the client terminated the account; and my work was forgotten. But I was left with a strong conviction that multiple regression can be an excellent tool to determine the factors that have predicted relative performance in the past.

Since my first effort, I have worked on multiple models that were designed to provide a disciplined structure to support fundamental investment decisions. I built my second large-cap and first small-cap model, with a colleague, using Systat. Later, I delegated the computer work to more talented colleagues, first to one who wrote his own Fortran code and then to one who is a master of R. But I remained closely involved in setting the research agenda and evaluating its results until my retirement at the end of 2013.

Over the course of my investment career, turnaround time for a regression study went from days, to hours, to minutes. But as the work became easier, competition increased, and potential alpha diminished. The sustained outperformance of mega-cap growth stocks aggravated the problem, because most valuation metrics had little or no predictive power for their performance. At the time of my retirement, our large-cap model had added almost no value in the previous couple of years.

Regression works very well in predicting the past. But it needs to be used carefully and intelligently. So let me try to address some of the issues in the recent thread, started by yuvaltaylor, which prompted me to reminisce.

First, collinearity is a problem for scientists, not for practitioners. Our independent variables are inevitably correlated, which means we can’t attribute statistical significance to any one of them. And anyway, stock models struggle to explain even 5% of the variance in returns. So the only concern for practitioners is whether correlations among independent variables are stable enough to produce useful forecasts. In my experience they are, but I have worked only with predictive factors that have an established investment rationale. Data miners might have a different experience.

Data quality matters a lot. A single hugely negative book value figure once blew up one of my models. Problems can be largely avoided if predictors are ranked or normalized (forced into a normal distribution). My experience is that the actual data distribution has value, but that means working to control outliers. I recommend winsorizing to deal with any absurd values, followed by several iterations of standardizing and winsorizing. Excess total returns should be logged and possibly winsorized. Regression is sensitive to leverage points.
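In R, that cleanup might look something like the sketch below. The cutoffs, iteration count, and function names are illustrative only, not the settings I actually used.

```r
# Winsorize a numeric vector at given lower/upper quantiles
winsorize <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE)
  pmin(pmax(x, q[1]), q[2])
}

# Standardize (z-score), re-winsorize at +/- 3 standard deviations, and repeat
# a few times so extreme values are pulled in gradually
clean_factor <- function(x, iterations = 3, clip = 3) {
  x <- winsorize(x)                        # first pass: remove absurd values
  for (i in seq_len(iterations)) {
    x <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    x <- pmin(pmax(x, -clip), clip)        # winsorize in z-score space
  }
  x
}

# Excess total return: log(1 + r) less the universe average
log_excess_return <- function(r) log1p(r) - mean(log1p(r), na.rm = TRUE)
```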

Use a forecast horizon that makes sense for your predictors. I find six months reasonable for most fundamental variables. Even estimate revisions, whose effect seems to dissipate quickly, tend to have a later second life, probably because of serial correlation. The effectiveness of a deep-value factor like price-to-sales can increase over several years.

Don’t worry about having overlapping observation periods, since you’re not trying to prove statistical significance. The more starting points you can test, the better. I recommend running regressions on each test period and averaging the coefficients rather than pooling all the observations. Pooling can reduce the impact of outliers, but it can distort results if the number of observations per period varies. And I feel it’s important to see how the coefficients change over time.
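A minimal sketch of the period-by-period approach, assuming a data frame `panel` with a `period` column, an `excess_ret` column, and already-cleaned predictors (the predictor names are placeholders):

```r
# One regression per observation period, then average the coefficients
periods <- split(panel, panel$period)
coefs <- sapply(periods, function(d) {
  coef(lm(excess_ret ~ value + momentum + quality, data = d))
})

rowMeans(coefs)   # average coefficients across periods -> model weights
t(coefs)          # and inspect how the coefficients drift over time
```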

If your results are too good, something is wrong. I once found that a surprisingly effective long-term price change factor was subtly biased in favor of stocks that would have future splits. More commonly, risk factors look compelling in bull markets.

Aside from the difficulty of doing it right, the biggest drawback I see in multiple regression is that it tries to fit the entire distribution of returns, while I’m only interested in one or both tails. In building ranking systems on this platform, I have therefore used rolling backtests of top decile performance to optimize the weights of individual factors and composite nodes. Based on this experience, as well as my past experience with multiple regression, I am convinced that equal weighting factors is suboptimal.

Finally a pitch for two recent feature requests. My rolling backtests would be much easier to do if I could specify the equal-weighted selection universe as my benchmark.
See: https://www.portfolio123.com/mvnforum/viewthread?thread=11356

And testing a large-cap ranking system in a portfolio would work much better if we could use the active (benchmark-relative) weight to set the position size. After all, active weights determine benchmark-relative performance.
See: https://www.portfolio123.com/mvnforum/viewthread?thread=11355

T,

WOW!!! What a wealth of practical experience!!!

Thank you!

What do you do about linearity? Nothing I do is linear.

BTW, R can bootstrap the regressions easily, and a lot more cheaply than SPSS or other solutions. Example below (today is the first day I have bootstrapped a regression). Still learning on this. FWIW. I guess this only solves the heteroscedasticity and non-normality problems?
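Roughly like this, using the boot package (the data frame `dat` and the column names are placeholders, not my actual data):

```r
library(boot)

# Statistic to bootstrap: the OLS coefficients refit on each resample
boot_coefs <- function(data, idx) {
  coef(lm(excess_ret ~ value + momentum, data = data[idx, ]))
}

set.seed(1)
b <- boot(dat, statistic = boot_coefs, R = 2000)   # 2000 bootstrap resamples
b                                                  # bootstrap bias and standard errors
boot.ci(b, type = "perc", index = 2)               # percentile CI for the `value` coefficient
```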

It is that linearity problem that is always there for me.

And again, thanks.

-Jim


Jrinne,
I’m not sure what kind of nonlinearity you’re thinking of. One example that comes to mind for me is valuation metrics, which at extreme levels tend to indicate distress rather than good value. But I try to deal with that by having offsetting factors, such as profitability, quality, and revisions, in my model. It’s also important to use well-behaved variables. Percentage changes need to be bounded, and valuation factors should always have the price-related component in the denominator.
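Two toy illustrations of what I mean by well-behaved (the column names are placeholders):

```r
# Earnings yield (E/P) stays finite as earnings pass through zero; P/E does not
ep <- dat$eps / dat$price        # price in the denominator: well-behaved
# pe <- dat$price / dat$eps      # explodes and flips sign near zero earnings

# Bound a percentage change so a tiny base can't produce an absurd value
bounded_pct_chg <- function(new, old, cap = 1) {
  pmax(pmin((new - old) / abs(old), cap), -cap)
}
```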

T,

I guess I mean any rank performance test that I do, assuming that it is even reasonable to do this on ordinal numbers. I do think that if you force the ranks onto a Z-scale (using NORMSINV() in Excel) it is pretty much an interval variable.
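The R equivalent of that Excel step is roughly this (just a sketch):

```r
# Map ranks onto an approximate standard-normal scale (R's analogue of NORMSINV)
rank_to_z <- function(x) {
  r <- rank(x, na.last = "keep")
  n <- sum(!is.na(x))
  qnorm((r - 0.5) / n)   # (r - 0.5)/n keeps the quantiles strictly inside (0, 1)
}
```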

But price-to-sales, EBITDA/EV, etc. are not linear. Maybe close enough for some, I guess. Anyway, I am only commenting on my own work. I do think it does not always have to be perfectly linear.

This may be me being OCD. Generally, all great points. I definitely get your point about collinearity and this confirms what I have read (and suspected to be the case for P123 data). I appreciate the confirmation on this. Just one example of how great your post is!!!

Thanks,

Jim

Jim,
I’ve never seen a reason to worry about independent variables not being evenly distributed, if that’s what you mean by nonlinearity. On the contrary, as long as outliers are controlled, I’ve gotten better results by standardizing and winsorizing factors than by normalizing them. I believe that this is because the data is left closer to its original (uneven) distribution than it would be in a normal distribution. As a result, the best and worst scores on factors stand out more. This is irrelevant if you’re ranking each factor, as I currently do in P123, but important if you’re weighting the factors and adding up the results to get a composite score that you rank.

So awesome!!!

Thank you!

-Jim

Dear redshield:

I’m grateful for this. I’ve been using an iterative approach to optimization on partial universes and time periods and then averaging the results for my final model. The iterative approach is extremely time-consuming. It involves taking a weighted ranking system with forty weights (2.5% each) spread over 100 factors (obviously the large majority get 0 weight), randomly increasing the weight of one factor (from 0 to 2.5, or from 2.5 to 5, etc.) and decreasing the weight of another, and then seeing whether performance improves; if so, I use the new weights going forward, and if not, I revert to the old weights. I have found this to be a very effective procedure in terms of getting good results (I made 45% in 2016, 58% in 2017, and I’m up 32% YTD), but it’s a huge job. I don’t honestly know whether I want to delve into multivariable or discriminant analysis or whether I should simply stick with my method; I can imagine that coordinating P123 backtests with multivariable analysis (using R, XLSTAT, RUBY, or some other program) would be just as big a job as the way I’m doing it now, if not a bigger one, and I’m not sure of the advantages it would offer. I’d be grateful for your feedback on this question.
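In rough terms the loop looks like the sketch below; backtest_return() is just a placeholder for the P123 rolling backtest, and everything else is simplified:

```r
# Placeholder for the real rolling backtest; replace with the actual evaluation
backtest_return <- function(w) sum(w * runif(length(w)))

# Random reweighting: shift 2.5% from one factor to another, keep it if it helps
optimize_weights <- function(weights, n_iter = 1000, step = 2.5) {
  best <- backtest_return(weights)
  for (i in seq_len(n_iter)) {
    trial <- weights
    up   <- sample(which(trial < 100), 1)   # factor whose weight goes up
    down <- sample(which(trial > 0), 1)     # factor whose weight goes down
    trial[up]   <- trial[up] + step
    trial[down] <- trial[down] - step
    r <- backtest_return(trial)
    if (r > best) {            # keep the new weights only if performance improves
      weights <- trial
      best <- r
    }
  }
  weights
}

# e.g., 100 factors, 40 of them starting at 2.5% each
w0 <- c(rep(2.5, 40), rep(0, 60))
```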

As for equal weights being suboptimal, I’m with you there. Imagine a ranking system with three factors: 1-year accruals, 3-year accruals, and price-to-sales. Only an idiot would weight them equally. That’s an extreme example, of course, but there are so many factor interrelationships that equal weighting doesn’t take into consideration. And, of course, equal weighting takes no account of how many factors of each type (value, growth, quality, size, sentiment, technical) you have.

Thanks,

-Yuval

Edited for redundancy. Redshield and Yuval pretty much said it.

Yuval,

Thanks for sharing this information about your model-building process. Your approach sounds similar to what I did in building my latest model, but you have been more systematic and more careful in looking for out-of-sample validation. I started from a point of frustration about the recent poor performance of factors that I had long used, particularly among large-cap stocks. So I set out to identify what factors, if any, had worked in the past five years, limiting myself to factors that I considered to have a defensible investment rationale. The result of the exercise was to drop some composite factors, including price momentum and reversal, to change some individual variables, and to adjust the weights of the remaining composites. The changes generally make sense as an adaptation to a market dominated by fast-growing digital businesses with a marginal cost of sales that is near zero. But I was very pleasantly surprised to find that the new model also performed well in earlier periods and reasonably well on the Russell 2000 (though it lags my dedicated small-cap model there).

I would have greater confidence in my model if I had used multiple regression to develop it, but I have no way of knowing how different the result would be. My original plan for retirement was to develop small- and large-cap models based on statistical research, and I set out to learn R. But I eventually decided that I didn’t want to be a full-time data jockey and looked for something that would provide more immediate gratification. In Portfolio 123, I can change a factor on the fly and see quick results in a rolling backtest. My main frustration with the platform, aside from the two feature requests mentioned above, is the limited estimate data available (compared to the database I used to use with individual analyst estimates on multiple variables) and the lack of access to the full Compustat/Capital IQ fundamental database. But I’m getting results with what I have.

I’m not familiar with XLSTAT and RUBY, but I would definitely recommend R. It takes time to learn, but RStudio provides an interface that may be helpful. If you invest the upfront time, you will find a wealth of intriguing applications in the main program and in add-ons developed by the community (which is engaged, very smart, and intolerant of fools). In addition to R’s OLS functions, we used “Random Forest” for idea generation. And we experimented with logistic regression in a not very successful attempt to do a better job of finding winners (instead of explaining the full set of returns, you code the top x% as 1 and the rest of universe as 0). There’s even an optimization add-on, so you could develop a minimum variance portfolio of favorably ranked stocks (a pet idea of mine).
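The logistic setup is as simple as something like this (illustrative only; `dat` and the factor names are placeholders):

```r
# Code the top decile of subsequent return as 1, the rest of the universe as 0,
# then fit a logistic regression on the cleaned predictors
dat$winner <- as.integer(dat$fwd_return >= quantile(dat$fwd_return, 0.9, na.rm = TRUE))

fit <- glm(winner ~ value + momentum + quality, data = dat, family = binomial)
summary(fit)
head(predict(fit, type = "response"))   # predicted probability of landing in the top decile
```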

Here’s the catch. In addition to learning R, you will have to download and manage a ton of data. I would want quarterly (timed at mid-quarter to reflect the previous quarter’s earnings) or monthly files. Each file would include symbol, sector, industry, possibly subindustry, all the raw predictors you want to test, and subsequent total returns for, perhaps, the next one, three, six, and 12 months. You would then have to write an R routine that would grab the raw data files, transform the data (winsorizing and standardizing at whatever level of aggregation you think appropriate), run the regressions, and output the results to a data file. If you’re satisfied with the results, you’ll move the coefficients from the individual regressions to a spreadsheet, average them over time, and use the resulting weights in your model. (It’s not clear to me how one would make this run within P123 on an ongoing basis, but I haven’t devoted any time to the matter.)
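In outline, the routine might look like this (the file layout and column names are hypothetical, and clean_factor() is the winsorize-and-standardize helper sketched earlier):

```r
# One CSV per quarter: symbol, sector, raw predictors, and subsequent 6-month return
files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)

run_one <- function(f) {
  d <- read.csv(f)
  d$value    <- clean_factor(d$value)      # transform the raw predictors
  d$momentum <- clean_factor(d$momentum)
  m <- lm(excess_ret_6m ~ value + momentum, data = d)
  data.frame(period = basename(f), t(coef(m)))
}

results <- do.call(rbind, lapply(files, run_one))
write.csv(results, "coefficients_by_period.csv", row.names = FALSE)
# average the period-by-period coefficients to get the weights for the model
```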

Whether you proceed with regression studies or not, you should be able to enhance the credibility of your results by showing performance against an appropriate (equal-weighted) benchmark, with data on drawdowns and other risk metrics. You may also want to track the performance of your portfolio against the relevant decile(s) of your ranking system.

I hope this is helpful.

-redshield

Very much agree. RStudio has been helpful for me.

BUT I am very much a beginner with R and am IN NO WAY a programmer of any sort.

I have recently been surprised by R Commander—which probably is a reflection of being both a beginner and not-a-programmer. Every resource I have read (except one) talks badly about R Commander. I think this is because it takes about 3 hours to learn it and then you are done (limited by R Commander’s abilities). And yet you will have learned nothing about programming in R. It provides a GUI for R.

But the good news is you will be able to do an awful lot after 3 hours. And the R console remains open to use separately at any time.

I am now reading “Discovering Statistics Using R” by Andy Field (not done yet, so my opinion could change quickly). He does use R Commander but also provides extensive scripts for R, and it is a good introduction/refresher for statistics. I think it can probably get people up and running fast, start them on the road to programming in R, and give them a start on collecting a library of scripts to boot.

And R Commander is not completely without power—for example, you can bootstrap regressions in it, something that is relatively new to SPSS and comes at an extra cost there. It also includes a good number of tests for the assumptions of linear regression, and adequate graphs for most purposes, including 3D plotting.

Sadly, when the OCD part of me checks for linearity (RESET test), I find that one of my linear regressions is probably not linear, with a p-value of 3.02 x 10^-10. And honestly, I do not do much better on the Durbin-Watson test for autocorrelation (p-value = 0.000005385). This is just one regression, but it is the one I was looking at today (for a serious investing reason). Which assumption I violate can vary. Needless to say, you probably should not be changing your investment patterns based on this regression alone—or any of my regressions, for that matter.
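For reference, both checks can be run outside R Commander with the lmtest package, given any fitted model (the model and data frame here are placeholders):

```r
library(lmtest)

fit <- lm(excess_ret ~ value + momentum, data = dat)   # dat is a placeholder data frame

resettest(fit)   # Ramsey RESET test: a low p-value suggests the linear form is wrong
dwtest(fit)      # Durbin-Watson test for autocorrelation in the residuals
```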

Speaking just for myself, it is just a question of how badly I have violated the assumptions, how much to worry about it, and what I would do instead–or could do to confirm what I have done in a reliable and valid manner. I have resolved the problem of never meeting the assumptions by separating the process of finding stocks from the process of doing statistics. Maybe do a linear regression to define the model (no statistics reported), and use a separate, simple statistic to see how the model performs (like a t-test comparing the model to a benchmark, and maybe a Spearman rank correlation with no p-value to show a trend). My default assumption is that other people have not looked at their assumptions when they report statistics on their linear regressions either. And not being their statistics teacher, I no longer ask.

Many researchers have moved away from reporting statistics based on linear regressions. The statistic they report is often the t-statistic of the returns for stocks in the upper quintile vs. the bottom quintile of their model. Far fewer assumptions.
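That test is easy enough to run on placeholder data (`score` is a model’s composite score, `fwd_return` the subsequent return):

```r
# t-statistic of top-quintile vs. bottom-quintile returns: far fewer assumptions
q <- quantile(dat$score, c(0.2, 0.8), na.rm = TRUE)
top    <- dat$fwd_return[dat$score >= q[2]]
bottom <- dat$fwd_return[dat$score <= q[1]]
t.test(top, bottom)

# and a trend check without reporting a p-value
cor(dat$score, dat$fwd_return, method = "spearman", use = "complete.obs")
```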

I am working on getting some data for a multiple regression and trying model selection based on stepwise selection using the Akaike Information Criterion (or BIC). Will that stop me from overfitting, as advertised? Doubtful in my case. Just a beginner’s perspective.

Fortunately, I have an experienced perspective on what is actually important for finding a winning model (see above). And I do take this to heart.

Redshield, thank you again for sharing your experience.

-Jim

Jim,

If you haven’t already done so, you may want to sign up for the R help list:

To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-help

-redshield

Also R-bloggers: https://www.r-bloggers.com/

A lot there, including groups by topic (e.g., finance). And archives, so periodically browsing the archives is an option if you do not want to be on the mailing list for all of it.

Much appreciated.

-Jim

Redshield,

Did you or your associates use the AIC (Akaike Information Criterion) to help with model selection?

Thanks.

-Jim

Jim,

Sorry, but I have absolutely no knowledge of the AIC.

All,

I admit I just like doing this. But I also think it could be useful—if I ever really understood it and had a large enough sample.

I just did a small repeated k-fold cross-validation study (using R and the caret package) on something. So this is PREDICTED R-SQUARED, along with Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).

Looks like everything got worse when I added another predictor:

Predictors   RMSE      R-squared   MAE
4            0.06498   0.0007255   0.04193
5            0.06499   0.0006701   0.04193 (larger before rounding)

So 5 predictors might be overfitting. But I am just playing (and learning) at this point: with a small sample to boot.
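For anyone curious, the caret setup is roughly the following (illustrative, not my exact code or data):

```r
library(caret)

set.seed(7)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

# `dat` is a placeholder data frame; fwd_return is the target, the rest are predictors
fit4 <- train(fwd_return ~ value + momentum + quality + size,
              data = dat, method = "lm", trControl = ctrl)

fit4$results[, c("RMSE", "Rsquared", "MAE")]   # cross-validated RMSE, R-squared, MAE
```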

I had a question here, but I was reading “Applied Predictive Modeling,” which answered it for now.

Anyway, any discussion welcome.

Thanks.

-Jim

and

So, you cannot do this with P123 data due to the download limitations.

But OLS is still lightning fast in 2018 with a modest computer (and a modest amount of data), it seems. Ridge regression, lasso regression, and the optimization of their parameters using cross-validation seem to be okay. But I ran a (boosted) “Random Forest” with cross-validation in R that took a while (plyr and C50 packages), with decent cross-validation results, BTW. Some other machine-learning methods took longer and I shut them down before completion (e.g., MARS or earth). I think a k-nearest-neighbors algorithm might not work well on any machine available to a retail investor. And I have not even begun to look at outputting predictions in a timely manner–which can be more of an issue for some methods than others.
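For the ridge and lasso part, glmnet does the cross-validated penalty selection in a couple of lines (placeholder data again):

```r
library(glmnet)

# x: matrix of cleaned predictors, y: excess returns (both placeholders)
x <- as.matrix(dat[, c("value", "momentum", "quality", "size")])
y <- dat$excess_ret

cv_ridge <- cv.glmnet(x, y, alpha = 0)   # alpha = 0 -> ridge
cv_lasso <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 -> lasso

coef(cv_lasso, s = "lambda.min")         # coefficients at the CV-chosen penalty
```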

So I will probably get a new computer in the not-too-distant future. For R, parallel processing might work best on an iMac Pro, I think. I am not an expert on this, as my 2-core machine leaves just one core for R at the moment (actually it would be quite a stretch to say that I am an expert on any of this). Windows machines cannot do some of the parallel processing in R (at least historically). And SPSS could run on a Mac if I ever wanted to use it (probably not). Maybe I can run anything (including Windows) with Boot Camp. Or can I? Can I do this and still make use of the parallel processors, cache memory, etc.?

Does everyone use R? If so, I will be looking at an iMac, I guess. Maybe I just want to get a Windows Machine to run STATA or another program?

Any ideas welcome. Thanks.

-Jim

I run parallel kernels in Mathematica on Windows 10. It eats up a lot of system resources but the kernel setup/teardown process is pretty easy.