Ranking and market timing in combination for stock forecasting models

You really just can't use non-stationary data in a time series without adjusting. The examples in the textbooks show high R²s for data that in truth has no correlation whatsoever. A fairly recent Nobel Prize was awarded for techniques to deal with this problem.
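A toy sketch of exactly that textbook trap (illustrative only, nothing to do with anyone's model here): two independent random walks can show a high R² in levels, while the same regression on first differences shows essentially nothing.

```python
# Spurious regression: two independent random walks often show a spuriously
# high R^2 in levels even though there is no relationship. Illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000
x = np.cumsum(rng.normal(size=n))   # random walk #1 (non-stationary)
y = np.cumsum(rng.normal(size=n))   # random walk #2, independent of x

slope, intercept, r, p, se = stats.linregress(x, y)
print(f"levels:      R^2 = {r**2:.2f}, p-value = {p:.2g}")   # often looks "significant"

# The same regression on first differences (a stationary transform)
# shows the truth: essentially no relationship.
slope, intercept, r, p, se = stats.linregress(np.diff(x), np.diff(y))
print(f"differences: R^2 = {r**2:.4f}, p-value = {p:.2g}")
```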

It has nothing to do with maximizing R².

I personally would not try any linear regressions for market timing or time series. I certainly would put no money into it or make any public claims. But that is just me at my level.

I like linear regressions that are similar to our rank performance test. I think this would be cross-sectional data. But I am becoming more aware that any statistical claims are questionable, including R values. This is because the data is probably not normally distributed or i.i.d. (thanks Peter and SUpirate1081).
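A small illustration of why non-normal, fat-tailed data makes R values shaky, and why a rank-based measure can be more forgiving (toy data, nothing to do with our actual rank performance test):

```python
# Pearson vs. Spearman on fat-tailed cross-sectional data: a few extreme
# observations can distort Pearson's r, while the rank-based Spearman
# correlation is more robust. Illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
factor = rng.normal(size=500)
# returns weakly related to the factor, plus fat-tailed (t-distributed) noise
returns = 0.1 * factor + rng.standard_t(df=2, size=500)

print("Pearson r :", stats.pearsonr(factor, returns)[0])
print("Spearman  :", stats.spearmanr(factor, returns)[0])
```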

Best,

Jim

ŠšŠ¾Š½ŃŃ‚Š°Š½Ń‚ŠøŠ½, руссŠŗŠøŠ¹?:slight_smile: ŠÆ Š“уŠ¼Š°Š» тут Š½ŠøŠŗŠ¾Š³ Š½ŠµŃ‚.

deleted

The market as a whole is a more or less stationary process compared to the performance of individual stocks, and that's the main difference.
As I remember, Markov processes deal with non-stationary time series.

Then do you know whether a random walk is stationary? Do you think the stock market is or isn't a random walk?

You mean after you have corrected for any trend? You have to have done that, by definition of stationarity.

As I remember, stationary means stable first and second moments of the distribution, the mean and variance. It was a long time ago when I studied that at university :slight_smile:

So your question is what model to apply for MT? Because my regression model is based on cross-sectional rather than time-series data.

Yury,

I actually like what you are doing. Personally, I would refresh my memory before going much further with any time series data.

I will be doing some of this myself, but not for market timing. But again, that is just me at my level. The overall market is almost certainly non-stationary even if it is not a random walk: any trending (at a minimum) must be corrected for, and personally I cannot do that.

Good luck.

Jim

Yep, I have forgotten many things from that stuff. Maybe later I can say something for sure regarding stochastic processes :slight_smile:
Actually there is no big need to go deep into the math. Models that really work are quite simple.

You can correct my memory, but my recollection is that a stochastic process is normally distributed by definition. The stock market isn't.
Steve

I think Jim can start a separate thread to discuss those things.

I think that you and Jim clearly have too much free time on your hands that could be utilized on other endeavours :slight_smile:
Steve

Steve,

So only you get to spend all of your time talking to Yury?

That is called "weakly stationary". There is also strong stationarity, which basically means that the underlying distribution of the stochastic process does not change over time.

That is not necessary at all. You can model a stochastic process using any kind of distribution.


In time-series analysis, you want to model/analyse something that is "stable" over time. The simplest example is when there is a trend in the data. For example, the S&P500 index has an upward trend (in the statistical sense). The easiest transformation is to take first differences. That means you wouldn't take the index as is, but you'd take the difference between the index on each day and the day before that. Now you've got data that is confined to a relatively narrow range.

However, because the day-to-day changes get larger (in absolute terms) as time goes on, the range of values you've got in the data keeps expanding little by little over time. That's still not "stable". So in this case we would prefer to transform the index to percentage changes (daily returns). Now we've got something that stays in a relatively narrow range: the daily returns of the S&P500 in 1960 were probably similar in magnitude to the daily returns in each of the decades after it, for example.

Of course, financial data like the S&P500 index exhibits time-varying volatility. Strictly speaking, this means that even the returns are not weakly stationary (the second moment, the variance, changes over time). You could use GARCH models to handle that. But for simple linear models, this is not so important, I think. The first moment, the mean, is much more important.
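For anyone who does want to go down the GARCH route, a minimal sketch using the third-party `arch` package might look like the following (purely illustrative; synthetic data stands in for real returns):

```python
# Minimal GARCH(1,1) sketch for time-varying volatility of daily returns,
# using the third-party "arch" package. Not needed for simple linear models,
# as noted above; shown only for completeness.
import numpy as np
from arch import arch_model  # assumed installed separately

rng = np.random.default_rng(2)
# stand-in for daily percentage returns; replace with real index returns
daily_returns = rng.standard_t(df=5, size=2500)

model = arch_model(daily_returns, mean="Constant", vol="GARCH", p=1, q=1)
result = model.fit(disp="off")
print(result.summary())
# result.conditional_volatility holds the fitted, time-varying volatility
```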

The key here is that using prices or indices as the dependent variable is usually wrong; you need to work with returns instead (as Yury has been doing).
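To make the prices-versus-returns point concrete, here is a minimal sketch (synthetic data standing in for a real index; statsmodels' augmented Dickey-Fuller test is one common stationarity check):

```python
# Prices (levels) vs. returns: an augmented Dickey-Fuller test will usually
# fail to reject a unit root for the index level but reject it for returns.
# Synthetic data is used as a stand-in for a real index series.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
log_prices = np.cumsum(0.0003 + 0.01 * rng.normal(size=5000))  # trending random walk
returns = np.diff(log_prices)                                  # log returns

print("ADF p-value, levels :", adfuller(log_prices)[1])  # high -> non-stationary
print("ADF p-value, returns:", adfuller(returns)[1])     # tiny -> stationary
```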

Jim, my MT model is simple to express in math: Return = MT_function(f1(t), f2(t), ...) + e,
where MT_function is a deterministic linear function of the factors, f1(t), f2(t), ... are factors that depend on time, and e is a stochastic deviation with a stable (stationary) distribution over time and zero mean. Also, I assume the factor set (and the function) is constant over time to avoid overfitting. To get a higher correlation to future returns you can use not constant factors as I do, but variable ones as Hull does (meaning he adds or drops factors over time through a screening procedure). That's all. There is nothing you can do with the stochastic component. You can try to model it using stochastic processes, but there is no need to do that; the accuracy of the forecasts won't be higher in practice.
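Expressed as code, the skeleton is just a multivariate OLS of next-period return on the factor values; a rough sketch with placeholder factors (not the real ones):

```python
# Sketch of the model form Return(t+1) = b0 + b1*f1(t) + b2*f2(t) + e,
# with a fixed factor set (no adding/dropping factors over time).
# Factor series here are synthetic placeholders only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
f1 = rng.normal(size=n)                    # placeholder macro/market factor
f2 = rng.normal(size=n)                    # placeholder universe-specific factor
ret = 0.02 * f1 - 0.01 * f2 + rng.normal(scale=0.05, size=n)  # next-period return

X = sm.add_constant(np.column_stack([f1, f2]))
fit = sm.OLS(ret, X).fit()
print(fit.params)      # b0, b1, b2
print(fit.rsquared)    # in-sample fit; the residual e is simply left unmodeled
```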

Be careful with correlation and regression: they are based on an assumption (normal distribution). Stocks are not normally distributed; they have fat tails. Very often a strategy goes for those fat tails, and would not work if the fat tails did not exist.
At least this is the case with my systems...
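A quick way to check this on any return series is to look at excess kurtosis: well above zero means fatter tails than the normal assumption allows. A toy sketch (synthetic data only):

```python
# Quick fat-tail check: daily equity returns typically show excess kurtosis
# far above 0 (the value for a normal distribution). Synthetic t-distributed
# data is used here as a stand-in for real returns.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
normal_like = rng.normal(size=10000)
fat_tailed = rng.standard_t(df=3, size=10000)   # heavier tails

print("excess kurtosis, normal     :", stats.kurtosis(normal_like))   # ~0
print("excess kurtosis, fat-tailed :", stats.kurtosis(fat_tailed))    # >> 0
```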

Regards

Andreas

I agree, the stochastic component e does not have a normal distribution. But there is nothing we can do about fat tails and black swan events. I don't model them. We can buy put options if we assume that black swan risks are not priced efficiently into options, that's all.

Peter,

Thank you for your clarification. I'm going to assume Yury's 9 other factors that he speaks about for MT are stationary too: after reading his post (and thinking), the normalization may be helpful in this regard. Not speaking for Yury but for me personally, using time series data would carry the risk of showing a false correlation that would never materialize in my trading. I would probably never know why.

One attractive (theoretical for now) feature of the appropriate use of OLS is that it is not really optimized. It's just the function you get the one time you do the math (or run it in Excel). OOS similar to IS? I intend to find out. But I'm also willing to scrap the whole idea. There are a lot of assumptions, starting with: should I even be looking for the line of best fit? Is the relationship even linear? Linear even at the extremes, for a 5-stock model say? I think you already said it does not work well for 5-stock models. Those outliers that really do not fit the regression line can sure affect your bottom line.
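One cheap way to answer the OOS-versus-IS question once the data is exported (a rough sketch with synthetic rank/return pairs, not real results): fit the line once on the first half of the history and only evaluate it on the second half.

```python
# One-shot OLS fit on an in-sample half, evaluated out of sample.
# Synthetic rank/return pairs stand in for a real rank-performance export.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
rank = rng.uniform(0, 100, size=2000)
alpha = 0.0005 * rank + rng.standard_t(df=3, size=2000) * 0.03  # noisy, fat-tailed

half = len(rank) // 2
fit = stats.linregress(rank[:half], alpha[:half])      # fit once, no tuning

pred = fit.intercept + fit.slope * rank[half:]          # apply to the OOS half
oos_corr = np.corrcoef(pred, alpha[half:])[0, 1]
print("IS slope:", fit.slope, " OOS correlation of fit vs. realized:", oos_corr)
```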

Yury, it looks like you are on the right track. I encourage you to keep going, but do pay attention to the details.

Warmest regards,

Jim

1. What we currently have in P123.

We have a daily-updated point-in-time database, a ranking tool with a lot of factors, a permutation tool for ranking and simulation, rolling tests, hedging, books, the macro and getseries sections, and many other things. So, OK.
It allows us to make quite good systems with the proper approach.

But I don't understand the following (I may be wrong on some issues):
Why does the getseries tool allow only universe operations? Why is macro data not available there?
Why can't we download info from the macro section into Excel? Why is the macro composite Boolean market timing index not available for hedging purposes in ports?
I think that stuff is easy to do.

2. Must-have features:

A) Variable hedging based on an MT index, which in turn is constructed from macro data and stock-universe-specific data. Everyone understands its importance.

B) Results presentation. All parameters should be presented year by year from a specified starting point (not only calendar years). Performance graphs everywhere (including rank performance) should show alpha instead of simple return (better to have an option to switch between return/alpha).

C) Variable port weights within a book based on set-up rules, an MT index for example. That is clear too.

D) Pearson and Spearman correlation of the per-stock alpha distribution for specified rank percentiles (for example, the whole 0-100% range, or the top 10% only). Using these numbers we can check rank robustness and reliability (see the sketch after this list for the kind of calculation I mean).

E) Average rank performance (and simulations too) over specified time periods combined (we can do it through the permutation tool now, but we don't see the averaged results). It will allow us to quickly optimize systems on assigned historical time frames (not the full 16-year period as in R2G).

F) Allow short ports and books in R2G, change IS to a rolling test, make stricter disclosure requirements, etc. I don't want to repeat myself; many of these things have already been discussed several times.

G) Borrowing fees and availability for short ports
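To illustrate item D above, the kind of calculation I mean is roughly the following (synthetic stock-level data standing in for a real rank/alpha dataset; the bucket boundaries are just examples):

```python
# Sketch of item D: Pearson and Spearman correlation between rank and
# realized alpha, computed for the whole range and for the top decile only.
# Synthetic stock-level data stands in for a real rank/alpha export.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
rank = rng.uniform(0, 100, size=4000)                 # rank percentile per stock
alpha = 0.0004 * rank + rng.standard_t(df=3, size=4000) * 0.02

def rank_alpha_corr(lo, hi):
    m = (rank >= lo) & (rank <= hi)
    return stats.pearsonr(rank[m], alpha[m])[0], stats.spearmanr(rank[m], alpha[m])[0]

p, s = rank_alpha_corr(0, 100)
print(f"0-100%  : Pearson {p:.3f}, Spearman {s:.3f}")
p, s = rank_alpha_corr(90, 100)
print(f"90-100% : Pearson {p:.3f}, Spearman {s:.3f}")
```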

3. Desirable features:

a) Individual position weights, variable and allocated according to specified rules, proportionally to a stock's rank for example.

b) Global coverage, or at least Europe

c) Daily rank recalculation

Regards, Yury.

Concentrated ranking and combined forecast model.

In the ranking histogram we see a two-dimensional rank-return space, or let's say ranked factor vs. alpha (the correct view).

As I mentioned in previous posts, the larger the time frame for the ranking system simulation, the smoother the histogram.
In other words, the greater the correlation between factor range (rank percentile) and realized alpha (within the time frame when stocks lie within that percentile).
We don't see that correlation directly because we can divide our 4000-stock universe into only 200 percentiles. That means every percentile consists of 20 stocks placed somewhere on the rank-alpha graph within it. But we can see the larger-scale correlation across different percentiles (percentile deviation on the graph is low within quite small ranges; a lower number of outliers, in other words) and a high overall ranking slope.
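To show what I mean, a rough sketch of rebuilding that histogram from stock-level data (synthetic numbers, just for illustration):

```python
# Rebuilding the rank-performance "histogram": 4000 stocks divided into
# 200 rank buckets of ~20 stocks each, with the average realized alpha per
# bucket. Synthetic data only; a longer sample smooths the bucket averages.
import numpy as np

rng = np.random.default_rng(8)
rank = rng.uniform(0, 100, size=4000)
alpha = 0.0004 * rank + rng.standard_t(df=3, size=4000) * 0.02

buckets = np.floor(rank / 0.5).astype(int)            # 200 buckets of width 0.5
bucket_alpha = np.array([alpha[buckets == b].mean() for b in range(200)])

# The slope across bucket averages is the "overall ranking slope" mentioned above.
slope = np.polyfit(np.arange(200), bucket_alpha, 1)[0]
print("average alpha per bucket computed; cross-bucket slope =", slope)
```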

The smaller the simulation period, the smaller the number of stocks used for the distribution, the higher the uncompensated stochastic component and the lower the overall correlation. That is totally clear when you use 1 year or 6 months instead of 16 years for the test. But due to the cyclical (and maybe non-monotonic) nature of factors, a very large time frame is bad too. It reduces ranking performance in tests. That is clear too. For example, if we imagine a 3-dimensional space with the x axis as a stock-specific factor (explaining alpha waves), the y axis as a market timing index factor (explaining beta waves), and the z axis as total risk-adjusted return (including beta and alpha), we'll see that different MT ranges correspond to different factor distributions (which we see on our ranking histogram). At some times a factor has high correlation, at other times zero or even negative (for example, the volatility factor during bull versus bear markets). The combined MT and stock-specific model forecasts returns on each stock.
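To illustrate that MT dependence, a rough sketch that computes the factor-alpha correlation separately inside two regimes (synthetic data; the regime flag is only a placeholder for a real MT index):

```python
# Factor-alpha correlation conditional on a market-timing regime: the same
# factor can correlate positively with alpha in one regime and negatively
# in another (e.g. low volatility in bull vs. bear markets). Synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 4000
bull = rng.random(n) < 0.7                    # stand-in MT regime flag per observation
factor = rng.normal(size=n)
# sign of the factor's payoff flips with the regime
alpha = np.where(bull, 0.02, -0.03) * factor + rng.normal(scale=0.05, size=n)

print("corr in bull regime:", stats.pearsonr(factor[bull], alpha[bull])[0])
print("corr in bear regime:", stats.pearsonr(factor[~bull], alpha[~bull])[0])
```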

Using a concentrated rank, especially without an appropriate market timing model, is very dangerous. It becomes even more dangerous when, along with such a ranking, you use a very limited number of stock holdings (so the stochastic component is not compensated enough), especially over shorter time frames (when correlation is low no matter which factor you use).
Each layer compounds the others (negative beta; a factor not appropriate for the specific market conditions, hence negative alpha; plus a negative stochastic realization), and you can get a huge drawdown even if the market drops only 5%.

Also, about factors and smart beta: the recent growth in the popularity of smart beta strategies gives us a very attractive opportunity. The alpha waves I mentioned become larger and longer (creating alpha momentum by themselves), and small-money investors can jump on and dismount very fast, achieving their first-tier alpha relative to smart beta ETFs.

I wrote that daily rebalancing is a desirable feature, but a floating rebalance date for R2G is a necessary thing. For example, I have an attractive valueline-based port (just in the best R2G traditions of 50%+ returns :), but it can't rebalance on Monday because of the valueline data delay.



smart-beta-defining-the-opportunity-and-solutions.pdf (881 KB)