Ranking and market timing in combination for stock forecasting models

Yury, now it was my turn to edit my post. :slight_smile:

Just keep going with your thoughts as I am curious about the finale.
Thank you.

You really just can't use non-stationary data in a time series without adjusting for it. The examples in the textbooks show high R values for data that in truth has no correlation whatsoever. A fairly recent Nobel Prize was awarded for techniques to deal with this problem.
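A minimal sketch of that textbook effect, for anyone who wants to see it once for themselves (nothing here is specific to any P123 data): regressing one random walk on a completely independent one frequently produces a sizeable R² and a "significant" slope, which largely disappear once you difference the series.

```python
# Spurious regression sketch: two independent random walks often show a
# sizeable R^2 in levels, and roughly zero R^2 once differenced.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = np.cumsum(rng.normal(size=n))  # random walk 1 (non-stationary)
y = np.cumsum(rng.normal(size=n))  # random walk 2, independent of x

levels = sm.OLS(y, sm.add_constant(x)).fit()
diffs = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()
print(f"R^2 in levels (spurious): {levels.rsquared:.2f}")
print(f"R^2 in first differences: {diffs.rsquared:.2f}")
```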

It has nothing to do with Maximizing R2.

I personally would not try any linear regressions for market timing or time series. I certainly would put no money into it or make any public claims. But that is just me at my level.

I like linear regressions that are similar to our rank performance test. I think this would be cross-sectional data. But I am becoming more aware that any statistical claims are questionable, including R values. This is because the data is probably not normally distributed or i.i.d. (thanks Peter and SUpirate1081).

Best,

Jim

Konstantin, are you Russian? :slight_smile: I thought there was no one here.


The market as a whole is a more or less stationary process compared to individual stocks' performance, and that's the main difference.
As I remember, Markov processes deal with non-stationary time series.

Then do you know whether a random walk is stationary? Do you think the stock market is or isn't a random walk?

You mean after you have corrected for any trend? You have to have done that, by the definition of stationarity.

As I remember, stationary means stable first and second moments of the distribution, the mean and variance. It was a long time ago that I studied this at university :slight_smile:

So your question is which model to apply for MT? Because my regression model is based on cross-sectional rather than time-series data.

Yury,

I actually like what you are doing. Personally, I would refresh my memory before going much further with any time series data.

I will be doing some of this myself but not for market timing. But again that is just me at my level. The overall market is almost certainly non-stationary even if it is not a random walk: any trending (at a minimum) must be corrected for: personally I cannot do that.

Good luck.

Jim

Yep, I have forgotten many things from that stuff. Maybe later I can say something for sure regarding stochastic processes :slight_smile:
Actually there is no big need to go deep into the math. Models that really work are quite simple.

You can correct my memory but my recollection is that a stochastic process is normally distributed by definition. The stock market isn’t.
Steve

I think Jim can start a separate thread to discuss those things.

I think that you and Jim clearly have too much free time on your hands that could be utilized on other endeavours :slight_smile:
Steve

Steve,

Only you get to spend all of your time talking to Yury?

That is called “weakly stationary”. There is also strong stationarity, which basically means that the underlying distribution of the stochastic process does not change over time.
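For reference, a standard textbook statement of the weak form, including the autocovariance condition that usually accompanies the mean and variance:

$$
\mathbb{E}[X_t] = \mu, \qquad \operatorname{Var}(X_t) = \sigma^2 < \infty, \qquad \operatorname{Cov}(X_t,\, X_{t+h}) = \gamma(h) \quad \text{for all } t,
$$

i.e. the mean and variance are constant and the autocovariance depends only on the lag $h$. Strict (strong) stationarity means the entire joint distribution of $(X_{t_1}, \dots, X_{t_k})$ is unchanged by a shift in time.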

That is not necessary at all. You can model a stochastic process using any kind of distribution.


In time series analysis, you want to model/analyse something that is "stable" over time. The simplest example is when there is a trend in the data. For example, the S&P500 index has an upward trend (in the statistical sense). The easiest transformation is to take first differences. That means you wouldn't take the index as is, but you'd take the difference between the index on each day and the day before that. Now you've got data that is confined to a relatively narrow range.

However, because the day-to-day changes get larger (in absolute terms) as time goes on, the range of values you've got in the data keeps expanding little by little over time. That's still not "stable". So in this case we would prefer to transform the index to percentage changes (daily returns). Now we've got something that stays in a relatively narrow range: the daily returns of the S&P500 in 1960 were probably similar in magnitude to the daily returns in each of the decades after it, for example.
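A small sketch of those two transformations in pandas (the series name and data source are placeholders; any daily closing-price series would do):

```python
# Sketch of the two transformations described above, using pandas.
# `spx` is assumed to be a pandas Series of daily S&P 500 closing prices
# indexed by date; the name and data source are placeholders.
import pandas as pd

def transform(spx: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({
        "level": spx,                    # trending, non-stationary
        "first_diff": spx.diff(),        # daily point changes; spread still grows as the index rises
        "pct_return": spx.pct_change(),  # daily returns; range is roughly stable across decades
    })
```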

Of course financial data like the S&P500 index exhibits time-varying volatility. Strictly speaking, this means that even the returns are not weakly stationary (the second moment, the variance, changes over time). You could use GARCH models to handle that. But for simple linear models, this is not so important I think. The first moment, the mean, is much more important.
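If someone did want to model that time-varying volatility, a minimal sketch with the third-party `arch` package might look like the following; this only illustrates the idea and is not anyone's actual model, and `returns` is a placeholder for a series of daily percentage returns.

```python
# Minimal GARCH(1,1) sketch with the `arch` package (pip install arch).
# `returns` is assumed to be a pandas Series of daily returns in percent.
from arch import arch_model

def estimate_volatility(returns):
    model = arch_model(returns, vol="GARCH", p=1, q=1)  # constant mean, GARCH(1,1) variance
    result = model.fit(disp="off")
    return result.conditional_volatility  # estimated time-varying volatility
```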

The key here is that using prices or indices as the dependent variable is usually wrong, you need to work with returns instead (as Yury has been doing).

Jim, my MT model is simple to express in math:

Return = MT(f1(t), f2(t), …) + e,

where MT is a deterministic linear function of the factors f, the f(t) are factors that depend on time, and e is a stochastic deviation with a distribution that is stable over time (stationary) and has zero mean. I also keep the factor set constant over time to avoid overfitting. To get a higher correlation to future return you can use a variable factor set, as Hull does (he adds or drops factors over time through a screening procedure), rather than a constant one like mine. That's all. Nothing can be done with the stochastic component. You can try to model it using stochastic processes, but there is no need to do that; the accuracy of the forecasts won't be higher in practice.
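A minimal sketch of a linear model of that form, just to make the notation concrete (the DataFrame, column names and factors are hypothetical placeholders, not Yury's actual inputs):

```python
# Sketch of a linear market-timing model of the form
#   return_t = b0 + b1*f1(t) + b2*f2(t) + ... + e_t
# with a fixed factor set. `data` is assumed to be a DataFrame holding a
# forward-return column and one column per factor (names are hypothetical).
import statsmodels.api as sm

def fit_mt_model(data, factor_cols, return_col="fwd_return"):
    X = sm.add_constant(data[factor_cols])
    y = data[return_col]
    fit = sm.OLS(y, X, missing="drop").fit()
    return fit  # fit.params is the deterministic linear part; fit.resid is the stochastic e
```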

Be careful with correlation and regression: they are based on an assumption of normal distribution. Stocks are not normally distributed; they have fat tails, and very often a strategy goes after those fat tails and would not work if the fat tails did not exist.
At least this is the case with my systems…
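A quick, hedged way to check that point on any return series (`returns` is a placeholder for a 1-D array of daily returns):

```python
# Fat-tail check: excess kurtosis of a normal distribution is 0, while daily
# stock returns typically come out well above that; Jarque-Bera tests
# normality via skewness and kurtosis.
from scipy import stats

def fat_tail_check(returns):
    excess_kurtosis = stats.kurtosis(returns)        # Fisher definition: normal -> 0
    jb_stat, jb_pvalue = stats.jarque_bera(returns)  # small p-value -> reject normality
    return excess_kurtosis, jb_pvalue
```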

Regards

Andreas

I agree, the stochastic component e does not have a normal distribution. But there is nothing we can do about fat tails and black swan events; I don't model them. We can buy put options if we assume that black swan risks are not priced efficiently into options, that's all.

Peter,

Thank you for your clarification. I'm going to assume that Yury's 9 other factors that he speaks about for MT are stationary too: after reading his post (and thinking), the normalization may be helpful in this regard. Not speaking for Yury, but for me personally, using time series data would carry the risk of showing a false correlation that would never materialize in my trading. I would probably never know why.

One attractive (theoretical for now) feature of the appropriate use of OLS is that it is not really optimized. It's just the function you get the one time you do the math (or run it in Excel). OOS similar to IS? I intend to find out. But I'm also willing to scrap the whole idea. There are a lot of assumptions, starting with: should I even be looking for the line of best fit? Is the relationship even linear? Linear even at the extremes, for a 5-stock model say? I think you already said it does not work well for 5-stock models. The outliers that really do not fit the regression line can certainly affect your bottom line.
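A sketch of that kind of one-shot fit, roughly analogous to the rank performance test (the rank buckets and returns are hypothetical placeholders):

```python
# One-shot OLS of average forward return on rank bucket, fit once with no
# further optimization. Inputs are hypothetical placeholders.
import numpy as np
import statsmodels.api as sm

def fit_rank_line(bucket_rank, bucket_return):
    """bucket_rank: e.g. midpoints of 20 rank buckets; bucket_return: their average forward returns."""
    X = sm.add_constant(np.asarray(bucket_rank, dtype=float))
    fit = sm.OLS(np.asarray(bucket_return, dtype=float), X).fit()
    # Inspect fit.resid, especially in the top buckets, to judge whether a
    # straight line is even the right shape at the extremes.
    return fit
```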

Yury, it looks like you are on the right track. I encourage you to keep going, but do pay attention to the details.

Warmest regards,

Jim

1. What we currently have in P123.

We have a daily-updating point-in-time database, a ranking tool with a lot of factors, a permutation tool for ranking and simulation, rolling tests, hedging, books, the macro and getseries sections, and many other things. So, OK.
It allows us to build quite good systems with the proper approach.

But I don't understand the following (I may be wrong on some issues):
Why does the getseries tool allow only universe operations? Why is macro data not available there?
Why can't we download info from the macro section into Excel? Why is the macro composite Boolean market timing index not available for hedging purposes in ports?
I think that stuff is easy to do.

2. Must-have features:

A) Variable hedging based on an MT index, which in turn is constructed from macro data and stock-universe-specific data. Everyone understands its importance.

B) Results presentation. All parameters should be presented year by year from a specified starting point (not only calendar years). Performance graphs everywhere (including rank performance) should show alpha instead of simple return (better yet, a switch between return/alpha).

C) Variable port weights within a book based on set-up rules, an MT index for example. This is clear too.

D) Pearson and Spearman correlation of the individual stock-alpha distribution for specified rank percentiles (for example, the whole 0-100% range, or the top 10% only). Using these numbers we can check rank robustness and reliability (see the sketch after this list).

E) Average rank performance (and simulations too) over specified time periods combined (we can do this through the permutation tool now, but we don't see the averaged results). This would allow us to quickly optimize systems on assigned historical time frames (not the full 16-year period as in R2G).

F) Allow short ports and books in R2G, change IS to a rolling test, make disclosure requirements stricter, etc. I don't want to repeat myself; many of these things have already been discussed several times.

G) Borrowing fees and availability for short ports
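
For item D, a hedged sketch of the correlation calculation itself, assuming the rank/alpha data could be exported (the DataFrame and column names are hypothetical, not P123 output):

```python
# Pearson and Spearman correlation between each stock's rank and its realized
# alpha, optionally restricted to a top rank slice. Column names are
# hypothetical placeholders.
from scipy import stats

def rank_alpha_correlation(df, rank_col="rank", alpha_col="alpha", min_rank=0.0):
    top = df[df[rank_col] >= min_rank]        # e.g. min_rank=90.0 for the top 10% only
    pearson_r, _ = stats.pearsonr(top[rank_col], top[alpha_col])
    spearman_r, _ = stats.spearmanr(top[rank_col], top[alpha_col])
    return pearson_r, spearman_r
```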

3. Desirable features:

a) Individual position variable weights allocated according to specified rules, proportionally to a stock's rank for example (a small sketch follows this list).

b) Global coverage, or at least Europe

c) Daily rank recalculation
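
For item a), a small sketch of rank-proportional position sizing (the tickers and ranks are made up):

```python
# Rank-proportional weights: each position's weight is its rank divided by
# the sum of all ranks, so the weights sum to 1.
def rank_proportional_weights(ranks):
    total = sum(ranks.values())
    return {ticker: rank / total for ticker, rank in ranks.items()}

# Example: rank_proportional_weights({"AAA": 99, "BBB": 95, "CCC": 90})
```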

Regards, Yury.