I have to say that I also have trouble understanding the reasoning, especially your view on statistics.
Regarding the ETF: OMFL is supposed to be a risk-on ETF. The attached chart shows two years of performance, together with a risk-off switch to GLD using a market-timing rule based on AAA-BBB corporate bond spreads. The paper is already attached to my earlier message to Jim.
By the way, are you trying to sell your Quanstrike Designer Model for $1000/month to someone on P123?
If you are interested in pairs trading, there is a website, https://pairtradefinder.com, which allows you to backtest different strategies with up to 10 years of data. If you are good at Python, you can of course develop your own trading algo at Quantopian. Right now, I am not running any stat arb strategies like the ones used at RT.
Looking at the paper, some portfolios were hedged. I do not have an exact number for this.
A Sharpe ratio uses the standard deviation of returns in the denominator, which could, theoretically, be close to zero with a very good hedge.
With the IR, the denominator is the standard deviation of returns relative to the benchmark, so for a portfolio hedged down to near-zero volatility, that tracking error approaches the standard deviation of the benchmark itself. (This excludes the effect of the risk-free rate, which is small now.)
I think this explains why the IR can be negative while the Sharpe ratio is positive.
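Here is a minimal sketch of what I mean, with made-up monthly numbers (nothing P123-specific, just numpy): a tightly hedged, low-volatility strategy that trails a more volatile benchmark.

[code]
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly returns: a tightly hedged strategy with a tiny
# positive drift and very low volatility, versus a benchmark with a
# higher drift and much higher volatility.
strategy = rng.normal(0.002, 0.002, 120)   # ~0.2%/month, 0.2% vol
benchmark = rng.normal(0.007, 0.040, 120)  # ~0.7%/month, 4.0% vol
rf = 0.0  # ignoring the risk-free rate, as above

sharpe = (strategy.mean() - rf) / strategy.std(ddof=1)

active = strategy - benchmark              # the IR denominator is the
ir = active.mean() / active.std(ddof=1)    # tracking error, dominated
                                           # by the benchmark's vol

print(f"Sharpe: {sharpe:.2f}  IR: {ir:.2f}")
# With these parameters the Sharpe ratio comes out strongly positive,
# while the IR is typically negative: the hedged strategy trails the
# benchmark's higher mean return.
[/code]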
Yuval writes “From what I gleaned, the correlations were both so close to zero that whether one had a negative or positive correlation was pretty irrelevant. I have an extremely low opinion of the Sharpe ratio for the same reasons as Andreas does. The lesson I gleaned from the paper is never to expect correlation between in-sample and out-of-sample performance over very short periods. If I remember the 2016 paper correctly, it looked at in-sample and out-of-sample performance over periods that were less than two years long (maybe even less than one). I also think that the regression lines in the illustrations are comical at best. One shouldn’t try to draw linear regression lines using OLS methods through data that looks like a hornet’s nest. This is all just my opinion, of course, and I may be misremembering the paper, and maybe someone will convince me that I’m mistaken here.”
I will consider this also and I appreciate the post.
But focusing on the discrepancy between the Sharpe ratio and the IR:
I use the IR a lot. Should I move to the Sharpe ratio (as flawed as both may be)?
I like excess returns for machine learning because they decrease the noise from random market fluctuations. I am not sure I can give that up.
Is the Sharpe Ratio most useful when using leverage? After all, returns and IR are negatively correlated in the paper. Does leverage make up for that?
MAYBE ON THE LAST. After all you do not really care about comparisons to some benchmark when considering your retirement. You just want to know how much money is in your account.
Could this be a partial explanation for some of what seems counterintuitive in RT's success? They are making some bad bets in their hedges, but the reduced volatility plus leverage seem to work out in the end.
This is a little above my quant skills at this time, and I want to apologize ahead of time for any lame ideas. My only excuse is that I am just beginning to think about this in depth and it is still early in the morning (for a weekend). That, and I only aspire to be a quant.
Thanks.
Edit:
I am slow in every sense of the word, it seems. But I have not yet encountered anything I cannot do with enough time and motivation. Plus, there are always scripts to copy and paste.
As you rightly point out, it is not about getting it right 100% of the time. Medallion profits on barely more than 50% of its trades, and the gains on each trade were never huge. Medallion enjoyed a slight edge across its collection of thousands of simultaneous trades, one that was large and consistent enough that, together with the high leverage it employs, it produces an enormous return (an annualized return of 66% before fees since its inception in 1988).
Fama and French portfolio construction is not based on statistics that assume the normal distribution of returns.
They use ranking on factors (like us with Portfolio123!), which makes no assumption about the normal distribution of returns.
And their factors have worked since 1926, so that is a good start.
(Where I disagree with them is the explanation of the outperformance, because they still say the market is efficient, so if those strategies outperform there must (by definition, and that's where dogma starts!) be an underlying risk, and that is the reason they outperform. No, there is not; it is just emotionally very hard to implement them.)
And I am not against statistics, but they need to be compatible with the data. The data set is what it is, and the theory must adapt to it, not the other way around.
If you assume the data has no fat tails when it does, that is a first-order mistake, and that is why LTCM blew up (and they had all the statistics beforehand saying this would never happen!).
An Example:
Shorting VXX or options directly gives you a great Sharpe ratio (> 3.5), but one day it will blow up, and the Sharpe ratio (like any other statistic that assumes returns have no fat tails) does not tell you that, because it does not take the fat tails into consideration.
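A minimal simulation of that point, with made-up numbers: a stylized short-volatility strategy that earns a small, steady daily premium looks spectacular right up until the one crash day that was always lurking in the tail.

[code]
import numpy as np

rng = np.random.default_rng(1)

# A stylized short-volatility strategy (hypothetical numbers):
# a small, steady daily premium for ~5 years, then one crash day.
returns = rng.normal(0.0008, 0.003, 1250)
crash_day = 1000
returns[crash_day] = -0.60  # the blow-up

def ann_sharpe(r):
    """Annualized Sharpe ratio from daily returns (risk-free rate = 0)."""
    return r.mean() / r.std(ddof=1) * np.sqrt(252)

print(f"Sharpe before the crash:    {ann_sharpe(returns[:crash_day]):.1f}")
print(f"Sharpe including the crash: {ann_sharpe(returns):.1f}")
# The pre-crash history shows a Sharpe in the neighborhood of 4;
# nothing in that number warned about the fat left tail.
[/code]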
This discussion has been going on since modern portfolio theory. The CAPM is a beautiful model, with beautiful math, and statistics can be used to describe it. But it describes a world that does not exist (because we have fat tails!).
A good start is Taleb and his whole body of work on fat tails as an option trader and writer…
So, to sum up: if the statistical method is compatible with the data, that is 100% fine by me.
And if you do use statistical methods that assume a normal distribution of stock prices, fine; that's just not my cup of tea.
I will not sell the model for $1,000; it would need to be much, much more.
So I disagree. They used regression with indexed (or dummy) variables, which is different.
But please be aware that I would do exactly as you suggest with any statistical methods that P123 makes available: use “ranking of factors” as the predictors.
This is the easiest and most natural method for P123 and its members.
In addition, tree methods and neural nets are particularly well suited for this. Tree methods, for example, care nothing about the form of the predictor (they are invariant under monotonic transformations of it). EVEN BETTER THAN FAMA AND FRENCH'S method.
This means you could use rank, as you suggest, AND YOU WOULD NOT EVEN HAVE TO. This was not available to Fama and French. Maybe we could consider moving forward to their methods (which you clearly accept) and maybe even beyond that here at P123.
So perhaps you might allow in your posts that you are not against all statistics, and in fact, you use and embrace statistics that are done properly.
I am sure no professional investor assumes that financial markets have no fat tails. But the risk should be addressed through diversification of your portfolio and strategies with low standard deviation (like stat arb), not through disregard of statistical evaluation metrics such as Sharpe/Sortino etc. and running strategies with large drawdowns (DD). Fat-tailed distributions (left or right) have a much greater chance of values beyond 4 or 5 standard deviations. The Medallion fund's 31-year return of 63.3% before fees was a full 13 standard deviations above the return on the market. That's a lot of standard deviations, fat tails or no fat tails.
Medallion has also been able to maintain a Sharpe ratio as high as 7-10. And in spite of the much larger annualized return numbers, its risk, as measured by the standard deviation of Medallion's returns, has not been that much greater than the S&P's. Medallion actually gained 98.2 percent in 2008, the year of the global financial crisis, while the S&P 500 lost 38.5 percent.
I don't think it is a good strategy to pick and choose when to believe in stats/quant, believing only when the data is convenient.
Let me preface my remark here by saying that a) I am very far from an expert in statistics and b) what I write below represents my own personal opinion shaped by my reading and not Portfolio123’s position.
It has been well known for over fifty years that OLS (ordinary least squares) methods should not be used on heteroscedastic data with fat tails, and that stock-market returns are heteroscedastic with fat tails. (Homoscedasticity means that the variance of the data is evenly distributed over the independent variables; heteroscedasticity means that it isn't.) The warnings have come from statisticians, not from people who write about finance. Many academics who write about finance have continued to use OLS regression and standard deviation measures. The Sharpe ratio, the Sortino ratio, the information ratio, alpha, and beta are all based on OLS methods. This does not mean we shouldn't use statistics. It means that 95% of the finance profession is using statistics incorrectly. This is one thing that Andreas is getting at. A number of academics and non-academics in finance are challenging the 95%, including C. Thomas Howard, Frank J. Fabozzi, Sergio M. Focardi, Marcos Lopez de Prado, etc. It's worth reading this interview with Fabozzi: https://blogs.cfainstitute.org/investor/2019/06/03/fabozzi-finance-must-modernize-or-face-irrelevancy/ : “Economists cannot blindly adopt statistical techniques that were designed for experimental biology. As López de Prado and I explained, economics does not allow for experiments based on large, independently drawn samples of data from a stationary system.” If the CFA, an institution that teaches standard econometric models, is coming down on the side of the behaviorists, there's a good reason for it. Econometrics is collapsing.
C. Thomas Howard has a good explanation for how portfolio risk became associated with the standard deviation of returns in his book Behavioral Portfolio Management (which, unfortunately, I do not have at hand at the moment or else I would quote from it). He, and many other behavioral economists, argue against this association, and maintain that the common-sense definition of financial risk–the probability of a permanent loss of capital–should be the one that guides our financial decisions. Again, this is what Andreas is arguing as well, and it’s what Warren Buffett has been saying for decades.
The probability of a permanent loss of capital looks very different when using leverage than it does when not using leverage, and not only because of margin calls. I personally have not looked at the best ways to measure these risks. But the starting point should probably not be the standard deviation of returns.
The statistical tools that seem to me to be most commonly used measure deviation, correlation, regression, and confidence, which are all interrelated. I personally think that measuring these things is extremely valuable. Many investors and academics, though, build successful investment strategies without using these tools. Certainly one can calculate probabilities without using any of them.
Statisticians have been developing alternatives to OLS methods since the 1930s, but in the field of finance they have not been adopted. OLS methods have the advantage of being very easy to compute, but there’s no reason at this point to favor an inefficient and error-prone method that is easy to compute over a better one that is hard to compute. Theil-Sen estimation and LAD regression should take the place of OLS regression (both of these methods are over 50 years old); when calculating correlations, Kendall’s tau should take the place of Pearson’s r (this is a method that is about 80 years old, I think); when calculating deviation, average absolute deviation or median absolute deviation should take the place of standard deviation. (I have not yet looked into non-OLS methods for calculating confidence, but I’m certain that not only do they exist, but that they are old.) OLS methods are extremely susceptible to outliers, and have a number of other problems that these methods were developed to counteract.
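For anyone who wants to experiment, the alternatives named above are each one line in scipy (median_abs_deviation needs scipy >= 1.5). Here is a minimal sketch, on made-up data with one injected outlier, contrasting the OLS-based estimates with the robust ones:

[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Made-up data: a clean linear relationship plus one fat-tail outlier.
x = np.linspace(0, 10, 50)
y = 0.5 * x + rng.normal(0, 0.5, 50)
y[-1] += 25.0  # a single extreme observation

# OLS-based estimates (outlier-sensitive)
ols_slope = stats.linregress(x, y).slope
pearson_r = stats.pearsonr(x, y)[0]
std_dev = np.std(y, ddof=1)

# Robust alternatives (outlier-resistant)
theil_slope = stats.theilslopes(y, x)[0]
kendall_tau = stats.kendalltau(x, y)[0]
mad = stats.median_abs_deviation(y)

print(f"slope:       OLS {ols_slope:.2f}  vs  Theil-Sen {theil_slope:.2f}")
print(f"correlation: Pearson {pearson_r:.2f}  vs  Kendall {kendall_tau:.2f}")
print(f"dispersion:  std dev {std_dev:.2f}  vs  MAD {mad:.2f}")
# The one outlier drags the OLS slope, Pearson's r, and the standard
# deviation around; the robust versions barely move.
[/code]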
“So perhaps you might allow in your posts that you are not against all statistics, and in fact, you use and embrace statistics that are done properly.”
I do and I have stated this:
“So, to sum up: if the statistical method is compatible with the data, that is 100% fine by me.”
This is the portfolio construction of Fama and French, and I refer only to the portfolio construction, not to the evaluation of it with Sharpe or other stats that assume a normal distribution of returns.
"The Fama/French factors are constructed using the 6 value-weight portfolios formed on size and book-to-market. (See the description of the 6 size/book-to-market portfolios.)
SMB (Small Minus Big) is the average return on the three small portfolios minus the average return on the three big portfolios,
SMB = 1/3 (Small Value + Small Neutral + Small Growth) - 1/3 (Big Value + Big Neutral + Big Growth).
HML (High Minus Low) is the average return on the two value portfolios minus the average return on the two growth portfolios,
HML = 1/2 (Small Value + Big Value) - 1/2 (Small Growth + Big Growth).
Rm-Rf, the excess return on the market, value-weight return of all CRSP firms incorporated in the US and listed on the NYSE, AMEX, or NASDAQ that have a CRSP share code of 10 or 11 at the beginning of month t, good shares and price data at the beginning of t, and good return data for t minus the one-month Treasury bill rate (from Ibbotson Associates)."
So from what I can see, the portfolio construction is 10th-grade math (or even lower), and I like that, because there is no assumption that the data has no fat tails or that volatility is stationary.
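To underline how simple the arithmetic quoted above is, here is a sketch using hypothetical monthly returns for the six size/book-to-market portfolios (the real inputs come from CRSP data):

[code]
# A minimal sketch of the factor arithmetic quoted above, using
# hypothetical monthly returns for the six size/book-to-market
# portfolios.
six = {
    "small_value": 0.015, "small_neutral": 0.011, "small_growth": 0.008,
    "big_value": 0.010, "big_neutral": 0.009, "big_growth": 0.007,
}

smb = (six["small_value"] + six["small_neutral"] + six["small_growth"]) / 3 \
    - (six["big_value"] + six["big_neutral"] + six["big_growth"]) / 3

hml = (six["small_value"] + six["big_value"]) / 2 \
    - (six["small_growth"] + six["big_growth"]) / 2

print(f"SMB = {smb:.4f}, HML = {hml:.4f}")
# Simple averages and differences of ranked portfolios -- no
# distributional assumption anywhere.
[/code]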
“Econometrics is collapsing.” [That is, econometrics that uses the wrong assumptions, which are used because they produce beautiful math that can be marketed in the Journal of Finance.]
I could not have said it better!
So once again: I am all in on statistical methods that are compatible with the data, e.g., the methods Yuval pointed out that fit heteroscedastic data.
My takeaway is that if P123 wants to go more in the quant direction, P123 could implement those methods that are compatible with heteroscedastic data.
But we should not make the same mistake over and over (like the finance industry and academics have done for 70 years) and use methods that simply rest on a wrong assumption (e.g., all OLS statistics!).
Not so coincidentally, we are having a 50-year-old debate here on P123. Who cares what they said 50 years ago? Why are we even discussing OLS if you do not like to use it?
Yuval, I understand the quote below, and your posts elsewhere make clear you understand that machine learning can be useful. I congratulate you for this. I ask again: why are you focusing on OLS here on the P123 forum? Why aren't we spending more time discussing modern techniques you have said you like, such as bootstrapping? Why aren't you informing members of the new techniques?
More importantly, P123 should be attempting to make these techniques available in a usable manner. Maybe you are making an effort to provide those methods to us, along with Marco. Marco is clearly looking into modern ideas and the best way to implement them (if reasonable financially). If you are helping with this, then THANK YOU!!!
A lot has happened in 50 years. Am I right? Has any member not noticed? If you were in a coma (or something), I do not think I will be able to get you fully up to date in this post. Below is a sample of some of the technical progress that has been made. More generally, did you notice the self-driving cars when you came out of your coma? Have you talked to Siri? I hate to inform you, but she is not a real person. However, she can beat you at chess and can answer more factual questions than you can.
So here are some technical reasons the modern era is possible (for those coming out of their coma, or who have not opened a textbook recently):
Many methods care absolutely nothing about the distribution of the predictor: in statistical-learning parlance, they are “invariant.” That means that as long as the transformation is monotonic, we can reshape the distribution to look like a bologna sandwich and the program will give the same answer. Whether the distribution of the predictor is normal or not means nothing in this century.
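A minimal sketch of that invariance, with made-up data: fit the same decision tree on a raw predictor and on its rank (a monotonic transformation), and the fitted predictions come out identical.

[code]
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)

# Made-up data: one predictor, one noisy target.
x = rng.normal(0, 1, (200, 1))
y = (x[:, 0] > 0).astype(float) + rng.normal(0, 0.1, 200)

# A monotonic reshaping of the predictor: replace values with ranks.
x_rank = x.argsort(axis=0).argsort(axis=0).astype(float)

tree_raw = DecisionTreeRegressor(max_depth=3, random_state=0).fit(x, y)
tree_rank = DecisionTreeRegressor(max_depth=3, random_state=0).fit(x_rank, y)

# Same partitions, same predictions: the tree never saw the "shape"
# of the predictor, only its ordering.
print(np.allclose(tree_raw.predict(x), tree_rank.predict(x_rank)))  # True
[/code]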
As for the “label,” I usually use MAE (mean absolute error), Huber, or log-cosh as the metric.
Using MAE with modern programs is trivial. All of them have it, as well as other metrics that are well suited to outliers and fat-tailed distributions.
MAE finds the median, Yuval, which you have been a fan of at times. I do not know if it is your preferred statistic now.
Logcosh is magical. Huber probably works as well as logcosh.
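For anyone who has not seen them, here is a minimal sketch of the three losses as functions of the residual r (Huber's delta is a tunable parameter):

[code]
import numpy as np

# Minimal definitions of the three losses named above, as functions
# of the residual r.
def mae_loss(r):
    return np.abs(r)                          # minimized by the median

def huber_loss(r, delta=1.0):
    r = np.asarray(r, dtype=float)
    quad = 0.5 * r**2                         # quadratic near zero
    lin = delta * (np.abs(r) - 0.5 * delta)   # linear in the tails
    return np.where(np.abs(r) <= delta, quad, lin)

def logcosh_loss(r):
    return np.log(np.cosh(r))  # ~r^2/2 near zero, ~|r| in the tails

for r in (0.1, 1.0, 10.0):
    print(r, mae_loss(r), huber_loss(r), logcosh_loss(r))
[/code]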
Other things like bootstrapping can be used with ease because of modern computing. This is clearly an effective and well-accepted method. But I cannot provide a complete list of methods here.
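As an example of how cheap this has become, here is a minimal bootstrap sketch (made-up fat-tailed returns) that gets a confidence interval for a mean return with no normality assumption:

[code]
import numpy as np

rng = np.random.default_rng(4)

# Made-up fat-tailed monthly returns (Student's t, 3 degrees of
# freedom), and a bootstrap confidence interval for the mean return
# that assumes nothing about the distribution.
returns = rng.standard_t(df=3, size=120) * 0.02

boot_means = np.array([
    rng.choice(returns, size=returns.size, replace=True).mean()
    for _ in range(10_000)
])

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean monthly return: [{lo:.4f}, {hi:.4f}]")
[/code]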
We need to stop debating 1970s ideas here on P123. It really is not a debate for anyone who has opened a textbook or taken a course. We can pretend that there are no modern ideas and no modern computers but that is obviously just a fantasy.
So I guess some would claim that we have been debating this for 50 years and no one has any solutions: Uhhh…okay.
I do know and am encouraged that Marco is looking at different ideas like DataRobot and perhaps Amazon AWS. I think it would NOT be hard to use Python within P123, but there are some specialized things, like possibly using GPUs (graphics processing units), because of all the vector programming used for machine learning. CPUs with MMX, SSE, and AVX instruction sets do work well; not as well as GPUs, but they do work well. Open source is good, perhaps, but who do you call when programs conflict (e.g., Anaconda with TensorFlow)? Probably someone picks up the phone when Amazon AWS calls (or DataRobot calls), or perhaps they have a programmer who changes the Python programming to suit them.
So I am not suggesting that there should not be some market research, calls to DataRobot and other cloud services, brainstorming…. It would be foolish to do much of anything before moving to FactSet, and even then more information would probably need to be collected before choosing the best path. P123 works pretty darn well the way it is.
My experience with machine learning has given me a great appreciation for how good our present system is.
But let me ask everyone a question. When Netflix sends you a recommendation do you think they are debating whether your preferences have a normal distribution? This debate is ridiculous in the year 2020.
Yuval, you said it best when discussing how machine learning is applicable to what we do:
There is no rational debate about this (with modern techniques) as Yuval has posted (mostly elsewhere).
Time keeps moving on. Many potential P123 members have not been in a coma, or in the nostalgic, self-congratulatory echo chamber that the P123 forum often is. Many potential members read posts outside of the P123 forum or have opened a textbook written in this century.
Yuval and Andreas, the only people talking about OLS techniques here on this forum are you. The paper that started this post found Random Forests to have the best predictive potential, as you well know. The authors only used OLS because more modern methods can be more of a black box and can be less useful for explaining factors.
The feature-importance tool available with Random Forests was used in the paper to explain the importance of factors. Even if they did not show you this, I am sure they would have used Random Forest as well as other modern methods privately.
But again, your writings elsewhere tell me you already know this.
James is someone who has some modern ideas that he wants to use for investing. James, are you going to stay at P123? If you do, are you going to bother to post in the future? I, for one, hope you can stay and continue to post.
But will you bother knowing that the P123 forum has some need to limit its discussion (only on this forum) to 1970s techniques? Knowing that P123 wants to continue to debate 1970s ideas over and over again like they are new?
I have seen Alzheimer’s patients in my medical school rotations who did better, but to be fair, none of them were in a coma for 50 years.
As the product manager of P123, I don't think Yuval should say that the large majority (95%) of the finance profession is using statistics incorrectly. If we put that to a vote, that would mean the minority calling the large majority of the industry incorrect, and that 95% of fund managers/hedge fund managers/professional investors are simply wrong.
The P123 platform currently supports risk measurements like Sharpe/Sortino/standard deviation. If you, Yuval, really think that OLS regression does not make sense, why are you supporting a platform that utilizes these measurements? You should push for revamping the site without measurements based on OLS regression, and that will probably put P123 out of business, since a large majority of users will stop using P123.
Yuval’s positions change with the wind. At times he seems to have a consistent view in support of statistical learning when he posts on other sites. Here it is highly variable. Possibly because he is posting as a member at times and as someone helping with product development at other times. The two cannot be separated as much as we would like to pretend that they can be.
If this is why P123’s positions shift so often it is understandable. It is also possible that the constant shift in Yuval’s positions is just because he is still learning. This is Yuval 2 years ago:
It is a credit to Yuval that he knows more than most of us about statistical learning now. I repeat: YUVAL KNOWS MORE THAN MOST OF US. I suspect Yuval’s SAT scores were near perfect. In fact, if I had to bet on a single number I would bet on perfect. No one else I know could make this much progress in 2 years.
I blame P123 if they are asking Yuval to develop a machine learning platform. No one person can do it. I CERTAINLY COULD NOT DO IT AND WOULD TURN DOWN ANY JOB OFFER ASKING ME TO DO IT ON MY OWN. I am certain they will have the sense to ask someone with some degrees (and not me) when the time comes. IF THEY WANT TO DO IT IN-HOUSE THEY NEED TO HIRE SOMEONE WHO WAS FURTHER ALONG THAN THIS 2 YEARS AGO AND WHO HAS TAKEN SOME COURSES TO HELP YUVAL. Again, this is all on P123, IMHO.
My preference would be that they give Yuval a raise for what he does and just accept that Yuval cannot do everything. If (or when) they decide to move to machine learning they need to hire a programmer or call Amazon AWS (or DataRobot).
After all, the “Theil-Sen estimation” that Yuval recommends is not going to cut it, even if it were ever to happen.
P123 has already said they are interested in what DataRobot might be able to provide. Probably they are looking into all of the options.
The constant shifting in positions coming from the product manager does not put a professional face on P123, as James has noticed. It is good that James is still here and posting, however.
I put forward my ideas about OLS measures because some of the people who were writing on this forum were not grasping what Andreas was trying to say, and I wanted to support Andreas’s ideas because I agree with them. I prefaced those remarks by saying I was not speaking on behalf of Portfolio123. If 95% of the finance world uses certain measures, Portfolio123 is not about to eliminate those measures. If we were to start using Theil-Sen estimation to calculate alpha or median absolute deviation to calculate variability, we would, as you rightly point out, alienate almost all of our users. While I would like to encourage our users to explore alternative statistical methods, I am, as I said, not an expert in them. I do devote a lot of time to exploring them, but because they’re not in common use, it’s hard.
It’s also possible that some statistician out there has written a cogent and convincing justification of continuing to use OLS methods on financial/economic data despite its heteroscedasticity, in which case I will be happy to retract my 95% statement.
I tend to agree with a lot you say. Certainly, there are problems that can be found with any statistic. How fatal any particular flaw is can be debated. I won't continue the debate here.
Furthermore, I like a lot of things that you do, such as bootstrapping.
But I wonder how much good it does for P123 to criticize the few statistics it has available while offering only the hope of something else to replace them.
If P123 (or the product manager) is going to systematically criticize everything we have available at P123 with regard to statistics, then P123 needs to offer something more.
We have nothing, according to Yuval, right? Nothing, according to the Yuval who loves machine learning on other sites. Here it is just do without.
Of course, I do not have a business degree, but frankly I do not get how any of this is good marketing.
If you add that Marc does not like backtesting much, the marketing is probably the worst marketing for any business that is still in business, ever.
I will say that if it is in Yuval's job description to convince people that machine learning and other statistics are good while writing at SeekingAlpha, and then to refuse to provide any new statistics or machine learning here at P123, then he is not paid enough.
[b]Wait, I forgot. Yuval recommends looking at the rank performance buckets and at the slope of the buckets, despite the fact that the buckets are not linear.
But he is worried about heteroscedasticity. Okey dokey. At least we have something.[/b]
Again, I do not have a business degree, but I think it is crazy. We are going to lose James, I think. He can speak for himself on that. But I do know he does not need a lecture on OLS from P123.
Again just crazy marketing, IMHO. Maybe James will correct me on that.
The solution would be to provide a wide range of possibilities without having to pass the ideas through Yuval and to dial back the criticism.
Probably just me (no business degree).
And really, I am just going to ask if he can stop with the Theil-Sen estimation every time we have an idea. If that is all he has, there is a serious problem.
Says who? I LOVE backtesting and did it for years by hand before getting involved with P123.
What I don’t like is misuse of it and I have tried hard to show people how to use it well and how to align their expectations as to what makes for a successful test.
I’ve written about it in the on-line strategy course so won’t repeat here.
[b]It is also possible that the constant shift in Yuval’s positions is just because he is still learning. This is Yuval 2 years ago:
[quote]
I’m wondering if any of you can explain this to someone whose knowledge of statistics doesn’t go beyond what’s available on Excel. I’ve read this thread several times and I can’t figure out what bootstrapping is. Or what an equity curve is. Or what p-value and t-test mean. Or what an I.I.D. is. I take it R means correlation (as in r-squared)? When I look these up, I get just as confused. What does randomness have to do with P123 results? Do I need to take a stats course, or can someone explain in simple terms what you’re doing? Are you manipulating screens and/or simulations in Excel (which is what I do)? Does this have anything to do with alpha and standard deviation or is it something altogether different? And isn’t applying statistics to technical analysis like applying Newton’s laws of motion to astrology?
-Yuval
[/quote][/b]
Jim,
I am not a marketing expert either (I happen to have an MSc Finance degree), but from a marketing and investment perspective, P123 should be here to offer the mainstream statistical evaluation metrics that are used by 95% of the industry, not to promote alternative statistical methods. Even though the financial markets (FX/equities/fixed income/commodities) have fat tails, all the banks in the world continue to use VaR (value at risk), which assumes a normal distribution, to measure their market risk as required by central banks.
Furthermore, I am not sure how the investment community will see P123 if the platform itself doesn't believe in Sharpe/Sortino (to measure return) and standard deviation (to measure risk) when offering its service. You are also right to point out that neither you nor I requires a lecture on OLS regression from the product manager of P123 (whose knowledge of statistics didn't go beyond what's available in Excel two years ago).
That brings up a good question: is there the ability to add noise to the data during the backtest process? If not, that would be a very useful feature. Granted, fundamental and pricing data is noisy by nature, but allowing for a certain level of random noise generation in the data during the backtest process would be very illuminating. At some level of noise every model would fail, but a model's ability to withstand that noise would be of great value. It would rather quickly shake the paint off a model that is the result of curve fitting (especially if it just happened to pick the right stocks). I know there is the random function, and I have experimented with it to a degree, but it still doesn't work directly on the data. Of course, if such a feature existed, you would want to run the backtests repeatedly to surmise how much variation you experience relative to the noise.
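A rough sketch of the idea, assuming a hypothetical `backtest(data)` function that maps a data array to a single performance number; `my_backtest` and `my_data` are made-up stand-ins, since P123 exposes no such API today:

[code]
import numpy as np

rng = np.random.default_rng(5)

def robustness_check(backtest, data, noise_pct=0.02, n_trials=100):
    """Re-run a backtest on randomly perturbed copies of the input
    data and report the spread of outcomes."""
    results = []
    for _ in range(n_trials):
        # Multiply each data point by (1 + small random noise).
        noisy = data * (1.0 + rng.normal(0.0, noise_pct, data.shape))
        results.append(backtest(noisy))
    results = np.array(results)
    return results.mean(), results.std()

# Usage sketch: a model whose performance collapses, or varies wildly,
# under small perturbations was probably curve-fit to the exact data.
# mean_perf, spread = robustness_check(my_backtest, my_data)
[/code]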