Marco,
Thank you.
P123 is a wonderful tool! It works well exactly as it is. Machine learning does work better in certain situations, though. For example, the linear (and normalized) equations now used for the ranks are one reason that model performance generally drops off in models with more than 5 stocks (I would love to show you this someday). Simply put, P123 cannot do as well as other methods with more than 5 stocks when the data is not linear.
This is all to say that P123 works well. Where there are limitations, one can work around most of them, at least to some extent. For example, run three 5-stock models instead of one 15-stock model.
To get to the point: all of Python’s useful features could be unleashed if we had access to the m x n matrix (also called an array or DataFrame) that you (must) create when you run a sim. The array would have a hierarchical ticker and date row index. The columns would be the returns over the rebalance period (e.g., the next week) and all of the factors or functions in the sim. The data points would still be the ranks.
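For concreteness, here is a minimal sketch of what such a matrix could look like as a pandas DataFrame, assuming weekly rebalances. The index is ordered (date, ticker) here, though either order works. The tickers, dates, and column names (`fwd_return_1w`, `value_rank`, and so on) are hypothetical, and the values are random placeholders, not P123 data.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly rebalance dates and tickers (placeholders, not P123 data).
dates = pd.to_datetime(["2018-01-05", "2018-01-12", "2018-01-19"])
tickers = ["AAA", "BBB", "CCC", "DDD"]

# Hierarchical (MultiIndex) rows: one row per stock per rebalance date.
index = pd.MultiIndex.from_product([dates, tickers], names=["date", "ticker"])

rng = np.random.default_rng(0)
n = len(index)

# Columns: the forward return over the rebalance period, then one column per
# factor holding its rank (0-100). All values are random placeholders.
matrix = pd.DataFrame(
    {
        "fwd_return_1w": rng.normal(0.0, 0.03, n),
        "value_rank": rng.uniform(0, 100, n),
        "momentum_rank": rng.uniform(0, 100, n),
        "quality_rank": rng.uniform(0, 100, n),
    },
    index=index,
)

# Slice by either level of the index: one rebalance date, or one stock's history.
one_week = matrix.xs(pd.Timestamp("2018-01-12"), level="date")
one_stock = matrix.xs("BBB", level="ticker")
print(matrix.head())
```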
Question: Isn’t this matrix sitting in memory or on the hard drive at some point during a sim run? If not, can it be created easily?
If so, why not move this matrix over to a PC loaded with Python (all open source) and let us have the most advanced quant system one could create? There would be no download to the user that a data provider could object to.
Outline of points:
- Python is free. Its extensive libraries are also free and open source.
- The matrix already exists, or may not be hard to create.
- It could be done on a laptop (for one individual). I defer to you on how much you want to scale this up, but I would pay for a PC to get this started. What I really mean is that you could probably buy the equipment with reasonable user fees. The equipment could do a lot with a little: one server, and not that big a server, I think.
Summary: The overhead may not be big.
How useful would it really be?
First, it would not be like Quantopian. Quantopian is extremely limited FOR WHAT WE DO HERE. P123 is better even without Python. Perhaps we can take this as a given for now.
There is a huge body of people with machine learning skills out there.
MARC IS A MACHINE LEARNER. He really is. The econometrics he learned while getting his degree, multiple regression in particular, is a type of machine learning.
RedShield and pvdb are two P123 members who have already explicitly expressed interest in doing multiple regression. RedShield ran a hedge fund that used multiple regression as its quantitative method.
Maybe not Marc, but nearly everyone with a degree in finance would want to try an econometrics model.
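A multiple regression on that matrix takes only a few lines. The sketch below uses synthetic data with made-up coefficients (not real P123 ranks) and fits ordinary least squares with NumPy; statsmodels' `OLS` would add the usual econometrics output such as standard errors and t-statistics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5000

# Synthetic stand-in for the sim matrix: three factor ranks (0-100) and a
# forward return that depends linearly on two of them plus noise. The true
# coefficients are invented for illustration only.
X = pd.DataFrame({
    "value_rank": rng.uniform(0, 100, n),
    "momentum_rank": rng.uniform(0, 100, n),
    "quality_rank": rng.uniform(0, 100, n),
})
y = 0.0002 * X["value_rank"] + 0.0001 * X["momentum_rank"] + rng.normal(0, 0.01, n)

# Ordinary least squares: prepend an intercept column, then minimize ||A b - y||^2.
A = np.column_stack([np.ones(n), X.to_numpy()])
coef, *_ = np.linalg.lstsq(A, y.to_numpy(), rcond=None)

betas = pd.Series(coef, index=["intercept"] + list(X.columns))
print(betas.round(5))
```

The estimated slopes should land close to the invented ones, and the `quality_rank` slope close to zero, since that factor played no part in generating the returns.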
But there is an entire world of machine learners out there who use other methods. Tom Yani, another P123 user, has used BigML to run Random Forests, a common and relatively easy machine learning method. RedShield has also run some Random Forests.
There are probably a few million people who run (and believe in) Random Forests. If they invest in stocks, they will want to check this out. Where else could they do it? Nowhere else, as far as I know.
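A Random Forest is similarly short with scikit-learn. This is a sketch on synthetic, hypothetical data in which returns jump only for the top 20% of `value_rank`, a threshold pattern that a linear weighting of ranks can only approximate. The column names and hyperparameters are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 3000

# Synthetic rows: the return depends on value_rank only through a step at 80;
# momentum_rank is pure noise.
X = pd.DataFrame({
    "value_rank": rng.uniform(0, 100, n),
    "momentum_rank": rng.uniform(0, 100, n),
})
y = np.where(X["value_rank"] > 80, 0.02, 0.0) + rng.normal(0, 0.005, n)

# Train on the first 2/3 of rows and test on the rest. With real data one
# would split by date so the model never sees the future.
split = 2 * n // 3
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0)
model.fit(X.iloc[:split], y[:split])

r2 = model.score(X.iloc[split:], y[split:])  # out-of-sample R^2
importances = pd.Series(model.feature_importances_, index=X.columns)
print(f"out-of-sample R^2: {r2:.2f}")
print(importances.round(2))
```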
There is also a huge body of machine learners who have gone beyond multiple regression and Random Forests. They may be in Silicon Valley, or work at NASA, at universities, or for Google or Facebook. If you count all of the technophiles in Europe and Asia, I believe you have a huge market. They will want to run Support Vector Machines, kernel regressions, LASSO regression, ridge regression, LOESS, MARS, EARTH, Random Forests, nonlinear regressions, polynomial regressions, splines, C4.5, and even deep learning. There are a lot of people who do this; I see them on the internet.
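Because scikit-learn gives its estimators a common `fit`/`predict` interface, trying several of these methods is mostly a matter of swapping one object. A sketch, assuming synthetic data and illustrative, untuned hyperparameters; `TimeSeriesSplit` keeps each test fold after its training data, which roughly mimics walk-forward testing. Deep learning, C4.5, LOESS, and MARS are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)
n = 1500

# Synthetic rows (hypothetical): three ranks scaled to 0-1, a curved effect of
# the first rank on the forward return, a linear effect of the second, plus noise.
X = rng.uniform(0, 1, size=(n, 3))
y = 0.03 * (X[:, 0] - 0.5) ** 2 + 0.01 * X[:, 1] + rng.normal(0, 0.003, n)

# Same interface for every method; only the estimator object changes.
models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "LASSO": Lasso(alpha=1e-5),
    "Polynomial (deg 2)": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "Kernel ridge (RBF)": KernelRidge(kernel="rbf", alpha=1e-3),
    "SVM (SVR)": make_pipeline(StandardScaler(), SVR(C=1.0, epsilon=0.001)),
    "Random forest": RandomForestRegressor(
        n_estimators=100, min_samples_leaf=10, random_state=0
    ),
}

# Ordered folds: each test block comes after the rows used to train on it.
cv = TimeSeriesSplit(n_splits=5)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} mean R^2 = {score:.3f}")
```

On this toy data the degree-2 polynomial should beat plain OLS, because the return is curved in the first rank; on real data the ordering could be very different.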
All of the above algorithms can be run with Python and its libraries. I have run all of them (except deep learning) on a laptop (some with R). P123 would not need to understand them to make them available. Only Python and the data we already have would need to be managed.
I can show you that some of the methods above work well.
You probably already know that pandas, the Python data library, was created at AQR Capital Management for its own use. Marcos López de Prado, also of AQR Capital Management, has written a book on how these tools can be applied to investing. pandas is a serious tool, now available as open source.
Bottom line: one matrix (per customer) and one PC (possibly scaled up), and you have a HUGE market. And it is state-of-the-art quantitative investing.
Ease of use:
For me, the hardest part of Python is the data wrangling. After that, the programming is not hard. The scripts are extremely short. People could share scripts, and maybe P123 could write a few.
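Since the data wrangling is the hardest part, here is a sketch of one typical step: pivoting a hypothetical long-format export (one row per date, ticker, and factor) into the matrix, then converting a raw factor into a 0-100 rank within each date. The file layout, column names, and values are assumptions; ranking lower P/E higher is just an example.

```python
import io

import pandas as pd

# Hypothetical long-format export: one row per date, ticker, and factor.
raw_csv = """date,ticker,factor,value
2018-01-05,AAA,pe,12.0
2018-01-05,BBB,pe,25.0
2018-01-05,CCC,pe,8.0
2018-01-05,AAA,ret_1w,0.010
2018-01-05,BBB,ret_1w,-0.020
2018-01-05,CCC,ret_1w,0.030
2018-01-12,AAA,pe,13.0
2018-01-12,BBB,pe,7.0
2018-01-12,CCC,pe,20.0
2018-01-12,AAA,ret_1w,0.000
2018-01-12,BBB,ret_1w,0.015
2018-01-12,CCC,ret_1w,-0.005
"""

long_df = pd.read_csv(io.StringIO(raw_csv), parse_dates=["date"])

# Pivot to the m x n matrix: (date, ticker) rows, one column per factor.
matrix = long_df.pivot_table(index=["date", "ticker"], columns="factor", values="value")

# Percentile rank of P/E within each date, scaled to 0-100; the lowest P/E gets 100.
matrix["pe_rank"] = (
    matrix.groupby(level="date")["pe"].rank(ascending=False, pct=True) * 100
)
print(matrix)
```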
I had one Fortran course in college. I audited a DOS course after medical school so I could use Windows. If I can do it, anyone interested in quantitative investing can do it, even as a hobby, which is my situation.
So that is it. If that m x n matrix is sitting in memory after running a sim, you could reap a big profit from it, I think.
If that m x n matrix already exists you should use it.
What is a pitch deck? Should I get one? ;-)
Thank you Marco. I would love to answer any questions. Maybe I could even show a few examples through email, over the internet or even in person.
-Jim