Jonpaul,
I do not want to dominate this discussion. Walter may have a method. I am not sure I fully understood what he was saying. There might be methods none of us have thought of.
But you also put a lot of work into this, and I hope we can help keep each other from making mistakes.
I asked ChatGPT 4.0 about this with an open-ended question that I hope was not too biased and that I hope was an accurate summary of all of our problems now: “Okay. Let me expand and get a clear answer from you about whether you would recommend anyone investing with this method. A person has 24 years of data. Suppose he has 30 factors and can get a z-score on each of those factors but does not have access to the raw data. They do a multivariate regression knowing about standardized regression. They want to then make predictions based on the z-scores but have no knowledge of the mean and standard deviation that were used in the calculation of the z-scores in the training data. AND it costs money to repeat the process with 24 years of data at each weekly rebalance. What method do you recommend, if any, to get a z-score that can be used for prediction?”
A: “In such a scenario, without knowing the mean and standard deviation used to calculate the z-scores in the training data, accurately predicting future values becomes highly uncertain. It’s generally not recommended to proceed with investing based on this method unless the original scaling parameters can be obtained or reliably estimated, as the lack of this crucial information significantly undermines the model’s reliability and potential investment decisions.”
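To make the failure mode concrete, here is a minimal sketch (all names and numbers are mine, not anything from P123): a regression fit on z-scored training data gives a different prediction for the same raw value once that value is z-scored against a different mean and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training side: raw factor values (which we never actually see) and a target
x_raw = rng.normal(50, 10, size=500)
y = 0.8 * x_raw + rng.normal(0, 5, size=500)

# The platform hands back z-scores computed from the full training window
mu_train, sd_train = x_raw.mean(), x_raw.std()
z_train = (x_raw - mu_train) / sd_train

beta = np.polyfit(z_train, y, 1)   # standardized regression: slope, intercept

# Prediction side: one new raw value, z-scored two different ways
x_new = 60.0
z_same_scale = (x_new - mu_train) / sd_train  # the scaling the model expects
z_new_scale = (x_new - 55.0) / 12.0           # a different window's mean/sd

print(np.polyval(beta, z_same_scale))  # what the model was trained to predict
print(np.polyval(beta, z_new_scale))   # a materially different number, same stock
```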
I then tried to lead it toward your solution for estimating “the scaling parameters,” again hopefully without too much bias in my question.
Q: “Do you think overlapping the training data and the data to be predicted by a week or more would be beneficial?”
A: “Overlapping the training data and prediction data by a week might help in aligning the z-scores to some extent, but it’s not a guaranteed solution, especially if the statistical properties of the data change over time. It’s a method that could potentially be beneficial, but it comes with uncertainties and should be approached with caution.”
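For what it is worth, here is how I understand your overlap idea, sketched with made-up numbers (the one-week overlap and every value below are my assumptions). Because both downloads score the same raw values, the old and new z-scores sit on an exact straight line, z_old = a*z_new + b, and the overlapping week lets you fit a and b for each factor:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hidden raw values for the overlapping week (never visible to us)
x_overlap = rng.normal(50, 10, size=100)

# The same week scored under two runs' (unknown) scaling parameters
mu1, sd1 = 50.0, 10.0            # training run (24-year window)
mu2, sd2 = 53.0, 11.5            # later rebalance run (shifted window)
z_old = (x_overlap - mu1) / sd1
z_new = (x_overlap - mu2) / sd2

# Both are affine functions of the same x, so z_old = a*z_new + b exactly;
# fit a and b from the overlap -- one fit per factor, since each factor
# has its own mean and standard deviation
a, b = np.polyfit(z_new, z_old, 1)

# Map any freshly downloaded z-score back onto the training scale
z_new_stock = 0.7
z_in_training_scale = a * z_new_stock + b
print(z_in_training_scale)       # usable with the original regression
```

One caveat I would add to ChatGPT's: this works exactly only if the score really is a plain (x - mean)/sd. Any winsorizing, trimming, or other outlier handling would bend the straight-line relationship.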
Maybe it is wrong of me to just copy and paste ChatGPT, but it does write pretty well and I was already committed to its answer.
I would have just said to remember:
- The z-score is calculated independently for each factor.
- One week of data is not a very good sample. And a year might not be very good for some markets either (e.g., 2008).
- The mean and standard deviation get recalculated each time you do this with new data, WITH NO MEMORY OF ANY PREVIOUS CALCULATIONS (the sketch after this list makes the first and third points concrete).
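A toy illustration with invented data (the window sizes are arbitrary): each column is standardized on its own, and rescoring against a different window silently produces different z-scores for the identical raw row.

```python
import numpy as np

rng = np.random.default_rng(2)
factors = rng.normal(50, 10, size=(1200, 30))  # ~24 years of weekly rows, 30 factors

def zscore(window):
    # Each factor (column) is standardized independently of the others
    return (window - window.mean(axis=0)) / window.std(axis=0)

z_full = zscore(factors)        # scored against the full history
z_year = zscore(factors[-52:])  # scored against the last year only

# The identical raw row gets two different z-scores, and nothing ties this
# run's mean and standard deviation back to any previous run's
print(z_full[-1, :3])
print(z_year[-1, :3])
```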
BTW, P123 is probably being very careful to make sure you can never calculate, know, or accurately estimate the mean and standard deviation. Otherwise, freshman high-school algebra would let you recover, with 100% accuracy (or high accuracy, in the case of a good estimate), the raw values for all of the data on P123’s site.
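In code, with made-up scaling parameters (nothing here is a real P123 value), that algebra is just the z-score definition run backwards:

```python
mu, sd = 0.153, 0.042   # hypothetical mean and standard deviation for one factor
z = -1.2                # a z-score handed back by the platform
x = z * sd + mu         # raw value recovered exactly: 0.1026
print(x)
```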
I think that is part of the problem you are having. I respect P123’s need to make it hard for you to calculate the raw values while working to make it possible to “decode” the data internally and use it (with P123’s help). They will never, knowingly, give you a way to decode it on your own unless their contract with FactSet changes.
Now, at least, I understand why they have to save the mean and standard deviation internally and not give them to us. It does seem P123 understands the difficulties and has considered some solutions.
I am long. But this is not a trivial thing for us if we want to use it to invest money. We probably need to get it right. And I know P123 has spent a great deal of time and effort working on this without accidentally violating its contract with FactSet.
Jim