DEVASTATING DISCREPANCIES?

RE: the missing SPY dividend on 12/15/2000 (that was a Friday). Your original port sold the position on 12/18/2000, the following Monday, so it should have received the dividend. We’ll investigate that too. Thanks

Never mind the missing dividend. The original portfolio was holding SHY, not SPY; the tickers are so similar.

We could not pinpoint a precise reason why the re-run is different. Since you are only buying one ETF (long, short, ultra short, etc.), zigging instead of zagging can cause a large overall difference. The reason I don’t think there’s an issue is this: the entire difference basically boils down to two instances, around 2008 and 2015, the two big crashes. Everywhere else the two runs are very similar.

A possible explanation is the heavy use of EMA crossovers as signals. Any small perturbation, whether from a code tweak or a data fix, could flip a crossover signal from OFF to ON or vice versa.
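To illustrate, here is a toy sketch in Python with made-up numbers (not P123’s actual engine): when the fast and slow EMAs sit nearly on top of each other, a one-cent revision to a single close is enough to flip the crossover, and from that point on the backtest takes the opposite position.

```python
# Toy sketch: a one-cent change to one close can flip an EMA crossover signal.
import pandas as pd

FAST, SLOW = 10, 50

def emas(prices):
    """Return the last values of the fast and slow EMAs of a price list."""
    s = pd.Series(prices, dtype=float)
    return (s.ewm(span=FAST, adjust=False).mean().iloc[-1],
            s.ewm(span=SLOW, adjust=False).mean().iloc[-1])

# Synthetic uptrend, then solve for a final close that puts the two EMAs dead even.
hist = [100 + 0.05 * i for i in range(199)]
f_prev, s_prev = emas(hist)
a_f, a_s = 2 / (FAST + 1), 2 / (SLOW + 1)
p_even = ((1 - a_s) * s_prev - (1 - a_f) * f_prev) / (a_f - a_s)

for tweak in (-0.01, +0.01):                 # a one-cent "data fix" in either direction
    f, s = emas(hist + [p_even + tweak])
    print(f"final close {p_even + tweak:.2f}: crossover ON = {f > s}")
# One run ends OFF, the other ON: same strategy, opposite trade from here onward.
```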

It’s also very easy to overfit with P123: just re-run with small, incremental changes until the best-looking chart is achieved. With an overfitted system, any perturbation will give you a worse re-run.
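A toy sketch of that loop (hypothetical numbers, not any real P123 sim): generate many small tweaks whose true edge is zero, keep the one with the best-looking backtest, then re-run it. The winner almost always looks worse the second time, because its original score was mostly luck.

```python
# Toy illustration of the overfitting loop: pick the best of many noisy backtests,
# then watch the re-run regress toward zero.
import random

N_TWEAKS = 50            # number of small, incremental variations tried

def backtest_score():
    """Stand-in for one sim run: true edge is zero, the score is pure backtest noise."""
    return random.gauss(0.0, 0.03)   # ~3% annualized-return noise

first_pass = [backtest_score() for _ in range(N_TWEAKS)]
best = max(first_pass)               # the "best-looking chart" gets kept
rerun = backtest_score()             # the same tweak, run again after a tiny perturbation

print(f"kept tweak: {best:+.2%} in-sample, {rerun:+.2%} on the re-run")
# The kept tweak is the max of 50 noisy draws (roughly +2 sigma); the re-run
# regresses back toward zero, so it looks worse nearly every time.
```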

Here’s the image of both runs showing 99% similarity except in two spots.


Thank you for spending time on this, Marco.

“We could not pinpoint a precise reason why the re-run is different. Any small perturbation in either some code tweaks or data fix could cause…”

There has been a lot of (sometimes divisive) discussion in this forum about how to avoid curve fitting of in-sample data. However, I wonder how many of the in-sample/out-of-sample differences we see are caused by these two items (code tweaks or data fixes)? My guess, based on this experience, is A LOT. I never expected there to be no differences when I re-ran older portfolios, but if a re-run can show as much as 72% less total return than the original portfolio created only a few months ago, it seems it may be impossible to count on future results (or even past results, for that matter) being what we expect (and perhaps not even close, depending on the portfolio’s strategy, universe constituents, number of holdings, etc.).

This is the reason I have always advocated for re-running, analyzing, and perhaps tweaking portfolio systems every six months to re-assess expected returns and subtly re-configure if necessary. First off, something is ALWAYS changing in the highly dynamic financial markets. Sometimes these changes are significant, even generational regime changes, such as the outperformance of small-cap stocks in the 1990s-2000s morphing into an outperformance of large-cap stocks in the 2010s.

However, sometimes it’s not even a big market-wide change that occurs. As I have learned from this experience, sometimes it is a minuscule, P123-wide data perturbation, such as the code changes (I think probably related to the new “Rebalance/Reconstitute” module) that have devastated the work of our team and are going to force us to reconfigure the ETF portfolios that were built just a few months ago.

Alas - as they say - so be it.

Chris

I also see some discrepancies in simulations. This is an important issue.

I encountered a problem during simulation.
Some stocks that used to be in the Holdings list and the Recent Trades list sometimes disappear from both lists after a rebalance, without a sell order. Usually that happens when the particular stock plunges 3-4%.
Worse, the statistics do not indicate this; on the contrary, they show an increase in return.
It looks like this may happen when a data correction for the stock occurs or when the software engine is upgraded.
I think the statistics should reflect those situations, otherwise the simulation results are MISLEADING, or there should be a parameter to force the simulation statistics to account for them either way.
Does anyone have a solution for this?

I actually like these discrepancies, to tell the truth. If my results are changing somewhat drastically from day to day because of very minor changes in data, then I know not to place any faith in them, and it’s much healthier for me to abandon that approach and move on to a simulation that will exhibit more stability. My real-time results will be better because of it. I have also seen lots of very minor changes in old data recently, and this has created some extra frisson in my backtests. It has taught me not to focus on the little differences between one test and another and to pay more attention to the larger picture. If we can come away with any lessons from Chris’s unfortunate and frustrating experience, they might be these:

a) since simulated results have very little to do with real-time results, a large difference doesn’t really matter; what matters is how likely, on the whole, the strategy is to work in the future;

b) a simulation whose results can be easily perturbed by very minor data changes is inherently unstable and perhaps needs to be rethought to increase its stability;

c) while minor data changes may cause large changes in CAGR, they won’t cause large changes in median returns or in robust regression analysis, so using those to measure performance might be wiser than just looking at total returns.
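Point (c) is easy to illustrate with a toy example (hypothetical annual returns, not anyone’s actual sim): change the outcome of a single period, the way one flipped signal in 2008 might, and the CAGR moves by several points while the median periodic return barely moves.

```python
# Toy illustration: one divergent year shifts CAGR a lot but barely moves the median.
import statistics

# 16 years of hypothetical annual returns; year 9 is the one the re-run
# "zagged" on (stayed invested through a crash instead of exiting).
original = [0.12, 0.08, -0.03, 0.15, 0.10, 0.07, 0.11, 0.09,
            0.05, 0.13, 0.06, 0.10, 0.08, 0.12, 0.09, 0.07]
rerun = original.copy()
rerun[8] = -0.40                      # the single divergent year

def cagr(returns):
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1

for name, rets in (("original", original), ("re-run", rerun)):
    print(f"{name:8s}  CAGR {cagr(rets):6.2%}   median {statistics.median(rets):6.2%}")
# CAGR drops by several points; the median annual return is essentially unchanged.
```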

. . . . “Like” button clicked.

Yuval,

Yes, detecting brittle systems is not usually recognized as fortuitous, and that’s unfortunate.

I like to randomize my results by scaling model items with (1+(0.20*(1-random*2))), i.e., a uniform random multiplier between 0.8 and 1.2. I’ve done that with ranker items and model rules. The new model is then run multiple times via the optimizer. Usually I just collect annualized return data. If a model can’t survive small model perturbations, then it probably won’t do well out-of-sample. At least, that’s my thinking.

Attached is a plot of AR for a model where the sell rule was modified to RankPos>80*(1+(0.20*(1-random*2))). The model was run 200 times - but don’t tell Marco; he would probably kill me for using so many resources. However, from the results, I think the model is probably not bad.
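For anyone who wants to play with the same idea outside P123, here is a rough sketch of it in Python (the backtest() below is a made-up stand-in for a real sim run, not P123’s API): scale the rule parameter by the uniform ±20% multiplier, run it a couple hundred times, and look at the spread of annualized returns.

```python
# Rough sketch of the perturbation test: backtest() is a hypothetical stand-in
# for a real sim run; only the perturbation formula mirrors the rule above.
import random
import statistics

def perturb(x, pct=0.20):
    """Scale x by 1 + pct*(1 - 2*U), i.e. a uniform multiplier within +/- pct."""
    return x * (1 + pct * (1 - 2 * random.random()))

def backtest(rank_pos_threshold):
    """Placeholder for a real sim; returns a fake annualized return.
    Assumption: performance degrades smoothly as the sell threshold drifts
    away from 80, plus some noise. A real test would call the actual sim."""
    return 0.15 - 0.002 * abs(rank_pos_threshold - 80) + random.gauss(0, 0.01)

results = [backtest(perturb(80)) for _ in range(200)]   # 200 perturbed runs
print(f"mean AR {statistics.mean(results):.2%}, "
      f"stdev {statistics.stdev(results):.2%}, "
      f"worst {min(results):.2%}")
# A model whose AR collapses under these small perturbations is brittle;
# one that stays in a tight band is more likely to hold up out of sample.
```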

Best,

Walter


Me too.

First, I checked the transactions and holdings of one of my stock sims versus the port today. I did not see a problem. There were few discrepancies, few enough that I did not look for an explanation. If I had looked closer, there is a good chance I would have found that all (or almost all) of the discrepancies were due to changes I had made: e.g., manual cash transactions.

I consider that a good thing. Thank you P123.

Note: I am always in favor of more accurate and more timely data (period). If a sim does not perform as well with better data: good to know.

When it comes to random changes there is definitely a place for discrepancies in sims (as discussed above) but I would like to keep control of the discrepancies and have a handle on just how random things are—as Walter has done.

Ports are a little different than sims for me. If I wanted to have random results with real money I would go to Vegas (rather than staying here with P123). I’m not looking for any more randomness than I am already creating (unintentionally) in my models.

But it is a free country and people can make their ports as random as they want.

-Jim

I’m sorry, but I have to push back a little on the implication that the effort put into these portfolios is something less than professional - that I should embrace BAD DATA as a good thing. The two people involved with the portfolios discussed here (including me) have a combined quarter-of-a-century of experience in Portfolio123 system design. That means we joined this website shortly after Marco created it.

We are both aware of the inherent challenges faced by strategy designers and the mindfulness that one must always maintain to avoid trip-wires along the way that can make in-sample results far different from out-of-sample results. This is not about nit-picking or passing blame for a lack of repeatability. It is about what is likely a coding error in the bowels of the data engine - something that can be corrected. Please read on to understand why I believe this…

As I discussed in detail in my earlier posts in the other thread, many of the figures generated by the P123 system are flat INCORRECT. I stand by that statement. Do you really want to abandon an investment approach (as Yuval suggests) when P123’s system is giving you bad data? Wouldn’t it be better to ask P123 to fix the problem so we can build strategies with accurate data?

I was trying to prompt the P123 staff to sort out these errors. Apparently, not many people paid close enough attention to the details of all the investigative work I did. If they had, they would see that the data is ERRONEOUS. I don’t believe we can successfully develop useful strategies with BAD DATA. You know the old acronym: GIGO (garbage-in, garbage-out).

Rather than make claims, I will (again) show a handful of incidents and allow the data to speak for itself. Hopefully, my point will get across this time. For simplification, I re-ran the same portfolio again today (portid=1509975) with $0 transaction costs and 0% slippage. The strategy is very conservative and uses just two long ETF assets: the S&P 500 SPDR (SPY) and the 1-3 yr Treasury ETF (SHY) as a proxy for cash. The sim ran from 01/02/2000 - present. The “Price for Transactions” tab is set to “Next Open,” as shown in this graphic:

The Transaction results of the Sim are still INCONSISTENT WITH FACTS regarding what transpired in REAL TIME. Dividend amounts are still incorrect, but I will focus this analysis on the INCORRECT PRICES quoted to purchase the equities, because the thousands of dollars of difference per transaction produce dramatically different results than the simulation would have shown had it used correct data.

Specifically, I will focus ONLY on the listed purchase price for the S&P 500 SPDR (SPY), one of the most-traded securities in the world with more than 60 million shares traded daily. SPY is not a thinly traded penny-stock with questionable pricing. The table below shows a screenshot of the Transactions in the first two years after the portfolio was launched, from 12/31/99 - 08/05/02. I have identified five instances of purchases of the full amount of shares of SPY (not the small-but-pernicious “Buy-Sell Difference” transactions that persistently haunt our portfolios). Outlined in red are the Transactions in question:

Recall that I set this simulation to use Opening Prices (without commissions or slippage) for this exercise. The table below shows the transactions as listed by P123 in the second column, accompanied in the third column by the CORRECT OPENING PRICE for SPY on these dates. Look them up for yourself if you question the veracity of this information.

The fourth column shows the percentage discrepancy between the actual price and the price listed by P123. These prices are then multiplied by the number of shares to give P123’s INCORRECT amounts (sixth column) and the CORRECT AMOUNTS (seventh column), and the far-right column shows the difference in dollars between the two. As you can see, it adds up fast! And these numbers come from early in the portfolio’s history - before the number of shares increases 10- or 20-fold!

>>>>Interesting sub-fact: For the first two purchases of SPY, P123’s calculation is erroneous (even using the incorrect price). For example, the opening purchase of 687 shares x $145.44 = $99,917.28 while P123 lists $99,915.56. I didn’t look further, but I wonder how often miscalculations like this occur? It seems as if computers should be competent at the basics of multiplying two numbers if the code is carefully crafted.

CHANGE IN THE DATA: I did not go through all 250 transactions to see when they finally became correct, but at the top of the Transaction page, the final three Opening Prices for SPY and Total Amounts are CORRECT. This table below shows the comparison data:

This screencap of the Transactions in 2016 confirms these numbers:

It is evident that P123’s data engine is capturing INCORRECT PRICES for transactions early in the run, and then somewhere along the line, in the course of the last 17 years, it gets back on track. I hope that this analysis is taken seriously and P123 management will take it upon themselves to identify the bug and correct this error. I do not see why members, some of whom have paid many thousands of dollars over the last 13 years, should have to consider and compensate for these fundamental errors when designing and using portfolios. I would prefer to introduce random perturbations at my discretion to test the robustness of my investment systems after they are developed, rather than always have in the back of my mind that my results aren’t correct from the start!

Don’t get me wrong; P123 has - by far - the best product available in this niche of the investment market. The product we get per dollar is an incredible bargain. I am just advocating for accurate prices if they are available. I suspect that what I have identified here is probably a bug in the system. The Opening Prices are incorrect at the beginning of the portfolio but are correct by the end. To me, that says there is a line of code that is amiss somewhere along the way. I am trying to assist P123 management, as well as my fellow users, by pointing out this issue so that something can be done to find and correct the source of this error and we can obtain greater accuracy and repeatability.

The investment business runs on numbers. The only other areas of life that use figures more actively are perhaps physics and, well… baseball. Because of the number-intensive nature of the investment industry, I feel that we SHOULD be able to work with CORRECT DATA - at least the correct prices per share of stock - when we pay a sizable amount of money every month for this service. It doesn’t seem like too much to ask - or maybe I’m wrong. What do the rest of you think?

–Chris

P.S. - I just discovered, with a little more investigation of the prices of SPY, that the price per share used by the P123 system in the early years is taken from the CLOSING Prices for the security - not the OPENING Prices that I set. Then, later in the run, it is accurately using OPENING Prices. This makes me almost SURE this is a coding error that can be corrected - it seems less likely that it’s a quality-of-data issue from the vendor.

I checked the prices of the holdings in this portfolio when it was built several months ago to confirm to my satisfaction that the Sim was correctly using Opening Prices for purchases and sales and that those Opening Prices were correct (they were). However, checking these prices today reveals that many (if not most) prices are incorrect. My guess is that something went awry in the coding when the Rebalance Module was added recently. I believe this problem is fixable!

I have to agree with Chris. There needs to be a distinction here between repeatability and good portfolio design. One can improve portfolios by introducing randomness, as Walter has done. But lack of repeatability is another beast altogether and should not be twisted into a good thing and tossed back in our faces. Apart from the fact that it diminishes the platform’s reputation, it severely impedes our ability (P123’s and users’) to uncover real bugs. And before the argument pops up again, I’d like to head it off at the pass: there aren’t multiple versions of the past. There is one and only one public version of the past, and that is what drives price action.

Take care
Steve

I’ve been on this platform for 10 years and I can’t say enough good things about it. There is serious value here. If you want true institutional data, you can jump on Compustat at $24K a year. Even then there will be data anomalies.

This thread is about price data and dividends. If we were talking about major factor discrepancies, earnings, etc., then this would be a huge problem. But we are just looking at price.

P123 buys the data. I’m sure they do what they can to scrub it but in the end any platform using Compustat is in the same boat.

I am in the camp that believes, good price/dividend info or not, if you build a system that performs well in hindsight, it is likely to perform well in real time regardless of pricing errors.

This thread smacks of curve fitting…

Let’s frame this in a form that P123 understands… dollars and cents. I am an affiliate for P123 and bring in P123 subscribers, albeit a (very) small number. It is plain to see, even from my modest capability as an affiliate, that there is a correlation between posts describing unrepeatable simulations and subscribers un-subscribing. Put yourself in the shoes of a new P123 subscriber, paying hundreds of dollars a month based on claims of point-in-time data and so forth. As a new subscriber, the last thing I want to see is posts describing a lack of repeatability. This destroys perceived credibility, which has the potential to spread across the internet, deserved or undeserved.

Why do long-term P123 subscribers feel the need to come to P123’s defense? Of course you are getting value or you wouldn’t be subscribing. It isn’t an all-or-nothing argument. My point is that the call for repeatability should be construed as constructive criticism (sorry if it is being interpreted as negativity). P123 has come a long way, particularly in the last 2 years of cleaning up issues, but there are still issues that need to be addressed if they want to go mainstream. One of them is repeatability of results, that is, if they want to be perceived as head and shoulders above the competition. That should extend not only from sim to sim but from sim to port and designer model: the same entry/exit price should be used, the same slippage, the same liquidity stats, etc.

Happy trading.

Hi Steve,

Let’s make the issue of repeatability more concrete with an exercise.

You mentioned in another thread that P123 didn’t have opening SPY prices prior to 2004 - or thereabouts - and as a result P123 used closing prices for transactions.

Now, if P123 were able to backfill SPY Open/Close/High/Low prices, would that be a welcome change? It would probably alter the results of many simulations. Would that damage the repeatability “standard”?

I guess I’m unsure about what you mean regarding repeatability.

Best,
Walter

Walter,
I look forward to Steve’s response. I hope no one minds if I add my comments.

If the data accuracy is improved I am all for it. It is not clear to me that this is what the original post was about, however. I am not sure whether data improvements caused the changes that Chris originally posted about.

My problem would be with those who think random change (that is not purposefully done by the P123 member) is a good method for finding overfitted sims and that there are not better ways to do this.

With regard to your example, how could switching between the two randomly–without telling the P123 member what is happening–be a particularly good way to select sims? Who knows how long it will take to redo the sim and find the change and the overfitting? How often will it add just the right amount of randomness to make it all perfectly clear? Random price changes in SPY have no potential for helping me with overfitting in my stock models, and I do not trade ETFs. So, completely useless for me—and I suspect for many—as a detector of sim overfitting.

This is where it is potentially a serious problem: if a change affects sims it can (and often does) affect ports without the member knowing.

What amount of randomness added to your ports is a good thing? In fact, randomness added to ports has no potential to be beneficial and is always harmful.

How much randomness will you ask to be added to your ports in the next feature request?

Again, maybe not a big deal, but not something to ask for in your ports unless you just like gambling. In fact, it is something to be avoided and should be brought to P123’s attention when it occurs, IMHO.

-Jim

Hi Jim,

I’m not really following your second paragraph. I’m going to have to think about it some more.

But I do agree with the concern about how ports may be affected by system changes.

From my POV, P123 is providing a model and that model has components of varying fidelity. Some parts, like asset prices, have “high” fidelity - but not 100%. And other parts, like slippage, have “lower” fidelity. So what is supposed to happen when P123 makes a change that they expect will increase fidelity and yet it impacts some models negatively? And what does that say about the broken models? And what should and can be done with any affected live ports?

One of my oldest live ports is almost three years old, and its re-sim shows almost 100% agreement with the LP OOS results. I think this is not an uncommon experience. Let’s not let a few outlier models hinder platform improvements.

Best,
Walter

Walter,

Thanks! And sorry. That paragraph could have been clearer and shortened to: I agree with your example and you make a good point.

With regard to the rest of your last post: I will always be for improving data (period). I just do not want to pretend that random, unintended changes are good for sims or for ports. I do not even think they can be entirely avoided. It’s just ridiculous to embrace them—like saying getting tuberculosis is good because it reminds you to wash your hands.

-Jim

Hi Jim,

Detecting overfitting is a bear and randomizing data probably isn’t the best method. But I understand the philosophy; brittle systems fail under small changes. I’ve seen systems like that fail b/c they get “locked” into a bad state and recovery takes too long. Would randomized inputs have caught that? Would a randomized model have caught that? Dunno. There’s so much to explore with P123’s help.

Best,
Walter

Walter,

Let me just thank you for your post on a rational way to randomize things that will probably be helpful to a lot of people! Members will probably be watching and thinking when they use this method. They will probably learn something with immediate feedback and a quick change may result. They can adjust and change the randomization until the sim fails (without financial harm). That is how to test for robustness. And it will not creep into the port!!!

BTW, thanks for sharing your results (as well as the method) and cool histogram in that post. I do not recall my Excel histograms looking that good. I know my R histograms do not look that good.

-Jim