SP500 10 Value Stocks.

Yuval,

Honestly, some pretty amazing stuff.

I just wanted to ask a rhetorical question that may help streamline what you are already doing here.

If you put these two things together, is that not a Random Forest? Specifically, bootstrapping and averaging simpler models together is the very definition of a Random Forest.

I hope you are not doing any of this on a spreadsheet. With modern parallel computing you could do this 50,000 times overnight (on a personal computer).

Knowing that P123 is not planning on implementing Random Forests anytime soon this is just meant to be possibly helpful to a fellow P123 member.

Best,

-Jim

“- When backtesting, divide your universe and your time periods. If you can bootstrap, do so; if that’s too complicated, here’s what I do. I divide my universe into four equal subsets and test on each one; I divide my time period into two, two different ways, and test on each one. I end up with sixteen results for each model I test.”
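For concreteness, the sixteen-result bookkeeping Yuval describes can be sketched like this (the subset and segment labels below are hypothetical placeholders, not P123 syntax):

```python
from itertools import product

# Hypothetical labels: the four universe subsets, plus the four time
# segments produced by splitting the full period two different ways.
universe_subsets = ["U1", "U2", "U3", "U4"]
time_segments = ["first_half", "second_half",  # split at the midpoint date
                 "odd_weeks", "even_weeks"]    # split a second, different way

# One backtest per (subset, segment) pair: 4 * 4 = 16 results per model.
tests = list(product(universe_subsets, time_segments))
print(len(tests))  # 16
```

Each model then gets judged on the spread of its sixteen results rather than on a single full-sample backtest.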

I understand the testing on different time periods, but how do you divide the universe into 4?

Thanks for the input

One of the big issues with P123 Designer Models is the startup requirement to run the portfolio for 3 months “behind closed doors” before allowing (general) visibility and subscriber signup. This is an issue because the requirement does nothing to improve the reliability or future success of the model and in fact impairs model assessment for potential subscribers. Out-of-sample testing is simply another form of optimization: run the model OOS, and if the results are poor, delete the model and start again. This is not a problem that can be overcome; no matter how much OOS time is allotted, the problem still exists. And the big thing is that the three-month requirement works against potential subscribers because they *** believe *** that the initial positive performance is the sign of a great system.

Another big issue is the perception that DMs can be produced that will work presently and for all eternity. As we all know, or should know, the markets are constantly changing. What “worked” a year ago may not be appropriate in the coming year. While designers can revise a model every 6 months, this feature is meant for minor revisions, not major changes. And in fact, subscribers may get upset if the DM they are subscribed to takes a major change in strategy.

My opinion is that it is inherently wrong to direct subscribers to a strategy that holds 20 or more stocks and “works” over a decade or more in the backtest. The reason is that there will be very few such strategies, and subscribers will be required to hold 20 or more stocks from each one. The subscriber would generally need to be prepared for duplicate stocks and will have difficulty determining how to handle the duplicates. The solution to this issue is for designers to offer investment strategies that are of lesser scope (fewer years of backtest and fewer stocks). Lesser scope generally means there is the potential for many more strategies that subscribers can pick and choose from.

There is no law that says that investors should choose one model only and invest everything into it. The smart investors will realize that it is better not to put all the eggs in one basket. Investors are better off selecting a number of small-scope DMs with low correlation to each other.

Principles to Invest by:
1: Volatility is not your friend
In general, higher volatility means lower performance going forward, regardless of what the backtest seems to indicate. The reason is that if a stock loses 50% in value, a subsequent 50% price gain doesn’t bring the stockholder back to even. The higher the volatility, the worse the performance going forward. For this reason, it is better to hold more stocks rather than fewer. Volatility is obviously reduced with a greater number of positions. Note that this is in direct conflict with what I said above and will say below. Please be patient and hear me out!
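The 50%-loss arithmetic behind this principle is worth making explicit: a symmetric swing of minus r followed by plus r always ends below the starting value, since (1 - r)(1 + r) = 1 - r^2. A minimal illustration (not P123 code):

```python
def after_symmetric_swing(start: float, r: float) -> float:
    """Portfolio value after a loss of r followed by a gain of r
    (order doesn't matter): start * (1 - r) * (1 + r) = start * (1 - r**2)."""
    return start * (1 - r) * (1 + r)

print(round(after_symmetric_swing(100.0, 0.50), 2))  # 75.0: 50% loss, then 50% gain
print(round(after_symmetric_swing(100.0, 0.10), 2))  # 99.0: milder swings, milder drag
```

The drag grows with the square of the swing size, which is why damping volatility matters more than the backtest average suggests.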

2: Diversification is not your friend
I believe it was Warren Buffett who stated that you should only have 10 holdings. In any case, the more diversification you have, the more mediocrity you will have.

3: Find the balance between volatility and diversification
As you can see, principle 1 and principle 2 are at odds. There is, in fact, a tug of war going on (see attachment below) that results in an optimal number of stock holdings, which you can actually determine for a given portfolio based on a ranking system (keep buy/sell rules out of it). I haven’t done this exercise for at least 10 years, but when I did, I generally came up with an optimal number of five stocks for a variety of ranking systems over a significant time period, using Sortino or Sharpe as the performance barometer.

Again, keep in mind that this is the optimum for a given portfolio and has nothing to do with the number of stocks you hold in your brokerage account, which is an entirely different matter. There is no reason to impose a “20 stocks or more” rule on an individual model.

4: Swim Downstream
The advantage of DMs with lesser scope is that the strategy can be articulated more easily to potential subscribers without giving away the farm. For example, I offer these two DMs with clearly stated strategies: https://www.portfolio123.com/app/r2g/summary?id=1445554 and https://www.portfolio123.com/app/r2g/summary?id=1439560 .

Such strategies allow potential subscribers to assess for themselves whether or not the models will perform in the future. Subscribers should always be looking to swim downstream, not against the tide. Look for DMs that exploit anticipated trends.


Volatility versus Diversification.gif

We now know why models show initially good results that decay.

THANK YOU STEVE!!!

BTW. Steve is the only person among us who has any professional training in statistical learning.

Again, thank you very much!!!

-Jim

*** Full disclosure *** I slept through university stats class. Drinking and partying were more interesting at the time. My stats capabilities are therefore weak. I did do a lot of neural net experimentation through the '90s however. My main conclusions were:

(1) Garbage in, garbage out: if your inputs don’t carry a level of significance, then don’t expect miracles from your predictions.
(2) The best reason for NNs is to have a semi-rational output in the face of multiple conflicting input indicators.
(3) Deep learning is bad… you don’t want to memorize the inputs; you want to find some predictive power.
(4) There is the chance that the NN finds hidden undiscovered complex input-output relationships. For this reason alone, optimization is of use.

Steve, spoken like a true engineer.
To this one should add “don’t be greedy”; annualized returns like +30% can only be had in simulations. Aim for models with a 20% annualized return and be satisfied making 15% on your investment.

Here’s one way to do it. Go here: https://www.portfolio123.com/stocklookup.jsp. Click “Status: Any” and press search. Press the arrow next to 100 and scroll down and click on “All.” It’ll probably take several minutes to display the entire list of 36,196 tickers, but once it’s displayed, you can copy it into Excel (you can’t download it, though) and create custom lists; you can then create as many different universes as you want with “Inlist.” Tickers change all the time so you may have to repeat this every few months.

Here’s another way to do it. In the rules for universe 1, put EvenId = 0 and Trunc(100*Mod(Ln(SalesA+1),0.02)) = 0; in the rules for universe 2 put EvenId = 0 and Trunc(100*Mod(Ln(SalesA+1),0.02)) = 1; in the rules for universe 3 put EvenId = 1 and Trunc(100*Mod(Ln(SalesA+1),0.02)) = 0; and in the rules for universe 4 put EvenId = 1 and Trunc(100*Mod(Ln(SalesA+1),0.02)) = 1. The only drawback to this approach is that about half of the constituents of universe 1 and universe 2 will switch every year, and ditto with universes 3 and 4. AstTotA will also work instead of SalesA in the above equations.

A third way to do it is to use the above equations but instead of Ln(SalesA+1) use Subindustry/XXXX, where XXXX is a large number like 4763. The problem with this method is all members of a particular subindustry will be in the same two universes. But maybe that’s a good thing.
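For anyone who wants to sanity-check the second method, here is a rough Python analog of those universe rules (the function name is mine, and the sales figures are made up): EvenId supplies one bit, and Trunc(100*Mod(Ln(Sales+1), 0.02)) supplies a second, roughly random bit, giving four buckets.

```python
import math

def universe_bucket(even_id: int, sales: float) -> int:
    """Python analog of the P123 universe rules above.
    The Mod/Trunc expression maps the fractional part of Ln(sales+1)
    (taken mod 0.02) to a 0/1 bit; combined with EvenId it yields
    buckets 0..3."""
    bit = math.trunc(100 * (math.log(sales + 1) % 0.02))  # always 0 or 1
    return 2 * even_id + bit

# Example: assign a few hypothetical sales figures (in dollars) to buckets.
for sales in (1.0e6, 4.7e7, 3.3e8, 9.9e9):
    print(universe_bucket(0, sales), universe_bucket(1, sales))
```

Because Ln compresses the sales scale, the bit flips quickly as sales drift, which matches Yuval's caveat that about half the constituents switch universes each year.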

WISE words!
Greed and “great looking equity curves” of simulations are evil in disguise.

Hmm. I guess it is like a random forest, but without the trees or the objective (classification/regression). Which means it’s different from a random forest too. But that’s not a bad analogy.

And, yes, I do this on a spreadsheet rather than “modern parallel computing.” It’s not so bad; I don’t mind.

Yea. And in 136 years you will have caught up to what you could have done last night. If you do many spreadsheets every single day you can do it in your lifetime; just make sure to exercise and eat right. You can get a little ahead in leap years.

It is a very good thing that you enjoy it:-)

What I say still stands. However you are doing it, it is impressive.

You have come a long way since you responded to a post about bootstrapping in the forum (me too I hope).

Bootstrapping (clearly a statistical method and the subject of the thread) is Astrology? Definitely a change.

Personally, I had no idea of how to use bootstrapping effectively back then.

I still would not know how to use it today without our posts. Thank you for that.

BTW, BOOTSTRAPPING WAS DEVELOPED IN 1979 and RANDOM FORESTS WERE DEVELOPED IN 1995. With the right diet, I know I can learn some 21st Century methods in my lifetime.

-Jim

I want to be clear about how good an idea this is.

Someone on their own—even with a review of the literature and other readings—coming up with the idea of bootstrapping and averaging simpler models together is absolutely INCREDIBLE!!! Full stop. No qualification.

This is what makes Random Forests popular in theory and in practice. I could share equations as to how this diversity of models (the independence of the models) is helpful. I could even quantify how much this diversity helps. It is a function of the correlation of the models.
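For the curious, the standard result here is that averaging B models of equal variance sigma^2 and pairwise correlation rho gives variance rho*sigma^2 + (1 - rho)*sigma^2 / B, so the benefit shrinks as the models become more correlated. A quick simulated check (a sketch for illustration, not anyone's actual trading code):

```python
import random
from statistics import pvariance

# Each "model" shares a common component (correlation rho) plus its own
# independent noise, so every model has variance 1 and pairwise
# correlation rho. We average B of them and check the variance formula.
random.seed(0)
rho, B, trials = 0.25, 16, 20000

def averaged_models(rho: float, B: int) -> float:
    z = random.gauss(0, 1)  # component shared by all B models
    xs = [(rho ** 0.5) * z + ((1 - rho) ** 0.5) * random.gauss(0, 1)
          for _ in range(B)]
    return sum(xs) / B

sample = [averaged_models(rho, B) for _ in range(trials)]
theory = rho + (1 - rho) / B  # = 0.296875 for rho=0.25, B=16
print(round(pvariance(sample), 2), round(theory, 2))
```

Note that as B grows the variance floors out at rho: past a point, adding more correlated models buys nothing, and only lowering the correlation helps.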

One need not be (and should not be) tied to decision trees to use this basic principle. A Random Forest simply serves as a well-known example of where this is routinely automated. Parallel processing was mentioned because each tree can have its own (parallel) processor, speeding the runtime by a factor equal to the number of processor cores (up to 28 for a Mac Pro). But you can run 50,000 trees on a laptop (usually 4 cores, maybe up to 8) overnight; the software is that good.

To show you what we are up against: de Prado has Python code that makes use of every available thread, which is usually a lot of threads. The memory sharing can cause problems, theoretically, but not so much in practice, he says. Also, Amazon Big ML uses 32 cores and presumably has a lot of memory. This is not very expensive. If one wants to get it done, one can do it.

This does work with a spreadsheet. However, bootstrapping multiple different models happens to require an awful lot of spreadsheets (arrays in memory for Python)—or you ain’t doin’ it right.

It is just that I would not want to try to do that, even if I thought I could.

[color=firebrick]Whether one prefers spreadsheets or Python, the principle behind this is wicked-smart and Yuval deserves credit for that.[/color]

-Jim

Thanks for the kind words. I have to say I got both the bootstrapping and the averaging ideas from O’Shaughnessy. After looking into it, what I’m doing is now called “bootstrap aggregating.” There are no trees involved. And I don’t see how you could work decision trees into the process. Or if you’d want to. You can do bootstrap aggregating without using trees.

What’s important to me is to come up with a ranking system without overfitting. It strikes me that bootstrap aggregating might be better for out-of-sample results than creating a system that gets 70% to 90% annualized returns (my bootstrap aggregating system gets a rather small fraction of that). But the actual procedure creates a host of other problems that I’m trying to grapple with. I’ve only begun, really, and I don’t know how it will all turn out. In the meantime, I’ve started buying a few undervalued large-cap health-care stocks, which I’ve rarely done before . . .

Also called “bagging,” as I’m sure you know.

There is also BRAGGING: bootstrap ROBUST aggregating. You might like that. Robust in the sense that you use the median instead of the mean.
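For illustration, here is a toy side-by-side of bagging versus “bragging” on the simplest possible “model,” the sample mean (all the data and names here are made up for demonstration):

```python
import random
from statistics import mean, median

random.seed(1)
data = [random.gauss(10, 2) for _ in range(200)]  # toy "returns" data

def bootstrap_aggregate(data, n_boot, agg):
    """Fit a deliberately trivial 'model' (the sample mean) on each
    bootstrap resample, then combine the per-resample estimates with
    `agg` (mean for bagging, median for bragging)."""
    fits = [mean(random.choices(data, k=len(data))) for _ in range(n_boot)]
    return agg(fits)

bagged = bootstrap_aggregate(data, 500, mean)     # bagging: aggregate by mean
bragged = bootstrap_aggregate(data, 500, median)  # "bragging": aggregate by median
print(round(bagged, 1), round(bragged, 1))
```

On clean data the two agree closely; the median version earns its keep when a few bootstrap resamples produce wild estimates.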

Yuval, let me just put this out there without a response. It would not be up to just you anyway, I think. There are a lot of different ways to implement some of this. More ways than you or I could come up with, in that proverbial lifetime.

But get Walter, others, and new members using Python, and there will be things that I, at least, could not dream of with regard to designer models.

Just for future thought as you get a better idea of how useful you think this is.

BTW, with a letter of intent (on how P123 and I could both benefit) and an NDA, I can solve some of your problems. Not that you won’t solve them on your own in your own way.

I have a model going myself, paper traded: one that uses bagging but may be different in a lot of ways.

[b]Always happy to discuss general ideas but only up to Random Forests (and analogous bagging) for now[/b]

-Jim

Perhaps a new function that returns a unique stock ID would be helpful here. That ID could then be used in a hashing function to partition simulation universes.

Walter
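Walter’s hashing idea can be sketched as follows (a minimal illustration: the function name and the choice of MD5 are mine, and tickers stand in for the permanent stock IDs he proposes):

```python
import hashlib

def universe_of(stock_id: str, n_universes: int = 4) -> int:
    """Deterministically map a permanent stock identifier to a stable
    universe number in [0, n_universes). Any decent hash works; MD5 is
    used here only because it's in the standard library."""
    digest = hashlib.md5(stock_id.encode()).hexdigest()
    return int(digest, 16) % n_universes

print({t: universe_of(t) for t in ["AAPL", "MSFT", "XOM", "JNJ", "KO"]})
```

Unlike the Sales-based rules, a hash of a permanent ID never reshuffles the partitions over time, which is exactly why a stable ID function would help.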


Slice by date. Slice in the Munging sense.

Keeping it all PIT (point-in-time) is necessary.

After designing S&P 500 10-stock value models for around 4 years and analyzing all of my models, the performance looks like it just matches the benchmark, and the drawdowns look too deep to hold the investment through rough times.

As individual investors we are looking for better returns. If we are able to achieve a 2X return every year consistently using 5 stocks, that is worth the effort, energy, and time.
That is the primary reason we are with P123.

So, I have designed 5 stocks strategy.

=========================================
S&P500-5Value Stocks-Rev2020
https://www.portfolio123.com/app/r2g/summary?id=1590115

Jrinne, my effort to design this model is a few hundred hours over the last 4+ years; the universe focus remains the S&P 500.

Thanks
Kumar



I have changed my DM model price from $50 to $25. :sunglasses:

====================================================================
S&P500-10 Stocks Value, Quality & RS
https://www.portfolio123.com/app/r2g/summary?id=1508735

S&P500-10Value Stocks - Rev3
https://www.portfolio123.com/app/r2g/summary?id=1500539

Thanks
Kumar :slight_smile:



Large-cap, liquid models are working fine with long-term holding.
Here, I am attaching a screen for reference.

Long-term-holding, large-cap, liquid models offer asset protection, and the return is anywhere between 30% and 40% per year, twice the performance of the best mutual funds in the world (15% to 20% average return over 5 years).

==================================
Designer models use a 100% quant system. Average holding time: 3 to 6 months.
https://www.portfolio123.com/app/r2g/summary?id=1508735

Yearly 20-stock picks: systematic manual picks of the best stocks using all of my skills/knowledge. 100% non-quant system. 12-month holding.

=================================================
If you have the right skills and knowledge, I don’t think quant versus non-quant systems make any difference.

When I was working in Singapore in 2000, I argued with my team lead that, since I had passed typewriting at both the lower and higher levels with first class, I could type programs, Word documents, etc., faster.
My team lead said it is not important how fast you type; what is important is what you type (knowledge/wisdom).


My stock market skills are the result of 6+ years with P123,
5+ years of dedication to the S&P 500 universe, and designing models.
And I have always kept my best simulation as a designer model.


Thanks
Kumar :slight_smile:



