How do you go about assigning a stock factor weight to a rating system without over-optimizing the rating system?
I have tried slightly different approaches:
- Distribute equal weight to all factors.
- See which factors have historically given the best return in larger portfolios (30+ stocks) over longer periods, and give those the highest weight. I then tested them against different portfolio sizes, indices and periods.
- Give each stock factor a weight according to whether it is a value, momentum, quality or growth factor, so that none of the known anomalies makes up more than 25% of the total weight in the system.
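To make the third approach concrete, here is a minimal sketch, assuming four anomaly groups that each receive the same share of the total weight and equal weighting inside each group. The factor names and the `group_balanced_weights` helper are made up for illustration; they are not P123 identifiers.

```python
from collections import defaultdict

# Hypothetical factor names mapped to the anomaly group they belong to.
FACTOR_GROUPS = {
    "earnings_yield": "value",
    "book_to_price": "value",
    "return_6m": "momentum",
    "roe": "quality",
    "gross_margin": "quality",
    "sales_growth": "growth",
}


def group_balanced_weights(factor_to_group, group_cap=0.25):
    """Give every group an equal share and split it equally among its factors."""
    groups = defaultdict(list)
    for factor, group in factor_to_group.items():
        groups[group].append(factor)

    group_share = 1.0 / len(groups)
    if group_share > group_cap + 1e-12:
        # With fewer than 1 / group_cap groups, the cap cannot be respected.
        raise ValueError("too few groups to keep every group under the cap")

    return {
        factor: group_share / len(members)
        for members in groups.values()
        for factor in members
    }


weights = group_balanced_weights(FACTOR_GROUPS)
for name, weight in weights.items():
    print(f"{name:15s} {weight:.3f}")
```

A group with a single factor (momentum here) puts its whole 25% on that one factor, so the per-factor weights are not equal even though the groups are.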
What do you do?
On to my second question: what is your experience of backtested results versus live trading?
I understand that most people do not get the same results in live trading as in a backtest, but what do you expect? That live trading gives 70-80% of the backtested results over time? Or do you think the backtest says nothing sensible about future results?
Factor weighting is a tricky subject. I really don’t like saying “it depends”, but it really “depends”. 
That said, here are some thoughts:
- Like you suggested, how persistent is the factor? Some factors are cyclical (e.g. value and growth), others are more persistent over time. Others don’t really add much in terms of absolute return, but can help limit drawdowns during a bear market (e.g. quality). These factors also behave differently for large caps vs small caps, and differently in combination than in isolation.
- Generally, “off the shelf”, well-documented factors tend to lose alpha over time, but they are still important in a strategy. Less common factors that show signs of persistence when tested could be weighted higher.
- Within cyclical factors, I generally weight them more equally.
- Also look at the change in the value of a factor for a company over time. Instead of looking at just absolute value of profitability, are margins improving or deteriorating? Sometimes the change in the factor is more important than the magnitude. If persistent, consider a higher weight.
- Or is it a high margin business in a high margin industry (I’d prefer a higher margin company outperforming its peers).
- Over the time period we have available from P123, 1999-today, there are some periods where specific factors did incredibly well. Small caps and conventional value/growth generally did extremely well in 2002-2007, after the 2008 crisis, and then in 2020-2021. If you put too much weight on the factors that worked in those periods, your results may be skewed. There is no telling when we will have our next “small cap renaissance”.
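One way to check whether a system leans too heavily on a single favourable stretch is to break the backtest into sub-periods and compare annualized returns. The sketch below assumes you have exported monthly returns keyed by (year, month). The numbers are made up purely to show the mechanics, and the period boundaries are illustrative choices.

```python
def annualized_return(monthly_returns):
    """Compound a list of monthly returns and annualize the result."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    years = len(monthly_returns) / 12.0
    return growth ** (1.0 / years) - 1.0


def returns_by_period(monthly, periods):
    """monthly: {(year, month): return}; periods: {label: (first_year, last_year)}."""
    result = {}
    for label, (first_year, last_year) in periods.items():
        chunk = [
            r
            for (year, _month), r in sorted(monthly.items())
            if first_year <= year <= last_year
        ]
        if chunk:
            result[label] = annualized_return(chunk)
    return result


# Made-up data: 2% per month in 2002-2007, 0.5% per month afterwards.
monthly = {}
for year in range(2002, 2020):
    for month in range(1, 13):
        monthly[(year, month)] = 0.02 if year <= 2007 else 0.005

periods = {"2002-2007": (2002, 2007), "2008-2019": (2008, 2019)}
by_period = returns_by_period(monthly, periods)
for label, value in by_period.items():
    print(f"{label}: {value:.1%} per year")
```

If most of the total return comes from one sub-period, the weights that produced it are probably fitted to that period rather than to something persistent.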
As market conditions are always changing, it is difficult to capture the same backtest results in realtime. I have two opposing experiences in this regard.
- In the realtime period 2017-2018, most of my strategies were too highly optimized for the period 2002-2007, and fell apart in realtime.
- I took a break from the market and revamped my strategies. For the last 2.5 years, many of my strategies have had results similar to the backtests.
Hope that helps.
Cheers,
Ryan
Thank you for the very good feedback. I see your point: “it depends”, and it really does depend. I can also see that a rating system can give very different results when tested against different indices, markets and market caps.
But I have a few follow-up questions:
- how many factors do you usually have in your rating system?
- the ones that weigh the most in the system, how much weight do you usually give them?
- Do you tend to use rules outside the rating system, or do you let the rating system alone pick the stocks?
I have two main systems that I use, with variations for both US and Canadian markets, that are “all season”. I use other systems to a lesser degree based on how suitable they are for given/potential future market conditions.
Both of these main systems target small/microcaps, but take different approaches based on your questions:
- System #1: small cap, all-round
- System #2: small cap, focus on growth
How many factors in the ranking system?
- System #1: 50 factors
- System #2: 25 factors
Weighting:
- System #1: with so many factors, most end up receiving similar weight based on each group (value/growth/etc.), with a very slight tilt towards value/growth and less to quality
- System #2: top factor group ~25%, remaining groups equally weighted (~10% each)
Rules outside the ranking system:
- System #1: include rules in the universe (liquidity, limited industries, etc.), no buy rules, only 1 sell rule based on rank
- System #2: keep the universe “as is”, include several “buy” rules, sell rule based on rank/liquidity
Cheers,
Ryan
It seems to me that if you 1) stick to ranking systems and avoid buy/sell rules, 2) only test things that make sense, and 3) test really broadly (use as much data as possible), it is almost impossible to over-optimize. In fact, I think it’s a good idea to optimize as much as possible.
The most important trick to use during optimization is to test your ranking in many independent subuniverses using the mod(stockid)-trick (or perhaps using the random function). This enables you to actually use the full dataset. Running a single backtest with 20-30 stocks is IMO a bad measure of how skilled your ranking system is. In addition to the subuniverses, you should also use the Canadian dataset and the European dataset.
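The idea behind the mod(stockid) trick can be sketched outside P123 like this: take each stock's numeric ID modulo the number of subuniverses, and use the remainder as the bucket. The IDs below are hypothetical, and `split_into_subuniverses` is an illustrative helper, not a P123 function.

```python
def split_into_subuniverses(stock_ids, n_subuniverses=5):
    """Assign each stock to exactly one bucket using stock_id % n_subuniverses."""
    buckets = {k: [] for k in range(n_subuniverses)}
    for stock_id in stock_ids:
        buckets[stock_id % n_subuniverses].append(stock_id)
    return buckets


# Hypothetical stock IDs; a real universe would have thousands.
stock_ids = list(range(1000, 1020))
subuniverses = split_into_subuniverses(stock_ids, n_subuniverses=5)

for bucket, members in subuniverses.items():
    print(bucket, members)
```

Because the ID has nothing to do with a stock's fundamentals, each bucket behaves like a random, non-overlapping sample of the universe, and every stock is used in exactly one test.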
My current workflow is to make a small change to the ranking system, then perform a wide variety of tests: rank performance, a large (>100 stocks) rolling screener, and many small (20-40 stocks) rolling screeners in 6-10 subuniverses. If the results look like an improvement, I use the optimizer tool with 6-8 subuniverses and as many time periods as possible, and look at the results in Excel using the trimmed mean function. If the trimmed mean is also better, I accept the change to the ranking weights. I then make another small change to the new ranking system and start from the top.
(EDIT: Actually my real workflow is a bit more involved, I use the API to automate and extend many of the steps above, but this is roughly how the method works.)
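The trimmed mean step can be sketched as below. This is a minimal version that assumes the trim fraction is the total share of runs to drop, split evenly between the best and worst results, which is how I understand Excel's trimmed mean to work. The returns are made-up numbers for ten hypothetical subuniverse/period runs.

```python
def trimmed_mean(values, trim_fraction=0.2):
    """Average after dropping trim_fraction of the points, half from each tail."""
    if not 0.0 <= trim_fraction < 1.0:
        raise ValueError("trim_fraction must be in [0, 1)")
    ordered = sorted(values)
    drop_each_side = int(len(ordered) * trim_fraction / 2)
    kept = ordered[drop_each_side:len(ordered) - drop_each_side]
    return sum(kept) / len(kept)


# Made-up annualized returns from ten runs; two of them are outliers.
run_returns = [0.12, 0.15, 0.09, 0.14, 0.55, 0.11, 0.13, -0.20, 0.10, 0.16]

plain = sum(run_returns) / len(run_returns)
trimmed = trimmed_mean(run_returns, trim_fraction=0.2)
print(f"plain mean:   {plain:.3f}")
print(f"trimmed mean: {trimmed:.3f}")
```

Dropping the extremes keeps one lucky or unlucky run from deciding whether a change to the ranking weights counts as an improvement.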
My experience of the live performance using this method is very good, the largest downside is that the procedure is quite time-consuming.
Thank you for your really helpful and enlightening remarks, Ryan and Test user.
I’d guess that most of us design our own ranking systems, but based on your own experience and seeing how the public ranking systems on p123 are designed, are there any publicly available systems that you think are good and can serve as a good starting point for what the rating systems should look like?
I find the Core Combination P123 ranking system a good starting point.
https://www.portfolio123.com/app/ranking-system/354585
This system includes several factors for each of the main factor groups. If I’m testing a new timing strategy, I often use this ranking system as a broad check to see how stocks have performed.
One of my main systems is similar to this, but uses some different factors in the factor groups.
Cheers,
Ryan
I haven’t looked closely at the public rankings, but of the ones I’ve seen, Yuval’s “Factor Zoo” looks like a good starting point. Add the “Core: Sentiment” ranking system to it, and perhaps factors like free cash flow to assets (Netfcfq/asttotq), price to sales (try sorting by industry), accruals (“MScoreTATA”), price to book, margin stability, etc. For many more ideas, take a look at Yuval’s blog (e.g. “My Top Ten Factors–for Going Long and for Going Short”).