Beyond sectors: ways to classify industries

The conventional way to classify industries is by grouping them into sectors, as the GICS standard does. I’m interested in alternative approaches. One can be found in Portfolio123’s “themes”: macro, population growth, special, financial, and innovative. You can read more about that here:

I really like this approach. But I was also wondering about alternatives. There’s a very short article in the Harvard Business Review that suggests we classify industries as follows:

Asset Builders make and sell physical things
Service Providers use people to offer services
Technology Creators generate and deliver intellectual property (software and data)
Network Orchestrators facilitate transactions and interactions within a network

This seems like a promising idea, but I think it might be hard to assign certain industries to one of these categories, and I don’t know how useful the classification might be. I’m also not aware of anyone who has gone to step two and actually tried to assign industries, subindustries, or individual companies to one of these four categories. I myself wouldn’t know where to begin. What category would banks be in? How about insurance companies? How about Netflix?

What I’m looking for is a way to group stocks that exhibit similar behavior, based on what business they’re in, into groups somewhat larger than GICS sectors. Does anyone have any ideas about this? P123’s “theme” approach is great, and maybe that’s the best one available. But if anyone knows of alternatives, I’d love to hear about them.


I thought about this type of approach as well.

I have on my to-do list a project which would use the IBD / Marketsmith 197 industry groups.
These groups seem more homogeneous than the GICS codes.

First challenge is that they do not reveal how they allocate stocks into each of the 197 groups. One way to do that is to buy the cheapest Marketsmith subscription and to manually export every so often the list of stocks… 197 times! (I did it once)

Second challenge is to find the historical composition of these 197 groups to be able to backtest whatever strategy (using the InList function) … at least over a few years!

If anyone has any insight into how to proxy the 197 groups, I am interested.
I am also interested in the historical composition of these groups if anyone has it (feel free to email me).


Not sure if any of these qualify, but:

Peter Lynch had some interesting categories in one of his books. One model I remember he mentioned was a consolidator in a shrinking industry. A shrinking industry tends to not attract competition, and the consolidator is gaining market share/pricing power as time passes. I think I’ve also seen research that returns in more consolidated industries tend to be higher (no surprise I guess). May be a way to “metricize” that.
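
One possible way to “metricize” consolidation is the Herfindahl-Hirschman index (HHI), the sum of squared market shares. This is a swapped-in suggestion rather than anything Lynch proposed, and the revenue figures below are made up for illustration. A minimal sketch:

```python
def herfindahl_index(revenues):
    """Herfindahl-Hirschman index on a 0..1 scale: sum of squared market shares."""
    total = sum(revenues)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return sum((r / total) ** 2 for r in revenues)

# Hypothetical company revenues within two industries (illustration only).
fragmented = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
consolidated = [70, 20, 5, 5]

hhi_fragmented = herfindahl_index(fragmented)      # ten equal players
hhi_consolidated = herfindahl_index(consolidated)  # one dominant player
```

Tracking how an industry's HHI changes over time, alongside its revenue growth, would be one way to flag a “consolidator in a shrinking industry” setup.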

Lynch also mentioned that coming out of recessions it might be better to invest in the lower quality, more indebted company - as they’ve often been beaten down most due to bankruptcy risk. It’s been so long since I’ve read his books, but I think he gave examples of industrials like auto manufacturers in those cases. Maybe that’s something like a “leader”/“also ran” type categorization?

Lynch was also big on the cookie-cutter expansion type stories. Something that works and can just be rolled out to green pastures. Autozone or Taco Bell is the model. Tends to be retail - not sure if applies in other settings.

Yeah. Can also look at basics like:
a) Gross margins of the industry (high, medium, low or deciles)
b) Price to book of the industry (asset intensive or not)
c) Overall growth rate of the industry’s revenue per employee (so productivity)
d) Can look at total number of companies and growth in employment of industries
e) Can look at ‘public / regulatory barriers’ around industries… so, things like utilities…
f) Can look at nature of ‘disruption’ in the industries (i.e. industries with lower turnover of top 5 players over 10 year period likely have things about them that make disruption harder - legal barriers, capital barriers, talent barriers, etc)
g) WACC of the industry over time
h) Number of ipo’s by industry over time
i) Amount of M&A in industry over time
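
Item (f) above can be sketched roughly as follows. The function name, the company names, and the choice to rank by something like revenue are all assumptions for illustration; the idea is just to measure what share of an industry's top five was displaced over the period.

```python
def top_n_turnover(start_ranking, end_ranking, n=5):
    """Share of the starting top-n that is no longer in the top-n at the end.

    Both rankings are lists of company names ordered from largest to smallest.
    Assumes each ranking holds at least n names.
    """
    start_top = set(start_ranking[:n])
    end_top = set(end_ranking[:n])
    return len(start_top - end_top) / n

# Hypothetical rankings ten years apart (illustration only).
ranking_2010 = ["A", "B", "C", "D", "E", "F"]
ranking_2020 = ["A", "B", "F", "G", "C", "D"]

turnover = top_n_turnover(ranking_2010, ranking_2020)  # D and E dropped out
```

A low value over a ten-year window would suggest an industry that is hard to disrupt; a high value, the opposite.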

Can likely create ‘factor analysis’ to break industries / markets into 4 quadrants or multiple deciles (high potential growth&profits vs. low growth & profits) and (likely to see high disruption rates vs. unlikely to be disrupted)… and then group companies into these based on their overall profiles.

Can make a bunch of ‘2 by 2’ 4-quadrant analysis charts like this and use them to order companies various ways.

This could be done in a purely ‘math-based’ way
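
A rough sketch of what a purely math-based 2 by 2 could look like, splitting each axis at the sample median. The two scores per industry are hypothetical placeholders; in practice each would be some composite of the metrics listed above.

```python
from statistics import median

# Hypothetical scores per industry (illustration only):
# (growth_and_profit_score, disruption_score)
industries = {
    "utilities": (0.2, 0.1),
    "software": (0.9, 0.8),
    "retail": (0.3, 0.7),
    "railroads": (0.6, 0.2),
}

def quadrants(data):
    """Label each industry by which side of the median it falls on, per axis."""
    x_mid = median(v[0] for v in data.values())
    y_mid = median(v[1] for v in data.values())
    labels = {}
    for name, (x, y) in data.items():
        growth = "high growth" if x > x_mid else "low growth"
        disruption = "high disruption" if y > y_mid else "low disruption"
        labels[name] = f"{growth} / {disruption}"
    return labels

labels = quadrants(industries)
```

Swapping the median split for deciles would give the finer-grained version of the same idea.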

This is a really neat idea. If one mapped out all industries along these lines one could easily calculate the correlations between these factors and order the industries accordingly. One could then classify them into groups.

how about this.
Companies in the same industry might have similar net income margin, asset turnover, and leverage, i.e., the three components of their DuPont analysis would all be in the same range.
Semiconductor manufacturers have similar margins, turnover and leverage. Ditto retailers. Ditto oil refiners.
At least that is my hypothesis. I think the data and tools exist in P123 to collect and bucketize the companies, to see if it works.
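
To make the hypothesis concrete, here is a minimal sketch of the bucketing step outside P123. The company names and figures are invented for illustration, and the tercile bucketing is just one assumed way to define “the same range”.

```python
def dupont(net_income, revenue, assets, equity):
    """Return the three DuPont components: margin, turnover, leverage."""
    return net_income / revenue, revenue / assets, assets / equity

def tercile_buckets(values):
    """Map each value to 0 (low), 1 (mid) or 2 (high) by rank within the sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * 3 // len(values)
    return buckets

# Hypothetical figures: (net_income, revenue, assets, equity). Illustration only.
companies = {
    "chip_a": (20, 100, 200, 150),
    "chip_b": (18, 100, 180, 140),
    "retail_a": (3, 100, 50, 20),
    "retail_b": (4, 100, 55, 25),
    "refiner_a": (6, 100, 80, 40),
    "refiner_b": (7, 100, 90, 50),
}

names = list(companies)
components = [dupont(*companies[n]) for n in names]
# Bucket each component separately, then combine into a 3-part signature.
per_component = [tercile_buckets([c[k] for c in components]) for k in range(3)]
signature = {n: tuple(b[i] for b in per_component) for i, n in enumerate(names)}
```

If the hypothesis holds on real data, companies in the same industry should mostly share a signature, as the made-up ones do here.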

I have had similar thoughts.

The problem is that manufacturing and services are blurred with no clear separation. Generally, companies provide both.

My early thoughts on classification:

(1) Input Resources

  • energy
  • basic materials
  • human resources??

(2) Business-to-Business

  • advertising
  • office products
  • outsourcing

(3) Business-to-Consumer

  • consumer staples
  • consumer discretionary
  • social

(4) Infrastructure and Government

  • transportation
  • aerospace & defense
  • utilities
  • internet

(5) Facilitators

  • banks
  • insurance

I really like your idea, Inspector. It’s a great start.

Good topic. I’ve been thinking about this a lot recently, specifically how a “technology” sector is basically meaningless as technology is ubiquitous across all sectors. If Disney is streaming video into your home for a subscription and Netflix is streaming video into your home for a subscription, why does Netflix get put in the technology sector, worthy of a 105 PE, while Disney has a 13 PE?

I believe that the GICS is obsolete, but there is a huge business feeding off of it. Technology is a good example. Some hedge funds are starting to base their classification systems on A.I. The results may not be any better, but they are a lot less expensive.

I propose that the most natural business differentiations should mirror the distinctions made in accounting regulations. It’s not that accounting rules are a natural economic differentiator, but rather that they can and do dramatically affect the presentation of operating results in financial disclosures.

On the other hand, accounting rules distort results even within the exact same vertical. For example, there are several ways “extractive industries” can account for PP&E under both US and Int’l GAAP. While other accounting (e.g., inventory and revenue recognition) methods distort results within industries, the latter factors affect all industries about equally.

Despite the limitations, on balance, I’d say that accounting rules exert more influence on a typical quant’s day-to-day investment decisions than any other business/economic differentiator.

Independently from that earlier thought, what about selecting candidates from within the same sector/sub-sector based on their correlations?

The rationale is to let the market do your homework for you. The more similar companies are, the more likely their stocks will move together. Selecting from within the same industry grouping likely mitigates the spurious correlation risks.

K-means clustering is probably a good stats tool to get the ball rolling. Moreover, K-means is heavily used in AI/ML applications, so there’s some buzzwordiness there.
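
As a rough sketch of the idea: describe each stock by its row of the return-correlation matrix, then run k-means on those rows. The return series below are invented, and the k-means here is a bare-bones Lloyd's algorithm seeded with the first k points so the example stays dependency-free; in practice one would probably use a library such as scikit-learn.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length return series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def correlation_rows(series):
    """Each row holds one stock's correlation with every stock in the sample."""
    return [[pearson(a, b) for b in series] for a in series]

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical daily returns (illustration only).
returns = {
    "oil_1": [0.010, -0.020, 0.015, -0.005, 0.020, -0.010],
    "bank_1": [-0.010, 0.010, 0.020, -0.015, -0.005, 0.012],
    "oil_2": [0.012, -0.022, 0.020, -0.004, 0.021, -0.014],
    "bank_2": [-0.008, 0.009, 0.015, -0.013, -0.002, 0.011],
}

tickers = list(returns)
rows = correlation_rows([returns[t] for t in tickers])
clusters = dict(zip(tickers, kmeans(rows, k=2)))
```

With real data the choice of k, the seeding, and the return window would all need some care; this only shows the mechanics.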

Hi David,

Agree. I would have said CART (classification and regression trees). But once you had the data in a spreadsheet you could try both, and play with a support vector machine method too, all in an afternoon. The problem would be getting the data out of P123, or getting what you did in Python (or R) back into P123 for backtesting.

Nodes CAN have a lot of similarities to principal component analysis (e.g., both are linear weightings of factors). I contend that some focused optimization of a node can duplicate the results of PCA: maximizing the variance and even reducing the dimensionality if desired. Even better would be if optimizing a node mimicked Principal Component Regression (could it?). Maybe most of us have been doing some of this already, whether we gave it a fancy statistical term or not. Maybe we do not appreciate everything P123 is already doing.

But does P123 have the specific data we (or each of us) would want regarding the topic of this post?

You can get SOME correlation data with a custom series. You mentioned correlation data. I would probably look there too.

I don’t know if any of this helps and I do not think I will be focusing on this going forward—unless some of the problems with uploading or downloading of data can be addressed.

I was kind of skeptical of the usefulness of this idea when I read the first post in this thread. But I was reading about unsupervised machine learning today (what this is) and it is a rich topic. And there are clear examples where this method can work.

Just a few random ideas on an interesting topic.


Well, both businesses are now in the Communication Services sector, so that particular problem has been solved. I thought the revisions to the GICS system in rethinking this sector were very sensible.

In my opinion, both Netflix and Disney should be in the Consumer Discretionary sector.

Visa and Mastercard are in the Tech sector.

Here is something to put my first suggestion into concrete terms:

These delineations are for US GAAP – IFRS GAAP has similar standards, but there are also notable differences.

Aside from these topics, I think there are opportunities for consolidating similar groups and then disaggregating important differences within sector groups. For example, I think I would either drop or consolidate accounting plans. In addition, I would, for financial statement screening purposes, create four possible groups for extractive enterprises based on how they capitalize exploration and drilling: those that capitalize the full costs of exploration/development and those that use successful efforts to capitalize exploration/development. While these things might seem trivial (and, to be fair, they are trivial over full economic cycles), they make big impacts on operating results over quarters and years.

Thanks, accounting. NOT.

I just wanted to say that I’ve spent the last week or two doing exactly this, with some fascinating initial results. Your suggestion of k-means is perfect, by the way–I tried a lot of other clustering algorithms, but k-means worked the best.

I went into this expecting an affirmation of GICS sector classification, and that, for example, health care stocks would correlate well, as would tech stocks, etc. I found the opposite. Health care stocks ended up in four different clusters. The only GICS sectors that remained intact were energy and staples.

Here are my results, in order from relatively undifferentiated to strongly differentiated. I’ll be writing an article on my blog and on Seeking Alpha outlining how I arrived at these industry clusters and offering some additional thoughts, but it’ll take a few weeks.

primary: staples, insurance, electricity, gas, water, road & rail, health care providers & tech, packaging
secondary: chemicals, defense, machinery, conglomerates, trading, professional svcs, health care equip & tools, internet retail
basic: paper, wood, clothes, stores, drugs, infrastructure, auto parts
complex: it svcs, software, wireless, non-traditional utilities, construction
service: commercial & consumer svcs, finance, air freight, building and construction supplies, leisure
edge: biotech, semiconductors, computers, electronics, media
credit: banks, real estate, distributors, cars, furniture
speed: internet, social media, telecom, comm equip, airlines
earth: energy, metals, electrical equip, marine shipping

Very Cool!!!

Good to hear. And thanks for sharing this! I’d be interested in hearing more about time-periods and sampling frequencies you used in your analysis.

I used the maximum time period for P123, from 1999 to today, buying all the stocks in an industry above a certain liquidity limit and rebalancing to equal weight annually. I then used the correlation of daily returns for my correlation matrix.
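
For anyone wanting to try something similar outside P123, here is a rough sketch of the portfolio-construction step as I read the description above. It is not the poster's actual code: the function name and the use of a fixed day count for "annually" (for example 252 trading days) are assumptions.

```python
def equal_weight_returns(stock_returns, rebalance_every):
    """Daily returns of a portfolio reset to equal weight every `rebalance_every` days.

    stock_returns: one list of daily returns per stock, all the same length.
    Between rebalances the weights drift with each stock's performance.
    """
    n_stocks = len(stock_returns)
    n_days = len(stock_returns[0])
    portfolio = []
    for day in range(n_days):
        if day % rebalance_every == 0:
            # Rebalance: reset every position to the same value.
            values = [1.0 / n_stocks] * n_stocks
        total_before = sum(values)
        values = [v * (1 + stock_returns[s][day]) for s, v in enumerate(values)]
        portfolio.append(sum(values) / total_before - 1)
    return portfolio

# Hypothetical two-stock industry (illustration only).
industry = [[0.10, 0.10], [-0.10, -0.10]]
daily_rebalance = equal_weight_returns(industry, rebalance_every=1)
with_drift = equal_weight_returns(industry, rebalance_every=2)
```

Running this once per industry gives one daily return series per industry; the pairwise correlations of those series form the matrix that goes into the clustering.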