Using Stock Clusters

2011 May 23

The data here is from the spring of 2003, but the method of analysis bore fruit and is worth repeating. The traditional method of diversifying across industry groups can be improved by analyzing price movements over time to pick out the most closely related clusters of stocks, so that you don’t buy too many from the same cluster. And of course, very tightly related pairs of stocks are candidates for long-short pairs trading. The amazing increases in technology and research-oriented stocks sharply reveal the anatomy of the stock bubble that ended in 2001.

Hurrah for Differences!
Wilcox, April 2, 2003

Many investors who made great returns picking stocks before 2001 now seem ready to take up risk management but don’t want to invest in index funds.  One sensible approach is to diversify your portfolio across the major stock groupings and focus your stock selection efforts on finding good stocks within each group.

Technology & Interest-Sensitive Return Differences

The chart above shows how differently two of the most important groups, technology stocks and interest-sensitive stocks, have behaved over the last 10 years.  I’ll return in a moment to how I unconventionally defined these groups, and where the data came from, but for now, focus on their potential for diversification.  Cumulative returns for both have grown dramatically, but often at different times.  For example, in 1999 technology stocks were still headed up, while stocks in the interest sensitive group, which includes banks, housing-related stocks and some other businesses that do relatively better in economic recessions, headed down.  On the other hand, in 2001, as recession perceptions set in, technology peaked and turned down, while lower interest rates greatly stimulated housing and many other stocks in the interest-sensitive group.
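The cumulative-return comparison described above can be sketched in a few lines. This is only an illustration: it assumes each group's return is the equal-weighted average of its members' monthly returns (the article does not say how the groups were weighted), and the numbers are made up rather than taken from the 2003 data. The function name `group_cumulative_returns` is hypothetical.

```python
def group_cumulative_returns(monthly_returns):
    """Compound an equal-weighted group's monthly returns into a growth-of-1 series.

    monthly_returns: list of per-stock lists, one simple return per month.
    Equal weighting is an assumption; the article does not state the weighting.
    """
    n_months = len(monthly_returns[0])
    growth, level = [], 1.0
    for m in range(n_months):
        # Average the member stocks' returns for this month.
        avg = sum(stock[m] for stock in monthly_returns) / len(monthly_returns)
        level *= 1.0 + avg
        growth.append(level)
    return growth


# Illustrative (made-up) returns for two groups over four months.
tech = [[0.10, 0.08, -0.15, -0.10], [0.12, 0.06, -0.20, -0.05]]
rate_sensitive = [[-0.02, -0.01, 0.04, 0.03], [-0.03, 0.00, 0.05, 0.02]]

tech_growth = group_cumulative_returns(tech)
rate_growth = group_cumulative_returns(rate_sensitive)
```

Plotting the two growth series against each other shows the diversification effect: in this toy data one group rises while the other falls, and then they trade places.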

One could argue that it is better not to diversify, because over the last decade the technology stocks did better than all the other groups that I will show. Keep in mind, though, that the future may not be like the past. The technology group is also more volatile. Finally, there is a survivor bias in the data: stocks had to be around for the full ten years to be included, so the technology group shown may not be representative of the Internet stocks that crashed and burned.
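The survivor filter mentioned above is easy to state in code. The sketch below assumes returns are stored per ticker with `None` marking months when a stock was not in the database; the tickers and the helper name `full_history_only` are hypothetical.

```python
def full_history_only(returns_by_ticker, n_months=120):
    """Keep only tickers with a return for every one of n_months (None = missing)."""
    return {
        ticker: series
        for ticker, series in returns_by_ticker.items()
        if len(series) == n_months and all(r is not None for r in series)
    }


# Hypothetical tickers: "LATE" listed after the window began, "GONE" delisted early.
sample = {
    "FULL": [0.01] * 120,
    "LATE": [None] * 60 + [0.05] * 60,
    "GONE": [0.03] * 90 + [None] * 30,
}
survivors = full_history_only(sample)
```

Both the late arrival and the early casualty drop out, which is exactly why a decade-long sample under-represents the Internet stocks.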

I recently did a statistical cluster analysis of the monthly returns of stocks in the Value Line database over the last decade to let the data tell me what the real groups might be. Of course, since I had only 120 monthly observations for each of the more than 1,700 stocks that were present during the entire decade, this procedure can produce some spurious groupings based on coincidence. However, it produced a tree diagram of the market that was sufficiently similar to conventional industry definitions to convince me that the method could find real structures. It placed every stock at the twig ends of variously sized branches. Generally, the fewer branch points on the tree that one traverses between any two stocks, the more closely they are related. The information contained is extensive, and I will touch on only some highlights here.
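For readers who want to try something similar, here is a minimal sketch of building such a tree from monthly returns. It is not the procedure used for this article, which is not specified in detail: it assumes a distance of one minus the return correlation and uses SciPy's agglomerative average linkage, and it runs on synthetic returns rather than Value Line data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 120 monthly returns for 6 stocks,
# built from two hidden group factors plus stock-specific noise.
n_months = 120
factor_a = rng.normal(0.0, 0.05, n_months)
factor_b = rng.normal(0.0, 0.05, n_months)
returns = np.column_stack(
    [factor_a + rng.normal(0.0, 0.02, n_months) for _ in range(3)]
    + [factor_b + rng.normal(0.0, 0.02, n_months) for _ in range(3)]
)

# Correlation between every pair of stocks (columns are stocks).
corr = np.corrcoef(returns, rowvar=False)

# Turn correlation into a distance: 0 for perfectly correlated, 2 for perfectly opposed.
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)  # remove floating-point residue on the diagonal

# Build the tree. Average linkage is an assumed choice, not the article's stated method.
tree = linkage(squareform(dist, checks=False), method="average")
```

Each row of `tree` records one branch point: the two branches joined, the distance at which they join, and the number of stocks beneath it. Passing `tree` to `scipy.cluster.hierarchy.dendrogram` draws the diagram.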

High-level Groups

I paid special attention to the sixteen large groups created by the first four levels of division. This was enough to create considerable cohesion, both in higher return correlations within each group and in consistency of industry classifications. It should be kept in mind that, just as in factor analysis, the exact groupings are subject both to noise (coincidence in a small sample) and to changes through time in the characteristics of the stocks and in the market’s emphasis on different attributes in evaluating them. The groups ranged in size from about 30 to about 300 stocks each. Some seemed real, and some were so diffuse across industries that they may have been spurious. Here are my personal labels for the groups that subjectively seemed most real and important.

Technology: This is a broader group (223 stocks) than one might expect based on industry labels and popular lore; they all tended to move together over the last decade. Here are some of the better known stocks:

AOL Time Warner; Microsoft; IBM; Merrill Lynch; Schwab (Charles)
Disney (Walt); Broadwing; Best Buy; Staples; Advanced Micro Dev.
Intel; Cisco Systems; Dell; Adobe Systems; Hewlett Packard
Biogen; Oracle; Motorola; Sybase; Eastman Kodak
Storage Technology; PeopleSoft; Genzyme; AT&T; Viacom
Cendant; Home Depot; Sony (ADR); Texas Instruments; Qualcomm


Note that Biogen and Genzyme crossed over from the biotech world to share in the same movements with the computer-related and Internet support companies.  We also see media stocks like Disney.  If you are heavy in conventionally-defined technology stocks, don’t bet that media stocks, biotech, or even office equipment retailers will provide adequate diversification.  Note that most Internet stocks could not be present because they were not in the database for an entire decade.

Technology, Research, Consumer Support & Interest-Sensitive Stock Returns

Research: The 113 stocks in this group have in common smaller size and, taken as a whole, more emphasis on recent research than seems present in the Technology group.  Their prices took off later in the development of what we know as the upside of the speculative bubble.  Here is a sample.

Coherent; Cell Genesys; Fuelcell Energy; Bio-Rad Labs; Advanced Magnetics
Delphax Technologies; Plato Learning; Calgon Carbon; Lamson & Sessions; Chiron
Gilead Sciences


The chart below shows how these groups soared well above more pedestrian fellows during 1999 and 2000.

Consumer Support: This group (120 stocks) includes many very large consumer-related companies.  Some of the better known names are:

Abbott Labs; Heinz; Bristol Myers Squibb; Merck; Johnson & Johnson
Schering Plough; Pfizer; McDonald’s; Amer. Intl. Group; Glaxo Smith Kline
Walgreen; McCormick & Co; Baxter Intl; Coors; Alberto Culver
Pepsico; Avon Products; Colgate Palmolive; Stryker; Procter & Gamble
Clorox; Gillette; Coca Cola; Archer Daniels Midland; Gen’l Dynamics
Verizon; Bell South; Comcast; Electronic Data Sys; Honeywell
Winn Dixie; Safeway; Textron; Delta Air Lines; Southwest Airlines
Ford Motor; United Technologies; Dow Jones; New York Times; WalMart Stores


It is interesting that General Motors is not here, but in a more diffuse cluster of stocks (not shown) that includes many ADRs and companies with global interests. Note also that telephone companies in the larger sense are consumer companies, not technology companies, even though they have shared financial problems with some technology firms in the last couple of years.

Interest-Sensitive: This is a large group (304 stocks) that includes many banks, financial service companies, and beneficiaries of low interest rates on capital-intensive projects such as home building, as well as some companies that benefit from economic recession. Some examples are:

Progressive (Ohio); State Street; Vulcan Materials; Wells Fargo; Kimberly Clark
Sherwin Williams; Kellogg; Chubb; Marsh & McLennan; Beverly Enterprises
Union Planters; Leggett & Platt; Diebold; Genuine Parts; Manpower Inc
Sears, Roebuck; Carnival Corp; Legg Mason; Mellon Financial; McGraw Hill
Mohawk Industries; Toll Brothers; Chevron Texaco; Nike; Wendy’s
Bed Bath & Beyond; Starbucks; AutoZone; Stanley Works; Waste Management



Natural Resources: This group (115 stocks) is dominated by petroleum and a few other natural resource stocks.  Here is a short list to give the idea:

Tesoro Petroleum; Amerada Hess; Kerr-McGee; Anadarko Petroleum; Apache Corp
Helmerich & Payne; Offshore Logistics; Tidewater; Exxon Mobil; Domtar
Barrick Gold; Placer Dome; Mesabi Trust


As the following chart shows, these stocks did not see the startling runup of the technology stocks, but they have continued to resist the impact of the bear market much better than other groups. This reflects increases in world tensions that have pushed up the prices of oil and gold, as well as higher economic growth rates in Asia that have benefited commodities.

Industrial: This core market group (219 stocks) generally embodies mature companies, including what we might call “smokestack” industrials.  Their defining characteristic is probably mature technology. These stocks are generally more cyclical and mostly lost ground during the last decade as compared to other groups, except for utilities, which did even worse. Here are some of the better known stocks included.

Bandag; Polaris Inds; Smith (AO); Air Products & Chem; Norfolk Southern
Goodyear Tire; Whirlpool; Deere & Co.; Dana; Phillips-Van Heusen
Reebok; Florida Rock; Crown Cork; US Steel; Alcan
Phelps Dodge; Weyerhaeuser; PPG Ind; DuPont; Boeing
Caterpillar; Fluor; Olin; Oneida; National Presto


Many of the companies in this group enjoyed status as growth stocks not many decades ago. It seems odd to see Boeing, for example, now in the same category as US Steel.

Utilities: This is a rather distinct cluster of 115 stocks.  Many have tried to branch out into non-regulated endeavors, perhaps accounting for the observation that the group is surprisingly cyclical given yesterday’s reputation for stability.  Here are some representatives.

NICOR; Duke Energy; Entergy; FPL Group; Consol. Edison
Puget Energy; Energen; Oneok; Maine Public Service; California Water
Laclede Group; UST Inc; Anheuser Busch; Philadelphia Suburban


Notice how a tobacco stock has crept into the group.  Before the legal wars of the last decade, tobacco stocks were often treated as steady dividend producers like utilities.  The inclusion of Anheuser Busch may be coincidental, or it may tell us something about that business.

Other Groups: There are several hundred stocks still unaccounted for. Some of the largest are in a somewhat diffuse cluster that contains many foreign ADRs and stocks that are dominated by international activities. Another small cluster relates to thrifts, REITs and some smaller banks. Still another small cluster contains the remaining property/casualty insurance companies, some broader financial service companies, and, interestingly, medical supply companies. Another contains environmental stocks and some recreation stocks. There are several more very diffuse clusters that seem to encompass smaller stocks of the light industry variety. The degree to which they represent true cluster structures as opposed to coincidence is more debatable than for those presented above.
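The step of cutting a tree into high-level groups and checking their cohesion can be sketched as follows. This is an illustration under stated assumptions, not the procedure behind the groups above: it uses synthetic returns with four hidden groups, agglomerative average linkage on one-minus-correlation distances, and SciPy's `fcluster` with a maximum cluster count, which is not identical to taking the first four levels of division. The helper `mean_offdiag` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def mean_offdiag(corr):
    """Average pairwise correlation, excluding each stock's correlation with itself."""
    n = corr.shape[0]
    return (corr.sum() - np.trace(corr)) / (n * (n - 1))


rng = np.random.default_rng(1)
n_months, n_groups, per_group = 120, 4, 5

# Synthetic returns: each hidden group shares one factor; stocks add their own noise.
columns = []
for _ in range(n_groups):
    factor = rng.normal(0.0, 0.05, n_months)
    columns += [factor + rng.normal(0.0, 0.03, n_months) for _ in range(per_group)]
returns = np.column_stack(columns)

corr = np.corrcoef(returns, rowvar=False)
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
tree = linkage(squareform(dist, checks=False), method="average")

# Cut the tree into at most n_groups clusters (the article worked with sixteen).
labels = fcluster(tree, t=n_groups, criterion="maxclust")

# Compare cohesion inside each cluster with the market-wide average.
overall = mean_offdiag(corr)
within = {
    int(g): mean_offdiag(corr[np.ix_(labels == g, labels == g)])
    for g in np.unique(labels)
    if (labels == g).sum() > 1
}
```

A group whose internal average correlation is barely above `overall` is a candidate for the "diffuse, possibly spurious" category. A group well above it is more likely to reflect real structure.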

Extra Credit:  Bottom-Level Pairs

When two stocks are on adjacent twigs deep within a cluster, they are probably truly related. If Value Line also places them in the same industry, we can be much more confident. In that case, it usually makes no sense to own both stocks. (However, it might make a lot of sense to be long one and short the other, because most of the background noise has been filtered out.) Inspection of cluster structure within an industry can also produce unexpected insights.

For example, what do State Street Corp and Bank of New York share that places them together on a branch relatively far from their conventionally-defined industry brethren?  They are both strong in back office processing and provide services very different from most banks or money management firms.  Here is an example list of some of the pairs that seem to illustrate near substitutes:

  • State Street Corp vs. Bank of New York
  • Merrill Lynch vs. Bear, Stearns
  • Schlumberger vs. Halliburton
  • Best Buy vs. Limited Brands
  • Saks vs. Nordstrom
  • Office Depot vs. Staples
  • Intel vs. Advanced Micro Devices
  • Borland Software vs. PeopleSoft
  • Boise Cascade vs. Weyerhaeuser
  • Toll Brothers vs. Pulte Homes
  • Pfizer vs. Schering-Plough
  • Comcast vs. Cablevision Sys.
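Finding bottom-level pairs like these can be sketched by looking for branch points in the tree that join two individual stocks directly, then keeping those that also share an industry label. The code below is a hedged illustration on synthetic data: the tickers, industry labels, and the function name `twig_pairs` are hypothetical, and average linkage on one-minus-correlation distances is an assumed choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def twig_pairs(returns, names, industry):
    """Return (name_a, name_b, correlation) for stocks that join the tree
    as a direct pair and share an industry label."""
    n = returns.shape[1]
    corr = np.corrcoef(returns, rowvar=False)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    pairs = []
    for a, b, _, _ in tree:
        a, b = int(a), int(b)
        # Ids below n are original stocks; larger ids are already-merged clusters.
        if a < n and b < n and industry[names[a]] == industry[names[b]]:
            pairs.append((names[a], names[b], float(corr[a, b])))
    return pairs


rng = np.random.default_rng(2)
n_months = 120
bank = rng.normal(0.0, 0.05, n_months)  # factor shared by the two banks
chip = rng.normal(0.0, 0.05, n_months)  # factor shared by the two chip makers
names = ["BANK_A", "BANK_B", "CHIP_A", "CHIP_B", "OIL_A"]
industry = {
    "BANK_A": "Bank", "BANK_B": "Bank",
    "CHIP_A": "Semiconductor", "CHIP_B": "Semiconductor",
    "OIL_A": "Petroleum",
}
returns = np.column_stack([
    bank + rng.normal(0.0, 0.02, n_months),
    bank + rng.normal(0.0, 0.02, n_months),
    chip + rng.normal(0.0, 0.02, n_months),
    chip + rng.normal(0.0, 0.02, n_months),
    rng.normal(0.0, 0.05, n_months),
])
pairs = twig_pairs(returns, names, industry)

# Long one, short the other: the shared factor cancels, leaving stock-specific moves.
spread = returns[:, 0] - returns[:, 1]
```

In this toy setup the long-short spread is less volatile than either stock alone, which is the sense in which the background noise has been filtered out.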