
Two Patches for Markowitz Portfolio Optimizers

2014 March 28

Recently I was asked to be a discussant to a presentation for the Boston Security Analysts Society.  The following thoughts are taken from my draft comments.  They assume familiarity with optimizers and their modified versions.

Introduction

Both Dick Michaud and, in their own way, Black and Litterman, attempted to fill in what Markowitz left out.  Markowitz’s premature discarding of relevant information about the range of possible values for his model’s inputs has been a principal concern of Dick’s.  Markowitz also understandably neglected the human element in the system to be optimized.  It seems to me that better aligning the method with user needs is a significant part of what Black and Litterman tried to do.

Harry Markowitz

Imagine Harry Markowitz as a graduate student in the Department of Economics at the University of Chicago over 60 years ago — before Eisenhower was elected president.  At that time, representing risk aversion as curvature in utility functions was cutting edge for economists, as was a rudimentary revival of Bayesian probability.  And computers as we now know them did not exist.

At the time he wrote his 1952 paper, Markowitz appears to have had no experience in investing, and not much in the way of computational aids with which to test his approach against reality.  It soon became apparent in practice that his method gave under-diversified portfolios, resulted in too much trading, and tended to underestimate the true risk of the portfolio.  These problems got worse as the number of securities increased.  They were particularly bad when short positions were allowed, which may have motivated his subsequent work on an algorithm for handling position constraints, published in 1956.

Over the following decades, modifications were added by many people to make the method more usable.  You probably know the details better than I.

Premature Discarding of Information

By using only point estimates of both means and covariances, Markowitz mean-variance optimization discards much of our knowledge about the range of their possible values.

He asks what security weights in a portfolio give us the maximum difference between the benefit of expected portfolio return and the penalty of expected risk.  If we used our full knowledge to derive a joint probability distribution for the various security returns, we could in principle truly optimize security weights for a given risk aversion.

But Markowitz took a shortcut.  His method assumes that he can reach the right answer based on point estimates for security return means and covariances.  To achieve mathematical simplicity, this sacrifices a great deal more than just return skewness and kurtosis.

A simple example will help make the difficulty clear.  Suppose your knowledge of an input variable x tells you it will take on the value of either 0 or 8, with equal probabilities.  Suppose a consequent output converts any x to 2x+3.  Since we know the mean of x is 4, we can accurately guess the mean of 2x+3 as 11, without having to construct a probability distribution for the 2x+3 outputs before taking its mean.  In this linear case, the sequence of taking the mean and doing the input-output transformation is unimportant.

Now change the transformation from 2x+3 to, instead, x-squared.  Transforming the mean of x by squaring gives 16.  But if we had first constructed the probability distribution for the output x-squared, we would find that its mean is not 16 but, rather, 32.  In this case, sequence is important.  Our original estimate was biased and had a high standard error.  The same phenomenon also occurs if we start with the variance, rather than the mean, and run it through the same transformation, rather than taking the variance of the transformed variable.
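For readers who like to see the arithmetic spelled out, here is a minimal sketch (in Python, using the same two-point example) of how the order of taking the mean and applying the transformation matters only when the transformation is nonlinear.

```python
# Minimal sketch of the example above: x takes the value 0 or 8 with
# equal probability, so its mean is 4.
import numpy as np

x = np.array([0.0, 8.0])

# Linear transform 2x + 3: condensing to the mean first is harmless.
mean_then_linear = 2 * x.mean() + 3        # 2*4 + 3 = 11
linear_then_mean = (2 * x + 3).mean()      # mean of {3, 19} = 11

# Nonlinear transform x squared: the shortcut is biased.
mean_then_square = x.mean() ** 2           # 4**2 = 16
square_then_mean = (x ** 2).mean()         # mean of {0, 64} = 32

print(mean_then_linear, linear_then_mean)  # 11.0 11.0
print(mean_then_square, square_then_mean)  # 16.0 32.0
```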

This problem occurs quite generally when we prematurely condense probability distributions to point estimates and then run those estimates through a nonlinear transformation.  It happens for the calculation 1/x, for x times y, for x/y, and, if y is a covariance matrix, for x times the inverse of y.

If Markowitz optimizes a single stock versus cash, his unconstrained estimate of the best stock allocation weight is the excess mean return divided by the product of risk aversion times the variance.  Here he employs a nonlinear function of variables: x/y.  Not only will corresponding realized means and variances be dispersed around these point estimates, but distributions of realized means and variances may also be intercorrelated.  (Have you ever noticed that market declines are accompanied by higher volatility?)   Both contribute to error.
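A small numerical illustration may help here.  The code below (a sketch with assumed parameter values, not anything from Markowitz or Michaud) pushes noisy point estimates of the excess mean and the variance through the weight formula — excess mean divided by risk aversion times variance — and shows how dispersed, and on average biased, the resulting weights are.

```python
# Assumed "true" parameters for one stock versus cash, chosen only for
# illustration: 5% excess mean, 20% volatility, risk aversion of 3.
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma, risk_aversion, n_obs = 0.05, 0.20, 3.0, 60

true_weight = true_mu / (risk_aversion * true_sigma ** 2)   # about 0.42

# Estimate the mean and variance from many simulated 60-period samples,
# then push each pair of point estimates through the same formula.
samples = rng.normal(true_mu, true_sigma, size=(10_000, n_obs))
est_mu = samples.mean(axis=1)
est_var = samples.var(axis=1, ddof=1)
est_weight = est_mu / (risk_aversion * est_var)

print(true_weight)
print(est_weight.mean(), est_weight.std())
# The estimated weights scatter widely around the true-parameter weight and
# are biased upward on average, because 1/variance is a convex transformation.
```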

If you work with several stocks, the Markowitz optimum weight estimates are in proportion to the product of x, a vector of means, and the inverse of y, the covariance matrix.  If the incorporated stock returns are correlated, this transformation of their covariance matrix to its inverse involves a snake’s nest of additional non-linearities.
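To make the instability concrete, here is a hedged two-asset sketch (illustrative numbers only) of how weights proportional to the inverse covariance matrix times the mean vector swing violently when the assets are highly correlated.

```python
# Two assets, each with 20% volatility (variance 0.04), correlation 0.95.
import numpy as np

cov = np.array([[0.040, 0.038],
                [0.038, 0.040]])
mu_a = np.array([0.06, 0.05])     # one set of mean estimates
mu_b = np.array([0.05, 0.06])     # the same numbers, merely swapped

w_a = np.linalg.solve(cov, mu_a)  # proportional to inverse(cov) @ means
w_b = np.linalg.solve(cov, mu_b)

print(w_a / w_a.sum())            # roughly [ 2.3, -1.3]: a big long-short tilt
print(w_b / w_b.sum())            # roughly [-1.3,  2.3]: the mirror image
```

A one-percentage-point difference in estimated means, well inside ordinary estimation error, flips the portfolio from a large short position in one asset to a large short position in the other.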

You may have run into an analogous problem trying to use multiple linear regression with correlated independent variables to predict a dependent variable.  The resulting regression coefficients alternate in sign, and though they may look significant, they produce poor out-of-sample results and are more subject to change given small changes in inputs.  The closer the correlations among groups of independent variables, and the larger the number of variables included, the worse these problems get.  You can improve the situation by dropping some variables (analogous to constraining short sales to zero), but some information is lost as compared to a better procedure.
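The code below is an assumed-data illustration of that regression analogue: two nearly collinear predictors whose fitted coefficients take large, offsetting values and swing from sample to sample even though the underlying relationship never changes.

```python
# Each predictor truly contributes a coefficient of 1, but because x2 is
# almost a copy of x1, the individual estimates are very poorly determined.
import numpy as np

for seed in range(3):
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=100)
    x2 = x1 + 0.02 * rng.normal(size=100)     # nearly collinear with x1
    y = x1 + x2 + rng.normal(size=100)
    X = np.column_stack([x1, x2])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    # The two coefficients sum to roughly 2, but individually they take
    # large offsetting values that swing, and often flip sign, across samples.
    print(beta)
```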

We know that various modifications can reduce the error-increasing effect of the matrix inversion process.  But no amount of position limits and robust point estimates of means and covariances can fully cure the problem — because we have lost needed information.

Broadening the Problem Definition

Steve Jobs broadened the definition of building a good music player to include the system composed of both the machine and the human user.  What had been complex and difficult to understand, remember, and control became simple to use.  The market for portable music players exploded, with the iPod the early winner.  Maybe we can learn something about financial engineering design by considering the combination of an algorithm and the real people who are to use it.

Mean-variance optimization, as Markowitz first imagined it, is not designed for use by most human beings.  It is buggy and complicated and difficult to explain.  It works well on some problems but not on others — and the boundaries between them are murky to most users.

Consider the automobile most of us drive.  We may not understand all the details of how our car works, but as long as we can anticipate a simple response to the steering wheel, to the accelerator and to the brakes, we feel in control and driving is extremely popular.  Markowitz needed to give this kind of control to, at least, professional investors, most of whom are not expert in higher mathematics, but may have valuable investing insights.

So much for the problems that both the Black-Litterman and the Michaud resampling approaches are trying to solve.

Black-Litterman

There are several different versions of the Black-Litterman approach.  However, we can assess the broad family characteristics without too many details.

These algorithms pay no attention to the loss of information caused by using point estimates for variances and covariances.  And since passive expected returns in this method are backed out by reverse optimization from a market index portfolio’s weights and point estimates of covariances, the same problem carries over to the point estimates of means for the reference portfolio.  The use of a capitalization-weighted index for the reference portfolio is arguably not the best choice.  Finally, de-emphasizing the role of the risk aversion parameter is also a bad idea if one wants to serve specific investors.  In all of these, I believe Dick Michaud’s critique of Black-Litterman is very well taken.

But the approach is not all bad.  Looking through reverse optimization and back again, in essence the Black-Litterman approach mixes the portfolio weights of a reference portfolio with those of an overlay long-short portfolio based on the investor’s “views.”  To the extent the reference portfolio is itself very well diversified and long only, the resulting combined portfolio, assuming modest views, can improve on Markowitz mean-variance optimization.  This benefit should be recognized, although that may be weak praise from the viewpoint of a more rigorous analysis.
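For concreteness, here is one simple, assumed-parameter sketch of that reading: implied means are backed out of the reference weights, a single view is blended in, and the re-optimized portfolio turns out to be the reference portfolio plus a long-short overlay concentrated in the viewed assets.  This is only one simplified Black-Litterman variant, with illustrative numbers, not a definitive implementation.

```python
import numpy as np

risk_aversion = 3.0
cov = np.array([[0.040, 0.012, 0.010],
                [0.012, 0.030, 0.008],
                [0.010, 0.008, 0.020]])
w_ref = np.array([0.5, 0.3, 0.2])         # reference (e.g. cap-weighted) weights

# Reverse optimization: the mean returns that would make w_ref optimal.
pi = risk_aversion * cov @ w_ref

# One view: asset 0 will outperform asset 1 by 2%, held with some confidence.
P = np.array([[1.0, -1.0, 0.0]])
q = np.array([0.02])
tau = 0.05                                # assumed scaling of prior uncertainty
omega = np.array([[0.001]])               # assumed view variance

# Precision-weighted blend of the implied means and the view.
prior_prec = np.linalg.inv(tau * cov)
view_prec = P.T @ np.linalg.inv(omega) @ P
mu_bl = np.linalg.solve(prior_prec + view_prec,
                        prior_prec @ pi + P.T @ np.linalg.inv(omega) @ q)

# Re-optimize with the blended means.
w_bl = np.linalg.solve(risk_aversion * cov, mu_bl)
print(w_ref)
print(w_bl)
print(w_bl - w_ref)   # an overlay essentially confined to the two viewed assets
```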

The Black-Litterman approach’s strongest point, it seems to me, is that it makes Markowitz portfolio optimization more user-friendly.  It tends to reduce the wildest overconfident estimates of mean return differences, and it sharply reduces the number of input parameters that investors feel obligated to estimate independently.  Because the focus is on only those securities about which there are overlay views, a higher proportion of trading is likely to be done for understandable reasons.

In addition, by creating a framework for mixing data-driven with more subjective information, it may make it easier to accept combining qualitative with quantitative insights.  I believe this synergistic style is advantageous for active investors.  Explicitly recognizing subjective views makes them much more subject to review and improvement than using subjectively-imposed position constraints.  However, we should not go overboard in ascribing the qualities of Bayesian probability approaches to Black-Litterman.  The use of a Bayesian formula for mixing normal distributions of known variance to get a combined estimate for the means is far from a complete Bayesian treatment.
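As a point of comparison, the formula in question amounts to precision-weighted averaging of two normal estimates whose variances are taken as fixed and known.  A short sketch with assumed numbers:

```python
# Two estimates of the same expected return, each treated as normal with a
# known variance (illustrative numbers only).
prior_mean, prior_var = 0.05, 0.02 ** 2     # e.g. an equilibrium-implied estimate
view_mean, view_var = 0.08, 0.03 ** 2       # e.g. an investor's view

w = (1 / prior_var) / (1 / prior_var + 1 / view_var)
posterior_mean = w * prior_mean + (1 - w) * view_mean
posterior_var = 1 / (1 / prior_var + 1 / view_var)

print(posterior_mean, posterior_var ** 0.5)
# Only the mean is updated; the variances are assumed known.  A fuller
# Bayesian treatment would carry uncertainty about the variances as well.
```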

Like the iPod, though it may be internally complex, externally Black-Litterman may feel more controllable, more aligned with business needs, and therefore in a sense simpler to use than conventional mean-variance optimization.

Michaud Resampling as an Alternative

The resampling approach takes an average of classic Markowitz optimizations, including position constraints if desired, over a random sample of possible estimates for means and covariances.   It therefore directly attacks the problem of premature loss of information.  It very clearly increases diversification and stability of allocation weights.

On the other hand, while it carries more information about the input parameter distributions through to the optimization, it still optimizes before all available information has been taken into account.  It is easy to see that optimizing based on the partial samples of returns implicit in resampled means and variances can offer only an approximation.

Consider the simplest case of one stock versus cash.  The stock weight depends on the ratio of excess mean to variance.  If we perturb these inputs, we can get a variety of ratios and then take their average.  This will not give the same pooled result as the ratio of the average mean to an estimate of variance based on the average variance plus the variance of the means. But in many practical cases the approximation will be close.
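For the curious, here is a small numerical sketch of that comparison, with assumed perturbation sizes: averaging the per-draw optimal weights versus taking the ratio built from pooled inputs.

```python
import numpy as np

rng = np.random.default_rng(2)
risk_aversion = 3.0
mu_draws = rng.normal(0.05, 0.01, size=10_000)      # resampled excess means
var_draws = rng.normal(0.04, 0.005, size=10_000)    # resampled variances

# Resampling-style answer: average of the per-draw optimal weights.
w_resampled = np.mean(mu_draws / (risk_aversion * var_draws))

# Pooled answer: ratio of the average mean to (average variance plus the
# variance of the means), as described in the text.
pooled_var = var_draws.mean() + mu_draws.var()
w_pooled = mu_draws.mean() / (risk_aversion * pooled_var)

print(w_resampled, w_pooled)    # close, but not identical
```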

There are a number of papers comparing resampling to conventional mean-variance optimization, ranging from fans to pans.  I believe papers criticizing Dick’s approach as being ad hoc are beside the point.  The issue is how you get the best answers for practical use.  It is clear that resampling does not win in all cases – but it seems to me that it is much less likely than more popular approaches to get you into trouble.

We need more guidance as to when resampling is most likely to be an improvement.  We may need easier user access to New Frontier’s best practice.  But note that Dick’s approach is a more modular foundation building block than is Black-Litterman.  It is easy to append robust covariance point estimates, such as the Ledoit method, for example.  There is also nothing in resampling to exclude combining active subjective prior probability distributions with data-driven distributions, and to do so with more capable modern Bayesian methods.
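As one hedged illustration of that modularity (a toy sketch, not New Frontier’s implementation), the snippet below shrinks the covariance matrix with scikit-learn’s Ledoit-Wolf estimator and then averages unconstrained Markowitz weights over resampled mean vectors; a full resampled-frontier calculation would be considerably richer.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(3)
# Assumed return history: 120 periods of three correlated assets.
returns = rng.multivariate_normal(
    mean=[0.05, 0.04, 0.06],
    cov=[[0.04, 0.01, 0.01], [0.01, 0.03, 0.01], [0.01, 0.01, 0.05]],
    size=120)

risk_aversion = 3.0
cov = LedoitWolf().fit(returns).covariance_      # shrunk covariance point estimate
mu_hat, n = returns.mean(axis=0), len(returns)

# Simple resampling loop over the mean vector only, averaging the
# resulting unconstrained Markowitz weights.
weights = []
for _ in range(1000):
    mu_draw = rng.multivariate_normal(mu_hat, cov / n)
    weights.append(np.linalg.solve(risk_aversion * cov, mu_draw))
print(np.mean(weights, axis=0))
```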