I wish I had learned Bayesian probability earlier in my experience as an investor.  It has helped me understand why the quantitative methods ordinarily used tend to underestimate risk. It has also explained much better why we tend to overvalue performance information, and how to correct for that to really get at skill.  The following is the substance of a PowerPoint presentation I gave in 2006 to QWAFAFEW, a local Boston quantitative investing interest group.  Despite, or perhaps because of, its tongue-in-cheek name, this group has been the source of a valuable exchange of ideas in the Boston investment community.

## Bayesian & Qualitative Approaches to Quantitative Investing

Jarrod Wilcox
QWAFAFEW
December 12, 2006

### BAYESIAN INFERENCE FOR QUANTITATIVE INVESTORS

#### In the super-competitive investment arena:

• Changing environments limit relevant data.
• Scientific consensus is not necessarily rewarding.

#### We can still benefit from:

• Scientific reasoning
• Reduced impact of emotion & cognitive error
• Optimal learning from data
• Private Bayesian priors.

### FAMILIAR INVESTMENT APPLICATIONS

• Black-Litterman: shrinks excess return point estimates toward CAPM-based prior.

#### Bayesian portfolio risk estimates:

• Ledoit & Wolf: shrinks covariance point estimates toward empirically-based priors.
• Michaud: incorporates uncertainty around covariance point estimates.

### TIMELINE

#### Dawn of probabilistic reasoning:

• Bernoulli, Bayes, Laplace (1700’s to 1800’s), games of chance: probability as odds.

#### Classical statistics and probability:

• Fisher, Neyman, Pearson etc. (early 1900’s), probability as frequency.
• Extensions to multi-step processes, example Feller (1900’s)
• Kolmogorov, rigorous axioms.

#### Bayesian rebellion against frequentists:

• Polya, Cox, Jeffreys, Jaynes, Savage, Raiffa & Schlaifer: decisions where data is limited (mid 1900’s)

#### The new Bayesians:

• “Empirical Bayes”, hierarchical estimation (example: James-Stein shrinkage), distributed predictions (mid-late 1900’s)
• Iterative techniques: Markov Chain Monte Carlo simulation (1990’s to present).

### PROBABILITY: THE LOGIC OF SCIENCE

#### Probability axioms about A (assertions) and D (data):

• A iff P(A)=1, (not A) iff P(A)=0
• P(A1 or A2) = P(A1) + P(A2) – P(A1 and A2)
• P(A1 and A2) = P(A1) * P(A2 | A1)

#### From which Bayes Rule logically follows:

• P(A | D) = P(A) * P(D | A) / P(D)
• Posterior probability = prior * likelihood / normalization
• Normalization constant P(D) is the sum of P(Ai) * P(D | Ai) over all mutually exclusive and exhaustive Ai.

### SIMPLE DISCRETE EXAMPLE

#### Situation:

• Two fair dice are rolled, and the sum of their faces is 8. What is the probability that a 2 and a 6 are present?

#### P(A|D) = P(A) * P(D|A) / P(D)

• P(A) = 1/36 + 1/36 = 2/36
• P(D|A) = 1
• P(D) = 2/36 + 2/36 + 1/36 = 5/36
• P(A|D) = (2/36 * 1) / (5/36) = 2/5
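
The same answer can be checked by brute-force enumeration. The following is a minimal sketch in Python; the helper names (`prob`, `is_a`, `is_d`) are illustrative and not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on an ordered roll."""
    return Fraction(sum(1 for roll in outcomes if event(roll)), len(outcomes))

def is_a(roll):
    """A: a 2 and a 6 are present."""
    return set(roll) == {2, 6}

def is_d(roll):
    """D: the faces sum to 8."""
    return sum(roll) == 8

p_a = prob(is_a)                                           # prior
p_d = prob(is_d)                                           # normalization
p_d_given_a = prob(lambda r: is_a(r) and is_d(r)) / p_a    # likelihood

posterior = p_a * p_d_given_a / p_d
print(posterior)  # 2/5
```

Using exact fractions rather than floats keeps each term directly comparable with the hand calculation.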

### MAKING A QUALITATIVE DECISION MORE QUANTITATIVE

Should I invest in Canadian farmland?
• Higher future use of foodstock for energy, China food consumption
• Global warming impact?

#### Can probability of a big payoff be posed?

• Probability of being right when the market is wrong given record of my past similar cosmic predictions.
• Confidence that in this case I am right and market is wrong.

### SINGLE-PARAMETER CONTINUOUS DISTRIBUTION

Binomial-generated probability Θ of cumulative “wins” W versus “losses” L:
• Use the beta distribution with α=W+1, β=L+1.
• Convenient conjugate property: posterior has same form as prior. The prior can be interpreted as additional win-loss data.
• Beta(α1, β1) * Beta(α2, β2) ∝ Beta(α1+α2−1, β1+β2−1): the win and loss counts simply add.
• Beta(1,1) is an uninformative prior.
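
As a rough illustration of the conjugate property, the sketch below updates a Beta prior by adding win and loss counts to its parameters. The function names and the counts are hypothetical, chosen only for illustration; they are not the BARRA data.

```python
def beta_update(alpha, beta, wins, losses):
    """Conjugate update: add observed wins and losses to the Beta parameters."""
    return alpha + wins, beta + losses

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def beta_sd(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta / (total ** 2 * (total + 1))) ** 0.5

prior = (1, 1)  # Beta(1,1), the uninformative prior

# Hypothetical record: 190 "win" months and 158 "loss" months.
posterior = beta_update(*prior, wins=190, losses=158)

# Updating in two batches gives the same posterior as updating once.
first_batch = beta_update(*prior, wins=100, losses=80)
two_step = beta_update(*first_batch, wins=90, losses=78)

print(posterior, two_step)
print(round(beta_mean(*posterior), 4), round(beta_sd(*posterior), 4))
```

Because the posterior after one batch serves as the prior for the next, the order and batching of the observations do not matter under this model.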

### EXAMPLE

What is the probability of value strategies outperforming growth strategies next month?
• Data: monthly returns for the S&P 500 value and growth sub-indices as maintained by BARRA from Jan 75 through Dec 03.
• Assumption: An uninformed prior at the beginning, no knowledge of structure such as autocorrelation.

### WHAT IF MONTHS WERE EXCHANGEABLE? ARE VALUE MONTHS MORE FREQUENT?

### WHEN PRIORS MEET LIKELIHOOD

The mean of the posterior density is a weighted average of the means of the prior and likelihood densities.
The weights are proportional to the relative precision of the two estimates.
Note that in this single-parameter model, more data always leads to less dispersion of probability.
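
For the beta model this weighting can be made concrete. The sketch below assumes the number of observations, real or implied by the prior, acts as the precision; the function name and the counts are hypothetical.

```python
def posterior_mean_as_weighted_average(alpha0, beta0, wins, losses):
    """Blend the prior mean and the observed win frequency by observation count."""
    n_prior = alpha0 + beta0           # pseudo-observations carried by the prior
    n_data = wins + losses             # actual observations
    prior_mean = alpha0 / n_prior
    data_mean = wins / n_data          # observed win frequency
    w_prior = n_prior / (n_prior + n_data)
    blended = w_prior * prior_mean + (1 - w_prior) * data_mean
    return blended, w_prior

# Hypothetical: a Beta(6, 6) prior meets 14 wins and 10 losses.
blended, w_prior = posterior_mean_as_weighted_average(6, 6, 14, 10)

# Direct calculation: mean of the posterior Beta(6 + 14, 6 + 10).
direct = (6 + 14) / (6 + 6 + 14 + 10)

print(round(blended, 4), round(direct, 4), round(w_prior, 4))
```

The two routes agree, and the weight on the prior shrinks toward zero as the data count grows.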

### CONJUGATE BETA LEARNING MODEL

#### Assumes

• Minimal knowledge of IID process
• No scale parameter, location uncertainty is always reduced by data.

#### Potential applications

• Semi-qualitative decisions. Will the Fed raise rates? Will the correlation between stock and bond returns be positive?
• “Non-parametric” identification of potential return forecasting signals. News coding.
• Working backward to discover implicit priors (possible because of unique solution). Are you crazy?

### MULTI-PARAMETER DENSITY

#### Factor the conditional probabilities.

• Example: P(μ,σ | D) = P(μ | σ,D) * P(σ | D)
• Where P(σ | D) = ∫ P(μ,σ | D) dμ

#### Some simple cases have been worked out in closed form.

• Example: Normally-distributed process with unknown mean and variance.

### PERFORMANCE PROJECTION

#### By how much should a large-cap value manager beat the S&P 500?

• We want to know both location and scale.
• Assume IID lognormal process Jan 75 – Dec 03.
• Excess kurtosis, predictable variation ignored.
• For convenience, we will use a conjugate prior:
  • σ² distributed as scaled inverse chi-squared
  • Mean distributed as N(μ, σ²/n).
• Priors for mean (0%), standard deviation (2%) and inverse chi-squared degrees of freedom (12) for σ².
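
A minimal sketch of the conjugate update for this prior follows, using the standard normal / scaled-inverse-chi-squared formulas. The prior mean (0%), standard deviation (2%) and degrees of freedom (12) come from the slide. The prior pseudo-observation count `kappa0` and the return series are assumptions made purely for illustration.

```python
from statistics import mean, variance

def nix2_update(mu0, kappa0, nu0, sigma0_sq, data):
    """Posterior parameters of a normal / scaled-inverse-chi-squared model."""
    n = len(data)
    ybar = mean(data)
    s_sq = variance(data)              # sample variance (n - 1 denominator)
    kappa_n = kappa0 + n               # pseudo-observations behind the mean
    nu_n = nu0 + n                     # degrees of freedom behind the variance
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
    sigma_n_sq = (nu0 * sigma0_sq
                  + (n - 1) * s_sq
                  + kappa0 * n / kappa_n * (ybar - mu0) ** 2) / nu_n
    return mu_n, kappa_n, nu_n, sigma_n_sq

# Hypothetical monthly excess log returns (not the actual manager data).
returns = [0.012, -0.008, 0.021, 0.004, -0.015, 0.009, 0.017, -0.003]

# Prior from the slide: mean 0%, sd 2%, 12 degrees of freedom.
# kappa0 = 12 is an assumed value; the slide does not state it.
mu_n, kappa_n, nu_n, sigma_n_sq = nix2_update(
    mu0=0.0, kappa0=12, nu0=12, sigma0_sq=0.02 ** 2, data=returns)

print(mu_n, kappa_n, nu_n, sigma_n_sq ** 0.5)
```

The posterior mean is pulled from the sample mean toward the prior mean of zero, with the pull governed by `kappa0` relative to the number of observations.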

### UPDATING THE DISPERSION

### UPDATING THE MEAN

### TO PREDICT NEW DATA

#### Point estimate:

• Dpredicted = mean of μ distribution

#### Full Bayesian estimate:

• Distribution of Dpred ~ ∫∫P(Dpred| μ,σ,D)P(μ,σ | D) dμdσ
• Here, repeat sequential draws of σ, μ|σ and N(μ,σ) until a forecast distribution is formed.
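
The sequential draws can be sketched as below, assuming the posterior is normal / scaled inverse chi-squared. The posterior parameter values passed in are hypothetical placeholders, not results from the presentation.

```python
import random

def draw_predictive(mu_n, kappa_n, nu_n, sigma_n_sq, n_draws, rng):
    """Simulate new observations by drawing sigma^2, then mu, then the data point."""
    draws = []
    for _ in range(n_draws):
        # sigma^2 ~ scaled inverse chi-squared(nu_n, sigma_n_sq)
        chi_sq = rng.gammavariate(nu_n / 2.0, 2.0)   # chi-squared, nu_n d.o.f.
        sigma_sq = nu_n * sigma_n_sq / chi_sq
        # mu | sigma^2 ~ N(mu_n, sigma^2 / kappa_n)
        mu = rng.gauss(mu_n, (sigma_sq / kappa_n) ** 0.5)
        # new observation ~ N(mu, sigma^2)
        draws.append(rng.gauss(mu, sigma_sq ** 0.5))
    return draws

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical posterior: mean 0.2% per month, scale 2%, 360 effective observations.
draws = draw_predictive(0.002, 360, 360, 0.02 ** 2, 20000, rng)

pred_mean = sum(draws) / len(draws)
pred_sd = (sum((d - pred_mean) ** 2 for d in draws) / (len(draws) - 1)) ** 0.5
print(round(pred_mean, 4), round(pred_sd, 4))
```

The forecast distribution is slightly wider than N(μ, σ) evaluated at the point estimates, because uncertainty about both parameters is carried through to the prediction.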

### CONJUGATE NORMAL INVERSE CHI-SQUARED LEARNING MODEL

#### Very widely applicable

• If process is close to IID normal or log-normal
• Any univariate data IID case where Central Limit Theorem kicks in.

#### Applications

• Decisions where scale of dispersion is important: ranking of active managers. Extension: Black-Litterman portfolio optimization.
• Where learning can be sped up by the addition of priors to evidence. Extension: Bayesian regression.
• Signals when estimate dispersion increases with increasing data: Evaluation of outliers questioning active strategies.
• Sequential decision-making: How many observations are needed to support a model?

### HIERARCHICAL ESTIMATION

#### Assemble a hierarchy of estimates to better combine group and individual information.

• Basic idea underlying James-Stein estimates and the Ledoit-Wolf approach to better covariance estimation.

### FORM EXCHANGEABLE GROUP

#### Morningstar screen:

• Style: large cap, mixed (neither strong value nor strong growth), not international, beta between 0.8 and 1.2.
• Stock pickers: number of holdings between 100 and 250.
• Data availability: Morningstar ratings and 8 years of Yahoo Finance monthly return history.
• Independence: First fund listed in fund family satisfying screen.

### DATA AND MODEL

#### Data:

• Monthly excess returns net of equal-weighted group average of 14 funds, 96 months ending Nov 2006.

#### Model:

• Fund j sample mean m_j of excess log returns ~ N(Θ_j, σ_j²/96) [Central Limit Theorem]
• Θ_j ~ N(µ, τ²)
• “Empirical Bayes.” σ_j² are estimated directly from the data, then treated as knowns.

### ESTIMATION PROCESS

#### For each value of τ in a wide grid:

• Calculate P(τ | m_1, m_2, …, σ_1², σ_2², …)

#### Draw 10,000 samples of τ from this distribution and for each:

• Draw grand mean µ ~ N(f(τ, m_1, m_2, …, σ_1², σ_2², …), g(τ, σ_1², σ_2², …))
• For each fund:
  • Calculate Bayesian Θmean and then Θvariance
  • Draw possible Θ_j from N(Θmean_j, Θvariance_j)
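
The estimation steps above can be sketched as follows. The sketch assumes the standard precision-weighted formulas for the normal hierarchical model with a flat prior on τ, as found in Bayesian Data Analysis; the slides do not spell out f and g, so this is one plausible reading rather than the exact procedure used. The fund means and standard deviations are hypothetical.

```python
import math
import random

def grand_mean_params(tau, m, se_sq):
    """Precision-weighted grand mean and its variance for a given tau."""
    prec = [1.0 / (v + tau ** 2) for v in se_sq]
    v_mu = 1.0 / sum(prec)
    mu_hat = v_mu * sum(p * y for p, y in zip(prec, m))
    return mu_hat, v_mu, prec

def log_post_tau(tau, m, se_sq):
    """Log of P(tau | data) up to a constant, assuming a flat prior on tau."""
    mu_hat, v_mu, prec = grand_mean_params(tau, m, se_sq)
    return 0.5 * math.log(v_mu) + sum(
        0.5 * math.log(p) - 0.5 * p * (y - mu_hat) ** 2
        for p, y in zip(prec, m))

def sample_thetas(m, se_sq, tau_grid, n_draws, rng):
    """Draw tau from its grid posterior, then mu, then each fund's theta."""
    log_w = [log_post_tau(t, m, se_sq) for t in tau_grid]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    draws = []
    for tau in rng.choices(tau_grid, weights=weights, k=n_draws):
        mu_hat, v_mu, _ = grand_mean_params(tau, m, se_sq)
        mu = rng.gauss(mu_hat, math.sqrt(v_mu))
        thetas = []
        for y, v in zip(m, se_sq):
            theta_var = 1.0 / (1.0 / v + 1.0 / tau ** 2)
            theta_mean = theta_var * (y / v + mu / tau ** 2)
            thetas.append(rng.gauss(theta_mean, math.sqrt(theta_var)))
        draws.append(thetas)
    return draws

# Hypothetical data: mean monthly excess log returns and monthly sds for 5 funds.
m = [0.004, 0.001, -0.002, 0.0005, -0.003]
se_sq = [sd ** 2 / 96 for sd in (0.015, 0.012, 0.018, 0.010, 0.014)]
tau_grid = [0.0001 * k for k in range(1, 101)]   # excludes tau = 0

rng = random.Random(0)
draws = sample_thetas(m, se_sq, tau_grid, 10000, rng)
post_means = [sum(d[j] for d in draws) / len(draws) for j in range(len(m))]
print([round(x, 5) for x in post_means])
```

Each fund's posterior mean sits between its own sample mean and the group mean, with noisier records shrunk further toward the group.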

### SHRINKAGE OF NAÏVE ESTIMATES

These funds were all middle-of-the-road large-cap stock pickers.
Is this picture more realistic than unadjusted individual fund records?
What would happen if we included more funds?
What would be a next step?

### POTENTIAL HIERARCHICAL APPLICATIONS

#### Multivariate Extensions:

• Estimating sector and individual stock characteristics as sources of alpha.
• Ledoit and Wolf covariance shrinkage estimation.

### HIERARCHICAL MODEL PERSPECTIVE

#### The 96-observation example gave us an excuse to regard the individual variances as known.

### RECOMMENDED TEXTS

#### Bayesian Data Analysis, 2nd Edition

• Andrew Gelman, John B. Carlin, Hal S. Stern and Donald B. Rubin

#### Risk and Asset Allocation

• Attilio Meucci

#### Probability Theory: The Logic of Science

• Edwin Jaynes