## Friday, November 2, 2012

### Simple Bayesian bootstrap

Bootstrapping is a very popular statistical technique. However, its Bayesian analogue proposed by Rubin (1981) is not very common. I was looking for an example of its implementation in GNU R and could not find one so I decided to write a snippet presenting it.

In standard bootstrapping observations are sampled with replacement. This implies that observation weights follow multinomial distribution. In Bayesian bootstrap multinomial distribution is replaced by Dirichlet distribution.

This observation leads to very simple implementation of Bayesian bootstrap using gtools package. Here is the code presenting it with a simple application giving frequentist and Bayesian 95% confidence interval for mean in fbq and bbq variables:

library(gtools)

# Bayesian bootstrap
mean.bb <- function(x, n) {
apply(rdirichlet(n, rep(1, length(x))), 1, weighted.mean, x = x)
}

# standard bootstrap
mean.fb <- function(x, n) {
replicate(n, mean(sample(x, length(x), TRUE)))
}

set.seed(1)
reps <- 100000
x <- cars\$dist
system.time(fbq <- quantile((mean.fb(x, reps)), c(0.025, 0.075)))
system.time(bbq <- quantile((mean.bb(x, reps)), c(0.025, 0.075)))

As it can be seen implementation of Bayesian bootstrap is fairly simple.

On my computer Bayesian bootstrap is approximately 80% slower than standard bootstrap, but its performance probably could be improved.

#### 3 comments:

1. Would there be a difference between using your random deviates taken from the Dirichlet distribution as probabilities in taking samples from x instead of computing the mean of the random deviates with x as weights as you are showing here?

2. Yes - there is a big difference. The method with taking them as probabilities in taking samples from x leads to much higher dispersion in results and is incorrect.

You can see it for example by drawing density plots of the bootstrap distribution calculated in both ways.

3. Yes, I can see there's a large difference, thanks.