Laplace Approximation:
Fit a Gaussian to p(w|D)
log p(w|D) = log p(w,D) - log p(D) = log p(w,D) + const (w.r.t. w)
This is quadratic in w if the posterior is Gaussian
Find the mode and the second derivative (Hessian) at the mode
“Energy” E(w) = -log( p(w,D) )
w* = argmin E(w), i.e. the MAP fit (a Gaussian prior on w contributes an L2 regularization term to E)
Hessian H_ij = d^2 E / (dw_i dw_j), evaluated at w = w*
p(w|D) ~= N(w; w*, H^-1)
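A minimal sketch of the recipe above. The model is our own choice, not from the notes: Bayesian logistic regression with labels y in {0,1} and a Gaussian prior N(0, alpha^-1 I), so E(w) is convex and Newton's method finds w*. The names `grad_hess`, `laplace_fit` and `alpha` are illustrative.

```python
import numpy as np

def grad_hess(w, X, y, alpha):
    """Gradient and Hessian of E(w) = -log p(y|X,w) - log p(w) for logistic
    regression with (assumed) prior p(w) = N(0, alpha^-1 I)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))                       # P(y=1 | x, w)
    g = X.T @ (p - y) + alpha * w                            # dE/dw
    s = p * (1.0 - p)                                        # per-example curvature
    H = X.T @ (X * s[:, None]) + alpha * np.eye(X.shape[1])  # d2E/(dw dw^T)
    return g, H

def laplace_fit(X, y, alpha=1.0, iters=100, tol=1e-10):
    """Return the mode w* = argmin E(w) and the Hessian H evaluated at w*."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = grad_hess(w, X, y, alpha)
        step = np.linalg.solve(H, g)     # Newton step H^-1 g
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    _, H = grad_hess(w, X, y, alpha)     # Hessian at the mode
    return w, H

# Usage on synthetic data: the Laplace posterior is N(w; w_star, H^-1).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_w)))).astype(float)
w_star, H = laplace_fit(X, y, alpha=1.0)
cov = np.linalg.inv(H)                   # approximate posterior covariance H^-1
```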
Approximate the marginal likelihood p(D|M)
p(w|D) = p(w,D) / p(D) ~= N(w; w*, H^-1)
= |H|^(1/2) / (2pi)^(b/2) exp(-1/2 (w-w*)^T H (w-w*))
Evaluate the approximation at w=w*
p(w*,D) / p(D) ~= |H|^(1/2) / (2pi)^(b/2)
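A quick numerical check of this step, as a sketch: the Gaussian log-density written with the precision H instead of the covariance (the helper name `gaussian_logpdf_precision` and the example H are ours). At w = w* the quadratic term vanishes, leaving log(|H|^(1/2) / (2pi)^(b/2)).

```python
import numpy as np

def gaussian_logpdf_precision(w, mean, H):
    """log N(w; mean, H^-1), parameterised by the precision matrix H."""
    b = len(mean)                          # number of parameters
    diff = w - mean
    _, logdet = np.linalg.slogdet(H)       # log|H|; H assumed positive definite
    return 0.5 * logdet - 0.5 * b * np.log(2 * np.pi) - 0.5 * diff @ H @ diff

w_star = np.array([0.3, -1.2])
H = np.array([[4.0, 1.0], [1.0, 3.0]])
at_mode = gaussian_logpdf_precision(w_star, w_star, H)
# Here b = 2, so log(|H|^(1/2) / (2pi)^(b/2)) = 0.5 log|H| - log(2pi)
expected = 0.5 * np.log(np.linalg.det(H)) - np.log(2 * np.pi)
```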
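p(D) is the marginal likelihood (evidence) of the training data D; strictly it is p(D|M) for a model M
b is the number of parameters (the dimension of w); it is often written D, renamed b here to avoid a clash with the data D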
p(D) = p(w*,D) / p(w*|D) ~= p(w*,D) * (2pi)^(b/2) / |H|^(1/2)
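In practice this is computed in log space: log p(D) ~= -E(w*) + (b/2) log(2pi) - (1/2) log|H|, with E(w*) = -log p(w*,D). The sketch below assumes (our choice) Bayesian linear regression with noise variance sigma2 and prior N(0, alpha^-1 I). That posterior is exactly Gaussian, so the Laplace estimate should match the closed-form evidence log N(y; 0, sigma2 I + alpha^-1 X X^T), which makes a convenient correctness check. All function names are illustrative.

```python
import numpy as np

def energy(w, X, y, alpha, sigma2):
    """E(w) = -log p(w, D) = -log p(y|X,w) - log p(w), including all constants."""
    n, b = X.shape
    r = y - X @ w
    nll = 0.5 * r @ r / sigma2 + 0.5 * n * np.log(2 * np.pi * sigma2)
    neg_log_prior = 0.5 * alpha * w @ w - 0.5 * b * np.log(alpha / (2 * np.pi))
    return nll + neg_log_prior

def laplace_log_evidence(X, y, alpha, sigma2):
    b = X.shape[1]
    H = X.T @ X / sigma2 + alpha * np.eye(b)        # Hessian of E (constant in w here)
    w_star = np.linalg.solve(H, X.T @ y / sigma2)   # mode: solves grad E(w) = 0
    _, logdet_H = np.linalg.slogdet(H)
    return -energy(w_star, X, y, alpha, sigma2) + 0.5 * b * np.log(2 * np.pi) - 0.5 * logdet_H

def exact_log_evidence(X, y, alpha, sigma2):
    """Closed-form log N(y; 0, sigma2 I + alpha^-1 X X^T) for this model."""
    n = len(y)
    C = sigma2 * np.eye(n) + X @ X.T / alpha
    _, logdet_C = np.linalg.slogdet(C)
    return -0.5 * (y @ np.linalg.solve(C, y) + logdet_C + n * np.log(2 * np.pi))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.3 * rng.normal(size=30)
approx = laplace_log_evidence(X, y, alpha=1.0, sigma2=0.09)
exact = exact_log_evidence(X, y, alpha=1.0, sigma2=0.09)
```

For non-Gaussian models (e.g. logistic regression) the two quantities no longer coincide and only the Laplace side is available.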
We can approximate p(D|M) for each candidate model M and choose the model with the highest (approximate) marginal likelihood
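A sketch of model selection with this estimate, assuming (hypothetically) that the candidate models M are polynomial regressions of increasing degree under the same linear-Gaussian setup with prior N(0, alpha^-1 I) and noise variance sigma2; the data and names are illustrative. The degree with the largest approximate log p(D|M) is selected.

```python
import numpy as np

def laplace_log_evidence(X, y, alpha, sigma2):
    """Laplace estimate of log p(D|M) for linear regression with prior N(0, alpha^-1 I)."""
    n, b = X.shape
    H = X.T @ X / sigma2 + alpha * np.eye(b)
    w_star = np.linalg.solve(H, X.T @ y / sigma2)
    r = y - X @ w_star
    # E(w*) = -log p(y|X,w*) - log p(w*), with all constants
    E = (0.5 * r @ r / sigma2 + 0.5 * n * np.log(2 * np.pi * sigma2)
         + 0.5 * alpha * w_star @ w_star - 0.5 * b * np.log(alpha / (2 * np.pi)))
    _, logdet_H = np.linalg.slogdet(H)
    return -E + 0.5 * b * np.log(2 * np.pi) - 0.5 * logdet_H

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 40)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.2 * rng.normal(size=x.size)  # synthetic quadratic data

# Model M_d: polynomial of degree d, features 1, x, ..., x^d
log_ev = {d: laplace_log_evidence(np.vander(x, d + 1, increasing=True), y,
                                  alpha=1.0, sigma2=0.04)
          for d in range(8)}
best = max(log_ev, key=log_ev.get)   # highest approximate marginal likelihood
```

Unlike training error, the evidence does not automatically increase with degree: the -(1/2) log|H| term penalises extra parameters.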
This can go wrong if the Gaussian is a poor fit to the posterior (e.g. a strongly skewed or multimodal p(w|D))!
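A one-dimensional illustration of this failure mode, using our own example: the unnormalised density f(w) = w^(k-1) exp(-w) on w > 0, whose exact integral is Gamma(k). Here E(w) = -log f(w), the mode is w* = k-1, and the Laplace estimate f(w*) sqrt(2pi / H) is Stirling's approximation. For k = 2 (a strongly skewed density) it is roughly 8% too low; the relative error shrinks as k grows and f becomes more Gaussian.

```python
import math

def laplace_log_integral(k):
    """Laplace estimate of log of the integral of w^(k-1) e^(-w) over w > 0, for k > 1."""
    w_star = k - 1.0                                   # mode: dE/dw = 0 with E = -log f
    E_star = -((k - 1.0) * math.log(w_star) - w_star)  # E(w*)
    H = (k - 1.0) / w_star**2                          # d2E/dw2 at the mode = 1/(k-1)
    # b = 1 parameter: -E(w*) + (1/2) log(2pi) - (1/2) log H
    return -E_star + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(H)

def relative_error(k):
    exact = math.lgamma(k)                             # log Gamma(k), the exact log integral
    return abs(math.exp(laplace_log_integral(k) - exact) - 1.0)

errors = {k: relative_error(k) for k in (2, 5, 50)}
```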