Laplace Approximation With Different Distributions: Python Examples With Plots

15–22 minutes

Laplace approximation is a way to approximate a difficult posterior distribution using a Gaussian distribution near its peak.

The posterior is written as:

(p(θ|D)p(D|θ)p(θ))(p(\theta \mid D) \propto p(D \mid \theta)p(\theta))

Taking logs:

(h(θ)=logp(D|θ)+logp(θ))(h(\theta)=\log p(D \mid \theta)+\log p(\theta))

So:

(p(θ|D)exp(h(θ)))(p(\theta \mid D) \propto \exp(h(\theta)))

Near the posterior mode (\hat{\theta}), we use a second-order Taylor approximation:

h(θ)h(θ^)+12h(θ^)(θθ^)2h(\theta) \approx h(\hat{\theta})+\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2

At the mode:

h(θ^)=0h'(\hat{\theta})=0

That is why there is no first-derivative term.

Since (\hat{\theta}) is a maximum:

h(θ^)<0h”(\hat{\theta})<0

So the posterior becomes approximately:

p(θ|D)exp(h(θ^)+12h(θ^)(θθ^)2)p(\theta \mid D) \approx \exp\left(h(\hat{\theta})+\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2\right)

Ignoring constants:

p(θ|D)exp(12h(θ^)(θθ^)2)p(\theta \mid D) \approx \exp\left(\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2\right)

Let:

V=1h(θ^)V=-\frac{1}{h”(\hat{\theta})}

Then:

p(θ|D)exp((θθ^)22V)p(\theta \mid D) \approx \exp\left(-\frac{(\theta-\hat{\theta})^2}{2V}\right)

Therefore:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

Example 1: Bernoulli Data With Logistic Parameter

Model

Suppose we observe binary outcomes.

For example, in healthcare:

(yi=1)(y_i=1)

could mean a patient responded to treatment.

(yi=0)(y_i=0)

could mean the patient did not respond.

The model is:

yiBernoulli(p)y_i \sim Bernoulli(p)

Instead of estimating (p) directly, we use the logit parameter:

p=11+exp(β)p=\frac{1}{1+\exp(-\beta)}

So:

β=log(p1p)\beta=\log\left(\frac{p}{1-p}\right)

Assume the prior:

βN(0,10)\beta \sim N(0,10)

The likelihood is:

p(D|β)=i=1npyi(1p)1yip(D \mid \beta)=\prod_{i=1}^{n}p^{y_i}(1-p)^{1-y_i}

The log-likelihood is:

logp(D|β)=i=1n[yilog(p)+(1yi)log(1p)]\log p(D \mid \beta)=\sum_{i=1}^{n}\left[y_i\log(p)+(1-y_i)\log(1-p)\right]

The log-prior is:

logp(β)=logN(β;0,10)\log p(\beta)=\log N(\beta;0,10)

So the log-posterior is:

h(β)=i=1n[yilog(p)+(1yi)log(1p)]+logN(β;0,10)h(\beta)=\sum_{i=1}^{n}\left[y_i\log(p)+(1-y_i)\log(1-p)\right]+\log N(\beta;0,10)

Laplace approximation gives:

β|DN(β^,V)\beta \mid D \approx N(\hat{\beta},V)

where:

V=1h(β^)V=-\frac{1}{h”(\hat{\beta})}

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([1,1,1,0,1,0,1,1,0,1])

def log_posterior(beta):
    beta = beta[0] if isinstance(beta, np.ndarray) else beta
    
    p = 1 / (1 + np.exp(-beta))
    
    loglik = np.sum(
        y * np.log(p) +
        (1-y) * np.log(1-p)
    )
    
    logprior = norm.logpdf(beta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda b: -log_posterior(b), x0=np.array([0.0]))

beta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(beta_hat + eps)
    - 2 * log_posterior(beta_hat)
    + log_posterior(beta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

print("Posterior mode beta_hat:", beta_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(beta_hat - 4*std, beta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(b) for b in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=beta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(beta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("beta")
plt.ylabel("Density")
plt.title("Laplace Approximation for Bernoulli Logistic Model")
plt.legend()
plt.show()

What the Plot Shows

The solid curve is the normalized posterior.

The dashed curve is the Gaussian approximation:

β|DN(β^,V)\beta \mid D \approx N(\hat{\beta},V)

If the two curves are close, the Laplace approximation is good.


Example 2: Poisson Data With Log Rate

Model

Suppose we observe count data.

For example:

yiy_i

could be the number of emergency department arrivals per hour.

The model is:

yiPoisson(λ)y_i \sim Poisson(\lambda)

Since:

λ>0\lambda>0

we write:

λ=exp(θ)\lambda=\exp(\theta)

So:

θ=log(λ)\theta=\log(\lambda)

Assume the prior:

θN(0,10)\theta \sim N(0,10)

The likelihood is:

p(D|θ)=i=1nλyiexp(λ)yi!p(D \mid \theta)=\prod_{i=1}^{n}\frac{\lambda^{y_i}\exp(-\lambda)}{y_i!}

Since:

λ=exp(θ)\lambda=\exp(\theta)

the log-likelihood is:

logp(D|θ)=i=1n[yiθexp(θ)log(yi!)]\log p(D \mid \theta)=\sum_{i=1}^{n}\left[y_i\theta-\exp(\theta)-\log(y_i!)\right]

The log-posterior is:

h(θ)=i=1n[yiθexp(θ)log(yi!)]+logN(θ;0,10)h(\theta)=\sum_{i=1}^{n}\left[y_i\theta-\exp(\theta)-\log(y_i!)\right]+\log N(\theta;0,10)

Laplace approximation gives:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

where:

V=1h(θ^)V=-\frac{1}{h”(\hat{\theta})}

Then:

(\lambda \approx \exp(\hat{\theta}))

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, poisson

y = np.array([3,5,4,7,6,5])

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    lam = np.exp(theta)
    
    loglik = np.sum(poisson.logpmf(y, mu=lam))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([1.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

lambda_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate lambda:", lambda_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(lambda)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Poisson Model")
plt.legend()
plt.show()

What the Plot Shows

The posterior is for:

(θ=log(λ))(\theta=\log(\lambda))

not directly for (\lambda).

The Gaussian approximation is:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

Example 3: Binomial Data With Logistic Parameter

Model

Suppose a treatment is given to n=20 patients.

Out of them: y=15 respond.

The model is:

yBinomial(n,p)y \sim Binomial(n,p)

Use:

p=11+exp(θ)p=\frac{1}{1+\exp(-\theta)}

Prior:

θN(0,10)\theta \sim N(0,10)

The likelihood is:

p(y|θ)=(ny)py(1p)nyp(y \mid \theta)=\binom{n}{y}p^y(1-p)^{n-y}

The log-likelihood is:

logp(y|θ)=log(ny)+ylog(p)+(ny)log(1p)\log p(y \mid \theta)=\log\binom{n}{y}+y\log(p)+(n-y)\log(1-p)

The log-posterior is:

h(θ)=log(ny)+ylog(p)+(ny)log(1p)+logN(θ;0,10)h(\theta)=\log\binom{n}{y}+y\log(p)+(n-y)\log(1-p)+\log N(\theta;0,10)

Laplace approximation gives:

(θ|DN(θ^,V))(\theta \mid D \approx N(\hat{\theta},V))

where:

V=1h(θ^)V=-\frac{1}{h”(\hat{\theta})}

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, binom

y = 15
n = 20

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    p = 1 / (1 + np.exp(-theta))
    
    loglik = binom.logpmf(y, n=n, p=p)
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([0.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

p_hat = 1 / (1 + np.exp(-theta_hat))

print("Posterior mode theta_hat:", theta_hat)
print("Approximate p:", p_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = logit(p)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Binomial Model")
plt.legend()
plt.show()

What the Plot Shows

The approximation is Gaussian on the logit scale:

θ=log(p1p)\theta=\log\left(\frac{p}{1-p}\right)

So:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

Then convert back:

p=11+exp(θ)p=\frac{1}{1+\exp(-\theta)}

Example 4: Exponential Data With Log Rate

Model

Suppose we observe waiting times.

For example:

yiy_i

could be the time until hospital discharge.

The model is:

yiExponential(λ)y_i \sim Exponential(\lambda)

The exponential density is:

p(yi|λ)=λexp(λyi)p(y_i \mid \lambda)=\lambda \exp(-\lambda y_i)

Since:

(λ>0)(\lambda>0)

we use:

λ=exp(θ))\lambda=\exp(\theta))

Prior:

θN(0,10))\theta \sim N(0,10))

The likelihood is:

p(D|θ)=i=1nλexp(λyi)p(D \mid \theta)=\prod_{i=1}^{n}\lambda \exp(-\lambda y_i)

The log-likelihood is:

logp(D|θ)=i=1n[θexp(θ)yi]\log p(D \mid \theta)=\sum_{i=1}^{n}\left[\theta-\exp(\theta)y_i\right]

The log-posterior is:

h(θ)=i=1n[θexp(θ)yi]+logN(θ;0,10)h(\theta)=\sum_{i=1}^{n}\left[\theta-\exp(\theta)y_i\right]+\log N(\theta;0,10)

Laplace approximation gives:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

where:

V=1h(θ^)V=-\frac{1}{h”(\hat{\theta})}

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, expon

y = np.array([2.1, 1.7, 3.4, 2.8, 1.9, 2.5, 3.0])

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    lam = np.exp(theta)
    
    loglik = np.sum(expon.logpdf(y, scale=1/lam))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([0.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

lambda_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate lambda:", lambda_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(lambda)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Exponential Model")
plt.legend()
plt.show()

What the Plot Shows

The solid line is the posterior for:

θ=log(λ)\theta=\log(\lambda)

The dashed line is:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

Example 5: Normal Data With Unknown Mean

Model

Suppose we observe blood pressure measurements.

Let:

yiN(μ,σ2)y_i \sim N(\mu,\sigma^2)

Assume (σ\sigma) is known.

Prior:

μN(0,100)\mu \sim N(0,100)

The likelihood is:

p(D|μ)=i=1n12πσ2exp((yiμ)22σ2)p(D \mid \mu)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right)

The log-likelihood is:

logp(D|μ)=i=1n[12log(2πσ2)(yiμ)22σ2]\log p(D \mid \mu)=\sum_{i=1}^{n}\left[-\frac{1}{2}\log(2\pi\sigma^2)-\frac{(y_i-\mu)^2}{2\sigma^2}\right]

The log-posterior is:

h(μ)=logp(D|μ)+logN(μ;0,100)h(\mu)=\log p(D \mid \mu)+\log N(\mu;0,100)

Laplace approximation gives:

μ|DN(μ^,V)\mu \mid D \approx N(\hat{\mu},V)

where:

V=1h(μ^)V=-\frac{1}{h”(\hat{\mu})}

In this case, because the likelihood and prior are both Gaussian, the posterior is already Gaussian.

So Laplace approximation is essentially exact.

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([128, 132, 125, 130, 135, 127, 131])
sigma = 5

def log_posterior(mu):
    mu = mu[0] if isinstance(mu, np.ndarray) else mu
    
    loglik = np.sum(norm.logpdf(y, loc=mu, scale=sigma))
    
    logprior = norm.logpdf(mu, loc=0, scale=10)
    
    return loglik + logprior

result = minimize(lambda m: -log_posterior(m), x0=np.array([120.0]))

mu_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(mu_hat + eps)
    - 2 * log_posterior(mu_hat)
    + log_posterior(mu_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

print("Posterior mode mu_hat:", mu_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(mu_hat - 4*std, mu_hat + 4*std, 500)

log_post_values = np.array([log_posterior(m) for m in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=mu_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(mu_hat, linestyle=":", label="Posterior mode")
plt.xlabel("mu")
plt.ylabel("Density")
plt.title("Laplace Approximation for Normal Mean Model")
plt.legend()
plt.show()

What the Plot Shows

The posterior and Laplace approximation should almost overlap.

That is because the log-posterior is exactly quadratic in (\mu).

A quadratic log-posterior gives an exact Gaussian posterior.


Example 6: Gamma Data With Log Shape Parameter

Model

Suppose we observe positive continuous data.

For example:

yiy_i

could be hospital length of stay.

Assume:

yiGamma(α,β)y_i \sim Gamma(\alpha,\beta)

where (α\alpha) is the shape and (\beta) is the rate.

Assume (β\beta) is known.

Since:

(α>0)(\alpha>0)

we write:

α=exp(θ)\alpha=\exp(\theta)

Prior:

(θN(0,10))(\theta \sim N(0,10))

The gamma likelihood is:

p(yi|α,β)=βαΓ(α)yiα1exp(βyi)p(y_i \mid \alpha,\beta)=\frac{\beta^\alpha}{\Gamma(\alpha)}y_i^{\alpha-1}\exp(-\beta y_i)

The likelihood is:

p(D|θ)=i=1nβαΓ(α)yiα1exp(βyi)p(D \mid \theta)=\prod_{i=1}^{n}\frac{\beta^\alpha}{\Gamma(\alpha)}y_i^{\alpha-1}\exp(-\beta y_i)

where:

α=exp(θ)\alpha=\exp(\theta)

The log-likelihood is:

logp(D|θ)=i=1n[αlog(β)logΓ(α)+(α1)log(yi)βyi]\log p(D \mid \theta)=\sum_{i=1}^{n}\left[\alpha\log(\beta)-\log\Gamma(\alpha)+(\alpha-1)\log(y_i)-\beta y_i\right]

The log-posterior is:

h(θ)=logp(D|θ)+logN(θ;0,10)h(\theta)=\log p(D \mid \theta)+\log N(\theta;0,10)

Laplace approximation gives:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

where:

V=1h(θ^)V=-\frac{1}{h”(\hat{\theta})}

Then:

αexp(θ^)\alpha \approx \exp(\hat{\theta})

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, gamma

y = np.array([2.4, 3.1, 2.8, 4.0, 3.6, 2.9, 3.3])

beta = 1.0

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    alpha = np.exp(theta)
    
    loglik = np.sum(gamma.logpdf(y, a=alpha, scale=1/beta))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([1.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

alpha_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate alpha:", alpha_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(alpha)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Gamma Shape Model")
plt.legend()
plt.show()

What the Plot Shows

The approximation is done on:

θ=log(α)\theta=\log(\alpha)

not directly on (\alpha).

So the Gaussian approximation is:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

Then transform back:

α=exp(θ)\alpha=\exp(\theta)

Complete Reusable Function for Laplace Approximation

To avoid rewriting the same code again and again, we can create a reusable function.

import numpy as np
from scipy.optimize import minimize

def laplace_approximation(log_posterior, start_value):
    result = minimize(
        lambda x: -log_posterior(x),
        x0=np.array([start_value])
    )
    
    theta_hat = result.x[0]
    
    eps = 1e-5
    
    hessian = (
        log_posterior(theta_hat + eps)
        - 2 * log_posterior(theta_hat)
        + log_posterior(theta_hat - eps)
    ) / eps**2
    
    variance = -1 / hessian
    std = np.sqrt(variance)
    
    return theta_hat, variance, std

And a reusable plotting function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_laplace(log_posterior, theta_hat, std, x_label, title):
    grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)
    
    log_post_values = np.array([log_posterior(t) for t in grid])
    
    posterior_values = np.exp(log_post_values - np.max(log_post_values))
    posterior_values = posterior_values / np.trapz(posterior_values, grid)
    
    laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)
    
    plt.figure(figsize=(8,5))
    plt.plot(grid, posterior_values, label="Posterior")
    plt.plot(grid, laplace_values, "--", label="Laplace approximation")
    plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
    plt.xlabel(x_label)
    plt.ylabel("Density")
    plt.title(title)
    plt.legend()
    plt.show()

Example use:

theta_hat, variance, std = laplace_approximation(log_posterior, start_value=0)

plot_laplace(
    log_posterior=log_posterior,
    theta_hat=theta_hat,
    std=std,
    x_label="theta",
    title="Laplace Approximation"
)

Final Summary

Laplace approximation follows this pattern:

p(θ|D)exp(h(θ))p(\theta \mid D) \propto \exp(h(\theta))

Approximate the log-posterior near the mode:

h(θ)h(θ^)+12h(θ^)(θθ^)2h(\theta) \approx h(\hat{\theta})+\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2

Then:

p(θ|D)exp((θθ^)22V)p(\theta \mid D) \approx \exp\left(-\frac{(\theta-\hat{\theta})^2}{2V}\right)

where:

V=1h(θ^)V=-\frac{1}{h”(\hat{\theta})}

So:

θ|DN(θ^,V)\theta \mid D \approx N(\hat{\theta},V)

The important idea is:

The log-posterior becomes approximately quadratic.

The posterior becomes approximately Gaussian.

That is why Laplace approximation works.

Leave a Reply

Discover more from Nerdish.Org

Subscribe now to keep reading and get access to the full archive.

Continue reading