Laplace approximation is a way to approximate a difficult posterior distribution using a Gaussian distribution near its peak.

The posterior is written as:

(p(\theta \mid D) \propto p(D \mid \theta)p(\theta))

Taking logs:

(h(\theta)=\log p(D \mid \theta)+\log p(\theta))

So:

(p(\theta \mid D) \propto \exp(h(\theta)))

Near the posterior mode (\hat{\theta}), we use a second-order Taylor approximation:

h(\theta) \approx h(\hat{\theta})+\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2

At the mode:

h'(\hat{\theta})=0

That is why there is no first-derivative term.

Since (\hat{\theta}) is a maximum:

h”(\hat{\theta})<0

So the posterior becomes approximately:

p(\theta \mid D) \approx \exp\left(h(\hat{\theta})+\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2\right)

Ignoring constants:

p(\theta \mid D) \approx \exp\left(\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2\right)

Let:

V=-\frac{1}{h”(\hat{\theta})}

Then:

p(\theta \mid D) \approx \exp\left(-\frac{(\theta-\hat{\theta})^2}{2V}\right)

Therefore:

\theta \mid D \approx N(\hat{\theta},V)

Example 1: Bernoulli Data With Logistic Parameter

Model

Suppose we observe binary outcomes.

For example, in healthcare:

(y_i=1)

could mean a patient responded to treatment.

(y_i=0)

could mean the patient did not respond.

The model is:

y_i \sim Bernoulli(p)

Instead of estimating (p) directly, we use the logit parameter:

p=\frac{1}{1+\exp(-\beta)}

So:

\beta=\log\left(\frac{p}{1-p}\right)

Assume the prior:

\beta \sim N(0,10)

The likelihood is:

p(D \mid \beta)=\prod_{i=1}^{n}p^{y_i}(1-p)^{1-y_i}

The log-likelihood is:

\log p(D \mid \beta)=\sum_{i=1}^{n}\left[y_i\log(p)+(1-y_i)\log(1-p)\right]

The log-prior is:

\log p(\beta)=\log N(\beta;0,10)

So the log-posterior is:

h(\beta)=\sum_{i=1}^{n}\left[y_i\log(p)+(1-y_i)\log(1-p)\right]+\log N(\beta;0,10)

Laplace approximation gives:

\beta \mid D \approx N(\hat{\beta},V)

where:

V=-\frac{1}{h”(\hat{\beta})}

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([1,1,1,0,1,0,1,1,0,1])

def log_posterior(beta):
    beta = beta[0] if isinstance(beta, np.ndarray) else beta
    
    p = 1 / (1 + np.exp(-beta))
    
    loglik = np.sum(
        y * np.log(p) +
        (1-y) * np.log(1-p)
    )
    
    logprior = norm.logpdf(beta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda b: -log_posterior(b), x0=np.array([0.0]))

beta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(beta_hat + eps)
    - 2 * log_posterior(beta_hat)
    + log_posterior(beta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

print("Posterior mode beta_hat:", beta_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(beta_hat - 4*std, beta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(b) for b in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=beta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(beta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("beta")
plt.ylabel("Density")
plt.title("Laplace Approximation for Bernoulli Logistic Model")
plt.legend()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([1,1,1,0,1,0,1,1,0,1])

def log_posterior(beta):
    beta = beta[0] if isinstance(beta, np.ndarray) else beta
    
    p = 1 / (1 + np.exp(-beta))
    
    loglik = np.sum(
        y * np.log(p) +
        (1-y) * np.log(1-p)
    )
    
    logprior = norm.logpdf(beta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda b: -log_posterior(b), x0=np.array([0.0]))

beta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(beta_hat + eps)
    - 2 * log_posterior(beta_hat)
    + log_posterior(beta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

print("Posterior mode beta_hat:", beta_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(beta_hat - 4*std, beta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(b) for b in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=beta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(beta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("beta")
plt.ylabel("Density")
plt.title("Laplace Approximation for Bernoulli Logistic Model")
plt.legend()
plt.show()

What the Plot Shows

The solid curve is the normalized posterior.

The dashed curve is the Gaussian approximation:

\beta \mid D \approx N(\hat{\beta},V)

If the two curves are close, the Laplace approximation is good.

Example 2: Poisson Data With Log Rate

Model

Suppose we observe count data.

For example:

y_i

could be the number of emergency department arrivals per hour.

The model is:

y_i \sim Poisson(\lambda)

Since:

\lambda>0

we write:

\lambda=\exp(\theta)

So:

\theta=\log(\lambda)

Assume the prior:

\theta \sim N(0,10)

The likelihood is:

p(D \mid \theta)=\prod_{i=1}^{n}\frac{\lambda^{y_i}\exp(-\lambda)}{y_i!}

Since:

\lambda=\exp(\theta)

the log-likelihood is:

\log p(D \mid \theta)=\sum_{i=1}^{n}\left[y_i\theta-\exp(\theta)-\log(y_i!)\right]

The log-posterior is:

h(\theta)=\sum_{i=1}^{n}\left[y_i\theta-\exp(\theta)-\log(y_i!)\right]+\log N(\theta;0,10)

Laplace approximation gives:

\theta \mid D \approx N(\hat{\theta},V)

where:

V=-\frac{1}{h”(\hat{\theta})}

Then:

(\lambda \approx \exp(\hat{\theta}))

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, poisson

y = np.array([3,5,4,7,6,5])

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    lam = np.exp(theta)
    
    loglik = np.sum(poisson.logpmf(y, mu=lam))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([1.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

lambda_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate lambda:", lambda_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(lambda)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Poisson Model")
plt.legend()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, poisson

y = np.array([3,5,4,7,6,5])

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    lam = np.exp(theta)
    
    loglik = np.sum(poisson.logpmf(y, mu=lam))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([1.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

lambda_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate lambda:", lambda_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(lambda)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Poisson Model")
plt.legend()
plt.show()

What the Plot Shows

The posterior is for:

(\theta=\log(\lambda))

not directly for (\lambda).

The Gaussian approximation is:

\theta \mid D \approx N(\hat{\theta},V)

Example 3: Binomial Data With Logistic Parameter

Model

Suppose a treatment is given to n=20 patients.

Out of them: y=15 respond.

The model is:

y \sim Binomial(n,p)

Use:

p=\frac{1}{1+\exp(-\theta)}

Prior:

\theta \sim N(0,10)

The likelihood is:

p(y \mid \theta)=\binom{n}{y}p^y(1-p)^{n-y}

The log-likelihood is:

\log p(y \mid \theta)=\log\binom{n}{y}+y\log(p)+(n-y)\log(1-p)

The log-posterior is:

h(\theta)=\log\binom{n}{y}+y\log(p)+(n-y)\log(1-p)+\log N(\theta;0,10)

Laplace approximation gives:

(\theta \mid D \approx N(\hat{\theta},V))

where:

V=-\frac{1}{h”(\hat{\theta})}

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, binom

y = 15
n = 20

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    p = 1 / (1 + np.exp(-theta))
    
    loglik = binom.logpmf(y, n=n, p=p)
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([0.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

p_hat = 1 / (1 + np.exp(-theta_hat))

print("Posterior mode theta_hat:", theta_hat)
print("Approximate p:", p_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = logit(p)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Binomial Model")
plt.legend()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, binom

y = 15
n = 20

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    p = 1 / (1 + np.exp(-theta))
    
    loglik = binom.logpmf(y, n=n, p=p)
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([0.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

p_hat = 1 / (1 + np.exp(-theta_hat))

print("Posterior mode theta_hat:", theta_hat)
print("Approximate p:", p_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = logit(p)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Binomial Model")
plt.legend()
plt.show()

What the Plot Shows

The approximation is Gaussian on the logit scale:

\theta=\log\left(\frac{p}{1-p}\right)

So:

\theta \mid D \approx N(\hat{\theta},V)

Then convert back:

p=\frac{1}{1+\exp(-\theta)}

Example 4: Exponential Data With Log Rate

Model

Suppose we observe waiting times.

For example:

y_i

could be the time until hospital discharge.

The model is:

y_i \sim Exponential(\lambda)

The exponential density is:

p(y_i \mid \lambda)=\lambda \exp(-\lambda y_i)

Since:

(\lambda>0)

we use:

\lambda=\exp(\theta))

Prior:

\theta \sim N(0,10))

The likelihood is:

p(D \mid \theta)=\prod_{i=1}^{n}\lambda \exp(-\lambda y_i)

The log-likelihood is:

\log p(D \mid \theta)=\sum_{i=1}^{n}\left[\theta-\exp(\theta)y_i\right]

The log-posterior is:

h(\theta)=\sum_{i=1}^{n}\left[\theta-\exp(\theta)y_i\right]+\log N(\theta;0,10)

Laplace approximation gives:

\theta \mid D \approx N(\hat{\theta},V)

where:

V=-\frac{1}{h”(\hat{\theta})}

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, expon

y = np.array([2.1, 1.7, 3.4, 2.8, 1.9, 2.5, 3.0])

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    lam = np.exp(theta)
    
    loglik = np.sum(expon.logpdf(y, scale=1/lam))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([0.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

lambda_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate lambda:", lambda_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(lambda)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Exponential Model")
plt.legend()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, expon

y = np.array([2.1, 1.7, 3.4, 2.8, 1.9, 2.5, 3.0])

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    lam = np.exp(theta)
    
    loglik = np.sum(expon.logpdf(y, scale=1/lam))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([0.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

lambda_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate lambda:", lambda_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(lambda)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Exponential Model")
plt.legend()
plt.show()

What the Plot Shows

The solid line is the posterior for:

\theta=\log(\lambda)

The dashed line is:

\theta \mid D \approx N(\hat{\theta},V)

Example 5: Normal Data With Unknown Mean

Model

Suppose we observe blood pressure measurements.

Let:

y_i \sim N(\mu,\sigma^2)

Assume ( $\sigma$ ) is known.

Prior:

\mu \sim N(0,100)

The likelihood is:

p(D \mid \mu)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right)

The log-likelihood is:

\log p(D \mid \mu)=\sum_{i=1}^{n}\left[-\frac{1}{2}\log(2\pi\sigma^2)-\frac{(y_i-\mu)^2}{2\sigma^2}\right]

The log-posterior is:

h(\mu)=\log p(D \mid \mu)+\log N(\mu;0,100)

Laplace approximation gives:

\mu \mid D \approx N(\hat{\mu},V)

where:

V=-\frac{1}{h”(\hat{\mu})}

In this case, because the likelihood and prior are both Gaussian, the posterior is already Gaussian.

So Laplace approximation is essentially exact.

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([128, 132, 125, 130, 135, 127, 131])
sigma = 5

def log_posterior(mu):
    mu = mu[0] if isinstance(mu, np.ndarray) else mu
    
    loglik = np.sum(norm.logpdf(y, loc=mu, scale=sigma))
    
    logprior = norm.logpdf(mu, loc=0, scale=10)
    
    return loglik + logprior

result = minimize(lambda m: -log_posterior(m), x0=np.array([120.0]))

mu_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(mu_hat + eps)
    - 2 * log_posterior(mu_hat)
    + log_posterior(mu_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

print("Posterior mode mu_hat:", mu_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(mu_hat - 4*std, mu_hat + 4*std, 500)

log_post_values = np.array([log_posterior(m) for m in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=mu_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(mu_hat, linestyle=":", label="Posterior mode")
plt.xlabel("mu")
plt.ylabel("Density")
plt.title("Laplace Approximation for Normal Mean Model")
plt.legend()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([128, 132, 125, 130, 135, 127, 131])
sigma = 5

def log_posterior(mu):
    mu = mu[0] if isinstance(mu, np.ndarray) else mu
    
    loglik = np.sum(norm.logpdf(y, loc=mu, scale=sigma))
    
    logprior = norm.logpdf(mu, loc=0, scale=10)
    
    return loglik + logprior

result = minimize(lambda m: -log_posterior(m), x0=np.array([120.0]))

mu_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(mu_hat + eps)
    - 2 * log_posterior(mu_hat)
    + log_posterior(mu_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

print("Posterior mode mu_hat:", mu_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(mu_hat - 4*std, mu_hat + 4*std, 500)

log_post_values = np.array([log_posterior(m) for m in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=mu_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(mu_hat, linestyle=":", label="Posterior mode")
plt.xlabel("mu")
plt.ylabel("Density")
plt.title("Laplace Approximation for Normal Mean Model")
plt.legend()
plt.show()

What the Plot Shows

The posterior and Laplace approximation should almost overlap.

That is because the log-posterior is exactly quadratic in (\mu).

A quadratic log-posterior gives an exact Gaussian posterior.

Example 6: Gamma Data With Log Shape Parameter

Model

Suppose we observe positive continuous data.

For example:

y_i

could be hospital length of stay.

Assume:

y_i \sim Gamma(\alpha,\beta)

where ( $\alpha$ ) is the shape and (\beta) is the rate.

Assume ( $\beta$ ) is known.

Since:

(\alpha>0)

we write:

\alpha=\exp(\theta)

Prior:

(\theta \sim N(0,10))

The gamma likelihood is:

p(y_i \mid \alpha,\beta)=\frac{\beta^\alpha}{\Gamma(\alpha)}y_i^{\alpha-1}\exp(-\beta y_i)

The likelihood is:

p(D \mid \theta)=\prod_{i=1}^{n}\frac{\beta^\alpha}{\Gamma(\alpha)}y_i^{\alpha-1}\exp(-\beta y_i)

where:

\alpha=\exp(\theta)

The log-likelihood is:

\log p(D \mid \theta)=\sum_{i=1}^{n}\left[\alpha\log(\beta)-\log\Gamma(\alpha)+(\alpha-1)\log(y_i)-\beta y_i\right]

The log-posterior is:

h(\theta)=\log p(D \mid \theta)+\log N(\theta;0,10)

Laplace approximation gives:

\theta \mid D \approx N(\hat{\theta},V)

where:

V=-\frac{1}{h”(\hat{\theta})}

Then:

\alpha \approx \exp(\hat{\theta})

Python Code With Plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, gamma

y = np.array([2.4, 3.1, 2.8, 4.0, 3.6, 2.9, 3.3])

beta = 1.0

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    alpha = np.exp(theta)
    
    loglik = np.sum(gamma.logpdf(y, a=alpha, scale=1/beta))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([1.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

alpha_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate alpha:", alpha_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(alpha)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Gamma Shape Model")
plt.legend()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import norm, gamma

y = np.array([2.4, 3.1, 2.8, 4.0, 3.6, 2.9, 3.3])

beta = 1.0

def log_posterior(theta):
    theta = theta[0] if isinstance(theta, np.ndarray) else theta
    
    alpha = np.exp(theta)
    
    loglik = np.sum(gamma.logpdf(y, a=alpha, scale=1/beta))
    
    logprior = norm.logpdf(theta, loc=0, scale=np.sqrt(10))
    
    return loglik + logprior

result = minimize(lambda t: -log_posterior(t), x0=np.array([1.0]))

theta_hat = result.x[0]

eps = 1e-5

hessian = (
    log_posterior(theta_hat + eps)
    - 2 * log_posterior(theta_hat)
    + log_posterior(theta_hat - eps)
) / eps**2

variance = -1 / hessian
std = np.sqrt(variance)

alpha_hat = np.exp(theta_hat)

print("Posterior mode theta_hat:", theta_hat)
print("Approximate alpha:", alpha_hat)
print("Laplace variance:", variance)
print("Laplace standard deviation:", std)

grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)

log_post_values = np.array([log_posterior(t) for t in grid])

posterior_values = np.exp(log_post_values - np.max(log_post_values))
posterior_values = posterior_values / np.trapz(posterior_values, grid)

laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)

plt.figure(figsize=(8,5))
plt.plot(grid, posterior_values, label="Posterior")
plt.plot(grid, laplace_values, "--", label="Laplace approximation")
plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
plt.xlabel("theta = log(alpha)")
plt.ylabel("Density")
plt.title("Laplace Approximation for Gamma Shape Model")
plt.legend()
plt.show()

What the Plot Shows

The approximation is done on:

\theta=\log(\alpha)

not directly on (\alpha).

So the Gaussian approximation is:

\theta \mid D \approx N(\hat{\theta},V)

Then transform back:

\alpha=\exp(\theta)

Complete Reusable Function for Laplace Approximation

To avoid rewriting the same code again and again, we can create a reusable function.

import numpy as np
from scipy.optimize import minimize

def laplace_approximation(log_posterior, start_value):
    result = minimize(
        lambda x: -log_posterior(x),
        x0=np.array([start_value])
    )
    
    theta_hat = result.x[0]
    
    eps = 1e-5
    
    hessian = (
        log_posterior(theta_hat + eps)
        - 2 * log_posterior(theta_hat)
        + log_posterior(theta_hat - eps)
    ) / eps**2
    
    variance = -1 / hessian
    std = np.sqrt(variance)
    
    return theta_hat, variance, std

import numpy as np
from scipy.optimize import minimize

def laplace_approximation(log_posterior, start_value):
    result = minimize(
        lambda x: -log_posterior(x),
        x0=np.array([start_value])
    )
    
    theta_hat = result.x[0]
    
    eps = 1e-5
    
    hessian = (
        log_posterior(theta_hat + eps)
        - 2 * log_posterior(theta_hat)
        + log_posterior(theta_hat - eps)
    ) / eps**2
    
    variance = -1 / hessian
    std = np.sqrt(variance)
    
    return theta_hat, variance, std

And a reusable plotting function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_laplace(log_posterior, theta_hat, std, x_label, title):
    grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)
    
    log_post_values = np.array([log_posterior(t) for t in grid])
    
    posterior_values = np.exp(log_post_values - np.max(log_post_values))
    posterior_values = posterior_values / np.trapz(posterior_values, grid)
    
    laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)
    
    plt.figure(figsize=(8,5))
    plt.plot(grid, posterior_values, label="Posterior")
    plt.plot(grid, laplace_values, "--", label="Laplace approximation")
    plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
    plt.xlabel(x_label)
    plt.ylabel("Density")
    plt.title(title)
    plt.legend()
    plt.show()

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_laplace(log_posterior, theta_hat, std, x_label, title):
    grid = np.linspace(theta_hat - 4*std, theta_hat + 4*std, 500)
    
    log_post_values = np.array([log_posterior(t) for t in grid])
    
    posterior_values = np.exp(log_post_values - np.max(log_post_values))
    posterior_values = posterior_values / np.trapz(posterior_values, grid)
    
    laplace_values = norm.pdf(grid, loc=theta_hat, scale=std)
    
    plt.figure(figsize=(8,5))
    plt.plot(grid, posterior_values, label="Posterior")
    plt.plot(grid, laplace_values, "--", label="Laplace approximation")
    plt.axvline(theta_hat, linestyle=":", label="Posterior mode")
    plt.xlabel(x_label)
    plt.ylabel("Density")
    plt.title(title)
    plt.legend()
    plt.show()

Example use:

theta_hat, variance, std = laplace_approximation(log_posterior, start_value=0)

plot_laplace(
    log_posterior=log_posterior,
    theta_hat=theta_hat,
    std=std,
    x_label="theta",
    title="Laplace Approximation"
)

theta_hat, variance, std = laplace_approximation(log_posterior, start_value=0)

plot_laplace(
    log_posterior=log_posterior,
    theta_hat=theta_hat,
    std=std,
    x_label="theta",
    title="Laplace Approximation"
)

Final Summary

Laplace approximation follows this pattern:

p(\theta \mid D) \propto \exp(h(\theta))

Approximate the log-posterior near the mode:

h(\theta) \approx h(\hat{\theta})+\frac{1}{2}h”(\hat{\theta})(\theta-\hat{\theta})^2

Then:

p(\theta \mid D) \approx \exp\left(-\frac{(\theta-\hat{\theta})^2}{2V}\right)

where:

V=-\frac{1}{h”(\hat{\theta})}

So:

\theta \mid D \approx N(\hat{\theta},V)

The important idea is:

The log-posterior becomes approximately quadratic.

The posterior becomes approximately Gaussian.

That is why Laplace approximation works.

Laplace Approximation With Different Distributions: Python Examples With Plots

Example 1: Bernoulli Data With Logistic Parameter

Model

Python Code With Plot

What the Plot Shows

Example 2: Poisson Data With Log Rate

Model

Python Code With Plot

What the Plot Shows

Example 3: Binomial Data With Logistic Parameter

Model

Python Code With Plot

What the Plot Shows

Example 4: Exponential Data With Log Rate

Model

Python Code With Plot

What the Plot Shows

Example 5: Normal Data With Unknown Mean

Model

Python Code With Plot

What the Plot Shows

Example 6: Gamma Data With Log Shape Parameter

Model

Python Code With Plot

What the Plot Shows

Complete Reusable Function for Laplace Approximation

Final Summary

Share this:

Like this:

Leave a ReplyCancel reply

Discover more from Nerdish.Org