Chapter 9 — Generalized Linear Mixed Models (GLMMs): When Data Has Hidden Groups and Individual Differences

By Chapter 9, something important happens.

We already learned:

  • GLMs handle non-normal outcomes.
  • GEE handles correlated observations.

But there is still a major problem.

Suppose two retailers have:

  • same deployment,
  • same inventory,
  • same customer count,
  • same pricing.

Yet one consistently sells better. Why?

Because real systems contain hidden differences.

Chapter 9 introduces one of the most powerful frameworks in modern statistics: Generalized Linear Mixed Models (GLMMs).

This chapter combines:

  • GLMs,
  • random effects,
  • hierarchical modeling.

This is where modern forecasting starts becoming realistic.


Why GLMs Alone Become Insufficient

Suppose you model:

$$\log(\mu) = \beta_0 + \beta_1\,\text{Deployment}$$

This assumes:

everyone follows one equation.

But reality:

| Retailer | Baseline Sales |
| A | 5 |
| B | 20 |
| C | 80 |

One equation cannot explain everyone.

We need hidden adjustments.


The Big Idea of GLMM

GLMM says:

there are global effects and local effects.

Global:

  • overall trend.

Local:

  • subject-specific adjustments.

Model:

$$g(\mu_{ij}) = X_{ij}\beta + Z_{ij}u_j$$

Looks scary.

But it is simple.


Breaking the Equation Down

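Left side:

$$g(\mu)$$

The link function, applied to the mean.

Examples:

  • log,
  • logit.

Fixed effects:

$$X\beta$$

Average population behavior.


Random effects:

$$Zu$$

Individual deviation from that average. The $u_j$ are usually assumed to be normal with mean zero: $u_j \sim N(0, \sigma_u^2)$.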
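
To make the pieces concrete, here is a minimal sketch of how the linear predictor and the inverse link fit together for a single observation. The coefficient values, the function name `linear_predictor`, and the `INVERSE_LINKS` table are made up for illustration; they are not from any particular library.

```python
import math

def linear_predictor(x_row, beta, z_row, u_group):
    """Compute eta = x.beta + z.u for one observation."""
    fixed_part = sum(x * b for x, b in zip(x_row, beta))
    random_part = sum(z * u for z, u in zip(z_row, u_group))
    return fixed_part + random_part

# Inverse links map the linear predictor back to the mean.
INVERSE_LINKS = {
    "log": math.exp,                                     # counts: mu = exp(eta)
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # probabilities
}

# Hypothetical values, chosen only for illustration.
beta = [1.0, 0.5]     # intercept and deployment coefficient (fixed effects)
x_row = [1.0, 2.0]    # 1 for the intercept, deployment = 2
z_row = [1.0]         # random intercept only
u_retailer = [0.3]    # this retailer's deviation (random effect)

eta = linear_predictor(x_row, beta, z_row, u_retailer)
mu_count = INVERSE_LINKS["log"](eta)    # expected count under a log link
p_buy = INVERSE_LINKS["logit"](eta)     # probability under a logit link
```

The same `eta` feeds either link. Only the last step changes with the outcome type.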


Fixed Effects vs Random Effects

This confuses many people.


Fixed Effects

Estimated directly from the data.

Example:

Deployment effect: +10 sales.

Everyone shares this.


Random Effects

A subject-specific adjustment, treated as a draw from a distribution.

Example:

Retailer A: +5

Retailer B: −2

Retailer C: +12


Random Intercept Model

The most common GLMM.

Model:

$$g(\mu_{ij}) = \beta_0 + \beta_1 X_{ij} + u_j$$

Meaning:

Each subject gets its own baseline.


Example

Retailer sales:

| Retailer | Random Effect |
| A | +15 |
| B | −10 |

Now:

same deployment.

Different outcomes.


Random Slope Model

Then Chapter 9 becomes more powerful.

Model:

$$g(\mu_{ij}) = \beta_0 + (\beta_1 + u_j)X_{ij}$$

Now slopes vary.


What Does Random Slope Mean?

You asked this before.

Suppose:

Deployment effect.

Retailer A:

Deployment strongly helps.

Retailer B:

Deployment barely helps.

Now slope changes.


Numerical Example

Without random slope:

Sales increase: 10 for every retailer.


With random slope:

Retailer A: 20.

Retailer B: 3.

Very realistic.
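
The numbers above can be reproduced with a tiny sketch. The slope deviations (+10 for A, −7 for B) are assumptions picked so that the shared effect of 10 turns into 20 and 3; the function name is hypothetical.

```python
def deployment_effect(beta1, u_slope):
    """Retailer-specific slope: the shared effect plus the retailer's deviation."""
    return beta1 + u_slope

beta1 = 10.0  # fixed (shared) deployment effect from the text

# Hypothetical slope deviations that reproduce the numbers in the example.
slope_deviation = {"A": 10.0, "B": -7.0}

effects = {
    retailer: deployment_effect(beta1, u)
    for retailer, u in slope_deviation.items()
}
```

Setting every deviation to zero recovers the model without a random slope, where each retailer gets the same increase of 10.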


Why GLMM Is Different From GEE

This was one of your earlier questions.


GEE

Population average.

Question:

“What happens overall?”


GLMM

Subject-specific.

Question:

“What happens for this retailer?”


Example:

Average deployment effect:

+10.

Retailer-specific:

+25.

Different interpretation.
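
One place where the two interpretations give visibly different numbers is the logistic model. The sketch below takes a subject-specific (conditional) slope, averages the predicted probabilities over the random intercepts, and reads off the population-average (marginal) slope. All parameter values are hypothetical, and the integral is done with a simple grid rather than any library routine.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_probability(eta_fixed, sigma, n_grid=4001, width=8.0):
    """Average expit(eta_fixed + u) over u ~ N(0, sigma^2) on a symmetric grid."""
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for k in range(n_grid):
        u = lo + k * step
        density = math.exp(-0.5 * (u / sigma) ** 2)  # unnormalized normal density
        total += expit(eta_fixed + u) * density
        weight_sum += density
    return total / weight_sum

# Hypothetical parameters for illustration.
beta0, beta1, sigma = -1.0, 1.0, 2.0

conditional_slope = beta1  # effect for one specific subject (GLMM reading)
p_without = marginal_probability(beta0, sigma)
p_with = marginal_probability(beta0 + beta1, sigma)
marginal_slope = logit(p_with) - logit(p_without)  # population-average effect
```

With a nonzero random-intercept variance, `marginal_slope` comes out smaller than `conditional_slope`. Neither is wrong: they answer the two different questions above.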


Random Effects Are Hidden Variables

Think of them as:

latent tendencies.

Examples:

  • retailer quality,
  • customer loyalty,
  • store management,
  • market attractiveness.

These are difficult to measure.

Random effects capture their combined effect on the outcome.


Logistic GLMM

Binary outcomes.

Example:

Will the customer buy?

Model:

$$\log\left(\frac{p_{ij}}{1-p_{ij}}\right) = X_{ij}\beta + u_j$$

Now:

each customer gets their own baseline.


Poisson GLMM

Count outcomes.

Example:

Sales count.

Model:

$$\log(\mu_{ij}) = X_{ij}\beta + u_j$$

Now:

expected counts differ across groups.


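Why Random Effects Create Overdispersion

Suppose each retailer's sales are Poisson.

But the means differ.

Retailer means:

1, 10, 50.

Combining them:

The pooled variance far exceeds the pooled mean, which a single Poisson cannot match.

This produces overdispersion.

This connects directly to the earlier negative binomial discussion: a Poisson whose mean varies across units according to a gamma distribution is negative binomial.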
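
A short calculation shows how large the effect is for the three retailer means above. It uses the law of total variance and assumes, purely for illustration, that the three retailers contribute equally to the pooled data.

```python
def mixture_mean_and_variance(poisson_means):
    """Mean and variance of an equal-weight mixture of Poisson distributions.

    Law of total variance: Var(Y) = E[Var(Y | group)] + Var(E[Y | group]).
    For a Poisson group, Var(Y | group) equals the group mean.
    """
    k = len(poisson_means)
    mean_of_means = sum(poisson_means) / k
    var_of_means = sum((m - mean_of_means) ** 2 for m in poisson_means) / k
    return mean_of_means, mean_of_means + var_of_means

retailer_means = [1.0, 10.0, 50.0]  # from the text
pooled_mean, pooled_variance = mixture_mean_and_variance(retailer_means)
dispersion_ratio = pooled_variance / pooled_mean  # equals 1 for a true Poisson
```

The pooled mean is about 20.3 while the pooled variance is about 473.9, a ratio of roughly 23. A plain Poisson forces that ratio to be 1.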


Marginal Likelihood — The Difficult Part

Chapter 9 becomes mathematically heavier.

Likelihood:

$$L = \int f(Y \mid u)\, f(u)\, du$$

Interpretation:

The likelihood of the data, averaged over all possible values of the random effects.
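
For the random intercept model the integral can be written out group by group. The form below assumes independent groups $j = 1, \dots, J$ with $n_j$ observations each and normal random intercepts with variance $\sigma_u^2$; the symbols $J$, $n_j$, $\sigma_u^2$, and the normal density $\phi$ are notation introduced here for illustration.

```latex
L(\beta, \sigma_u^2)
  = \prod_{j=1}^{J} \int_{-\infty}^{\infty}
      \left[ \prod_{i=1}^{n_j} f\left(y_{ij} \mid u_j; \beta\right) \right]
      \phi\left(u_j; 0, \sigma_u^2\right) \, du_j
```

Each group contributes one one-dimensional integral. With random slopes as well, each integral becomes multi-dimensional, which is where the computational cost starts to grow.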


Why Is This Hard?

Because:

random effects are unknown.

So they must be integrated out.

For non-normal outcomes the integral usually has no closed form.


Laplace Approximation

Approximates the integral.

Idea:

Replace the integrand with a normal curve fitted at its peak.

Fast.

Common.
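
Here is a minimal sketch of the idea for one group of a Poisson random-intercept model. It finds the peak of the log integrand with Newton's method, applies the Laplace formula, and compares the result with a brute-force grid integral. The data, the fixed part of the linear predictor, and `sigma` are hypothetical, and the function names are made up for this example.

```python
import math

def log_joint(u, y, eta_fixed, sigma):
    """log f(y | u) + log f(u) for one group (Poisson outcome, log link)."""
    log_lik = sum(
        yi * (eta + u) - math.exp(eta + u) - math.lgamma(yi + 1)
        for yi, eta in zip(y, eta_fixed)
    )
    log_prior = -0.5 * (u / sigma) ** 2 - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return log_lik + log_prior

def laplace_marginal(y, eta_fixed, sigma, n_newton=50):
    """Laplace approximation to the integral of exp(log_joint) over u."""
    total_y = sum(y)
    u = 0.0
    for _ in range(n_newton):
        rate = sum(math.exp(eta + u) for eta in eta_fixed)
        grad = total_y - rate - u / sigma ** 2   # first derivative of log_joint
        hess = -rate - 1.0 / sigma ** 2          # second derivative, always negative
        step = grad / hess
        u -= step                                # Newton update toward the peak
        if abs(step) < 1e-12:
            break
    curvature = sum(math.exp(eta + u) for eta in eta_fixed) + 1.0 / sigma ** 2
    value = math.exp(log_joint(u, y, eta_fixed, sigma)) * math.sqrt(2 * math.pi / curvature)
    return value, u

def brute_force_marginal(y, eta_fixed, sigma, lo=-5.0, hi=5.0, n=20000):
    """Midpoint-rule reference value for the same integral."""
    step = (hi - lo) / n
    return step * sum(
        math.exp(log_joint(lo + (k + 0.5) * step, y, eta_fixed, sigma))
        for k in range(n)
    )

# Hypothetical data for one retailer: three monthly counts.
y = [3, 7, 6]
eta_fixed = [math.log(4.0)] * 3   # fixed-effect part of the linear predictor
sigma = 0.5                       # random-intercept standard deviation

approx, mode = laplace_marginal(y, eta_fixed, sigma)
reference = brute_force_marginal(y, eta_fixed, sigma)
relative_error = abs(approx - reference) / reference
```

In this small example the two values agree closely. The approximation tends to be weaker when each group has very few observations or the outcome is binary, so this sketch should not be read as a general accuracy guarantee.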


Gaussian Quadrature

Evaluate the integrand at carefully chosen points and take a weighted sum.

More accurate.

Slower.
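
A sketch of the mechanics, using the three-point Gauss-Hermite rule because its nodes and weights have simple closed forms. Real software typically uses more points and an adaptive version centered on each group's mode; the helper name `normal_expectation` and the parameter values are assumptions for illustration.

```python
import math

# Three-point Gauss-Hermite rule for integrals against exp(-x^2).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)

def normal_expectation(h, sigma):
    """Approximate E[h(u)] for u ~ N(0, sigma^2).

    Uses the change of variables u = sqrt(2) * sigma * x, which turns the
    normal density into the exp(-x^2) weight of the Gauss-Hermite rule.
    """
    weighted_sum = sum(
        w * h(math.sqrt(2.0) * sigma * x) for x, w in zip(GH_NODES, GH_WEIGHTS)
    )
    return weighted_sum / math.sqrt(math.pi)

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

sigma = 0.5  # hypothetical random-intercept standard deviation

second_moment = normal_expectation(lambda u: u ** 2, sigma)          # exact answer: sigma^2
mean_exp = normal_expectation(math.exp, sigma)                       # exact answer: exp(sigma^2 / 2)
marginal_p = normal_expectation(lambda u: expit(-1.0 + u), sigma)    # logistic random intercept
```

Three evaluations already reproduce low-order polynomials exactly and come close on smooth functions such as `exp`. Adding points improves accuracy at the cost of more likelihood evaluations per group, which is the trade-off described above.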


MCMC

Bayesian simulation.

Repeatedly sample parameters and random effects.

Produces:

posterior distributions.

Very flexible.
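
To show what "repeatedly sample" means, here is a minimal random-walk Metropolis sketch for a single random intercept in a Poisson model. It is deliberately simplified: the fixed effects and `sigma` are held at hypothetical known values, whereas a full Bayesian GLMM would sample those too, usually with a dedicated sampler rather than hand-written code like this.

```python
import math
import random

def log_posterior(u, y, eta_fixed, sigma):
    """Unnormalized log posterior of one group's random intercept."""
    log_lik = sum(yi * (eta + u) - math.exp(eta + u) for yi, eta in zip(y, eta_fixed))
    return log_lik - 0.5 * (u / sigma) ** 2

def metropolis(y, eta_fixed, sigma, n_draws=20000, burn_in=2000, proposal_sd=0.4, seed=0):
    """Random-walk Metropolis sampler for the random intercept u."""
    rng = random.Random(seed)
    u = 0.0
    current = log_posterior(u, y, eta_fixed, sigma)
    draws = []
    for step in range(n_draws + burn_in):
        proposal = u + rng.gauss(0.0, proposal_sd)
        candidate = log_posterior(proposal, y, eta_fixed, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            u, current = proposal, candidate
        if step >= burn_in:
            draws.append(u)
    return draws

# Hypothetical data for one retailer: three monthly counts.
y = [3, 7, 6]
eta_fixed = [math.log(4.0)] * 3
sigma = 0.5

draws = metropolis(y, eta_fixed, sigma)
posterior_mean = sum(draws) / len(draws)
posterior_sd = math.sqrt(sum((d - posterior_mean) ** 2 for d in draws) / len(draws))
```

The output is a set of draws rather than a single number, so the uncertainty about this retailer's adjustment can be summarized directly from `draws`.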


Bayesian Hierarchical Models

Natural extension of GLMM.

Model:

Level 1: observations.

Level 2: random effects.

Level 3: hyperparameters.

This becomes:

full hierarchical modeling.
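
Written out for a Poisson outcome with a random intercept, the three levels might look like the following. The specific priors on $\beta$ and $\sigma_u$ are placeholder assumptions chosen only to make the structure visible; an actual analysis would choose them for the problem at hand.

```latex
\begin{aligned}
\text{Level 1 (observations):} \quad
  & y_{ij} \mid u_j \sim \text{Poisson}\left(\exp\left(X_{ij}\beta + u_j\right)\right) \\
\text{Level 2 (random effects):} \quad
  & u_j \mid \sigma_u \sim N\left(0, \sigma_u^2\right) \\
\text{Level 3 (hyperparameters):} \quad
  & \beta \sim N\left(0, 10^2\right), \qquad \sigma_u \sim \text{Half-Normal}(1)
\end{aligned}
```

Dropping Level 3 and estimating $\beta$ and $\sigma_u$ by maximum likelihood gives back the GLMM described earlier in the chapter.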


Shrinkage — One of the Coolest Ideas

Random effects pull extreme estimates inward, toward the overall average.

Example:

Observed:

a retailer with 200 sales.

Model says:

probably unusually lucky.

Estimate becomes:

150.

This is:

shrinkage.

Very powerful.
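
The simplest case where shrinkage has a closed form is the normal model with a random intercept. The formula below is for that case; in a GLMM the same behavior holds only approximately. The overall average of 100 and the weight of 0.5 are assumptions chosen so that the arithmetic reproduces the 200 to 150 example.

```latex
\hat{\mu}_j = \lambda_j \, \bar{y}_j + (1 - \lambda_j) \, \bar{y},
\qquad
\lambda_j = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2 / n_j}

\text{With } \bar{y}_j = 200,\ \bar{y} = 100,\ \lambda_j = 0.5: \quad
\hat{\mu}_j = 0.5 \times 200 + 0.5 \times 100 = 150
```

Here $\bar{y}_j$ is the retailer's own average, $\bar{y}$ is the overall average, $\sigma_u^2$ is the between-retailer variance, $\sigma^2$ is the within-retailer variance, and $n_j$ is the number of observations for that retailer.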


Real Example

Suppose:

Goal:

Forecast monthly sales.

Data:

| SKU | Month | Sales | Customer Count |

Model:

$$\log(\mu) = \beta_0 + \beta_1\,\text{Month} + f(\text{CustomerCount}) + u_{\text{SKU}}$$

Meaning:

  • month effect (a trend as written; seasonality needs Month as a categorical or cyclic term),
  • nonlinear customer effect,
  • SKU-specific adjustment.

This is extremely close to what you were building.


Inventory Forecast Example

Suppose:

Different SKUs behave differently.

Random effects allow:

partial pooling.

Strong categories (plenty of data):
less shrinkage.

Weak categories (little data):
more shrinkage.

Inventory forecasts become more stable.
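
A small sketch of partial pooling, reading "strong" as data-rich and "weak" as sparse. It uses the normal-model shrinkage weight, which a GLMM follows only approximately, and every number in it (overall mean, variances, observation counts) is hypothetical.

```python
def shrinkage_weight(n_obs, between_var, within_var):
    """Weight placed on a category's own mean in the normal partial-pooling formula."""
    return between_var / (between_var + within_var / n_obs)

def partially_pooled(group_mean, n_obs, grand_mean, between_var, within_var):
    """Blend the category's own mean with the overall mean."""
    w = shrinkage_weight(n_obs, between_var, within_var)
    return w * group_mean + (1.0 - w) * grand_mean

# Hypothetical values for illustration.
grand_mean = 100.0    # average sales across all categories
between_var = 400.0   # variance of true category means
within_var = 1600.0   # month-to-month variance inside a category

# Two categories with the same observed mean but different amounts of data.
data_rich = partially_pooled(200.0, 36, grand_mean, between_var, within_var)
sparse = partially_pooled(200.0, 4, grand_mean, between_var, within_var)
```

Both categories observed an average of 200, yet the one with 36 months of history is estimated at 190 while the one with 4 months is pulled to 150. The sparse category borrows more from the overall average, which is what keeps its forecast from swinging on a few lucky months.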


When Should You Use GLMM?

Use GLMM when you have:

  • repeated observations,
  • grouped data,
  • hidden heterogeneity,
  • a need for subject-specific prediction.

Examples:

  • customer forecasting,
  • retailer forecasting,
  • SKU forecasting,
  • churn prediction.

When NOT to Use GLMM

If you are only interested in:

the overall (population-average) effect.

Use: GEE.


The Deep Lesson

Chapter 9 teaches:

not all variation deserves fixed coefficients.

Some variation should remain random.

And modeling that uncertainty often improves prediction dramatically.


Final Thought

Before Chapter 9:

you ask:

“What is the average relationship?”

After Chapter 9:

you ask:

“How does each group differ from the average?”

That shift takes statistics from:

one equation for everyone

to

learning individual behavior inside populations.

And that is where modern predictive analytics truly begins.
