Chapter 3 — Generalized Linear Models (GLMs): The Big Framework Behind Modern Regression

In Chapter 2, we saw something important:

ordinary linear regression fails whenever data behaves differently from the assumptions of normality and constant variance.

That naturally leads to a question:

If ordinary regression fails, what replaces it?

The answer is: Generalized Linear Models (GLMs).

And Chapter 3 is where everything comes together.

This chapter is the foundation of:

  • logistic regression,
  • Poisson regression,
  • Gamma regression,
  • negative binomial models,
  • survival models,
  • even many machine learning frameworks.

If Chapter 2 explains why ordinary regression breaks,
Chapter 3 explains: how we fix it systematically.


The Big Idea of GLMs

GLMs are not “one model.”

They are: a framework.

A framework that allows us to use:

  • different distributions,
  • different variance structures,
  • different link functions,
    while still keeping the spirit of regression.

Core GLM structure:

\[ g(\mu) = X\beta \]

That single equation powers a huge portion of modern statistics.


The Three Components of a GLM

A GLM has three pieces.


1. Random Component

This specifies:

the probability distribution of the response variable.

Examples:

Type of Data           | Distribution
Continuous, symmetric  | Normal
Counts                 | Poisson
Binary outcomes        | Bernoulli
Positive, skewed data  | Gamma

This is the part where we respect:

  • the structure of the data.

2. Systematic Component

This is the linear predictor:

\[ X\beta \]

which means:

\[ \beta_0+\beta_1X_1+\beta_2X_2+\cdots \]

This is still the “regression” part.

Examples:

  • deployment,
  • customer count,
  • month,
  • discount,
  • seasonality.

All influence the expected outcome.
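As a minimal sketch of the systematic component, the linear predictor is just an intercept plus a weighted sum of the predictors. The coefficients and predictor values below are made up for illustration and do not come from any fitted model.

```python
# Hypothetical coefficients and predictor values, chosen only for illustration.
def linear_predictor(intercept, coefs, features):
    """Return eta = beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    if len(coefs) != len(features):
        raise ValueError("coefs and features must have the same length")
    return intercept + sum(b * x for b, x in zip(coefs, features))

# Two made-up predictors: customer count (in hundreds) and a discount flag.
eta = linear_predictor(0.5, [0.25, 1.0], [4.0, 1.0])
print(eta)  # 0.5 + 0.25*4.0 + 1.0*1.0
```

Nothing restricts this number to a valid range for probabilities or counts, which is the gap the link function fills.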


3. Link Function

The link function connects:

  • the expected value of the response,
    to:
  • the linear predictor.

This is the heart of GLMs.


Why Do We Need a Link Function?

Because many outcomes cannot directly behave like:

\[ X\beta \]

Examples:

  • probabilities must stay between 0 and 1,
  • counts cannot be negative,
  • Gamma values must stay positive.

The link function transforms the mean into something that can safely behave linearly.


Example — Logistic Regression

Probabilities cannot exceed:

  • 1,
    or go below:
  • 0.

So instead of modeling probability directly,
we model: log-odds.

The logit link:

\[ \log\left(\frac{p}{1-p}\right) \]

transforms:

  • probabilities,
    into:
  • real numbers.

Now regression becomes possible.
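A small sketch of the logit link and its inverse may help here. The function names are my own, not taken from any library. The point is that any real-valued linear predictor maps back to a valid probability.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the log-odds scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map any real number back to a probability."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# However extreme the linear predictor, the result stays between 0 and 1.
for eta in (-5.0, 0.0, 5.0):
    print(eta, inv_logit(eta))
```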


Example — Poisson Regression

Counts cannot be negative, and the expected count must be positive.

So Poisson regression uses: the log link.

\[ \log(\mu)=X\beta \]

Exponentiating gives:

\[ \mu=e^{X\beta} \]

which guarantees:

  • a positive predicted mean count.

Very elegant.
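Here is a rough sketch of the log link in action, using hypothetical coefficients. Even when the linear predictor is negative, the predicted mean stays positive, and each one-unit step in the predictor multiplies the mean by a constant factor.

```python
import math

def poisson_mean(intercept, slope, x):
    """Inverse log link: mu = exp(beta_0 + beta_1 * x)."""
    return math.exp(intercept + slope * x)

# Hypothetical coefficients; the linear predictor is negative for small x.
b0, b1 = -2.0, 0.5
means = [poisson_mean(b0, b1, x) for x in range(0, 9, 2)]
print(means)

# A one-unit increase in x multiplies the mean by exp(b1).
ratio = poisson_mean(b0, b1, 3) / poisson_mean(b0, b1, 2)
print(ratio, math.exp(b1))
```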


Bernoulli Distribution — The Foundation of Logistic Regression

A Bernoulli random variable models: one trial with two outcomes.

Examples:

  • buy/not buy,
  • churn/no churn,
  • sold/not sold.

Possible outcomes:

Outcome | Meaning
1       | success
0       | failure

Mean and Variance of Bernoulli

Mean:

\[ E(Y)=p \]

Variance:

\[ Var(Y)=p(1-p) \]

This is extremely important.

Notice:

  • variance depends on the mean.

That is already different from ordinary regression.


A Beautiful Insight About Bernoulli Variance

The variance peaks at:

\[ p=0.5 \]

Why?

Because uncertainty is highest when:

  • both outcomes are equally likely.

If:

  • \(p\) is close to 0,
    or:
  • \(p\) is close to 1,

the outcome is almost certain, and at exactly 0 or 1 there is no uncertainty at all.

So variance shrinks again.
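The shape of the Bernoulli variance is easy to check numerically. This sketch evaluates p(1 - p) on a coarse grid of probabilities. The grid spacing is an arbitrary choice.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli outcome with success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * (1.0 - p)

# Evaluate on the grid 0.0, 0.1, ..., 1.0 and locate the largest variance.
grid = [i / 10 for i in range(11)]
variances = [bernoulli_variance(p) for p in grid]
peak_p = grid[variances.index(max(variances))]

for p, v in zip(grid, variances):
    print(p, round(v, 2))
print("variance peaks at p =", peak_p)
```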

This is one way Bernoulli differs from Poisson, whose variance keeps growing with the mean.


Odds — The Key to Logistic Regression

Suppose:

  • probability of winning = 0.75.

Then:

  • probability of losing = 0.25.

Odds are:

\[ \frac{0.75}{0.25}=3 \]

Meaning:

  • winning is three times as likely as losing.

Logistic regression models: the log of the odds.
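The winning example can be reproduced in a few lines. The helper names are hypothetical. The sketch converts a probability to odds, takes the log, and converts odds back to a probability.

```python
import math

def odds(p):
    """Odds in favor of an event with probability p (p must be below 1)."""
    return p / (1.0 - p)

def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)

p_win = 0.75
win_odds = odds(p_win)
print(win_odds)                  # 0.75 / 0.25
print(math.log(win_odds))        # the log-odds that logistic regression models
print(prob_from_odds(win_odds))  # back to the original probability
```

Even odds (a probability of 0.5) correspond to log-odds of zero, so positive log-odds favor the event and negative log-odds favor its complement.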


Poisson Regression — Modeling Counts

Poisson regression is used for:

  • sales counts,
  • purchases,
  • website clicks,
  • arrivals,
  • support tickets.

Its defining property:

\[ Var(Y)=E(Y)=\mu \]

Variance grows naturally with the mean.

This is exactly what count data often does.


Real Business Example

Suppose:

  • small retailers sell 2 items/month,
  • large retailers sell 100 items/month.

Large retailers naturally fluctuate more.

Poisson understands this.

Ordinary regression assumes:

  • constant variability.

That becomes unrealistic.
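To make the retailer comparison concrete, this sketch recovers the mean and variance of a Poisson distribution by summing over its probability mass function, for the two means mentioned above (2 and 100). The truncation point for the sum is an assumption, chosen far enough into the tail that the omitted mass is negligible.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson mean mu, computed in log space for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def poisson_mean_and_variance(mu):
    """Recover the mean and variance numerically by summing over the pmf."""
    upper = int(mu + 20 * math.sqrt(mu) + 50)  # assumed cutoff, deep in the tail
    ks = range(upper + 1)
    probs = [poisson_pmf(k, mu) for k in ks]
    mean = sum(k * p for k, p in zip(ks, probs))
    variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
    return mean, variance

for mu in (2.0, 100.0):
    mean, variance = poisson_mean_and_variance(mu)
    print(mu, round(mean, 6), round(variance, 6), round(math.sqrt(variance), 3))
```

The standard deviation is the square root of the mean, so the large retailer's month-to-month swings are several times bigger in absolute terms, even though they are smaller relative to its mean.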


Gamma Regression — Positive Skewed Data

Gamma regression handles:

  • positive continuous,
  • right-skewed data.

Examples:

  • revenue per transaction,
  • insurance claims,
  • customer spending,
  • waiting times.

Variance behaves like:

\[ Var(Y)=\phi\mu^2 \]

As mean grows:

  • variability grows even faster.

Very common in business.
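One way to see what this variance function implies is to convert a mean and dispersion into the usual shape and scale parameters of a Gamma distribution. The dispersion value below is hypothetical. The mapping shape = 1/φ, scale = μφ is the standard one that gives mean μ and variance φμ².

```python
import math

def gamma_shape_scale(mu, phi):
    """Shape and scale of a Gamma with mean mu and variance phi * mu**2."""
    if mu <= 0 or phi <= 0:
        raise ValueError("mu and phi must be positive")
    return 1.0 / phi, mu * phi

phi = 0.25  # hypothetical dispersion
for mu in (50.0, 500.0):
    shape, scale = gamma_shape_scale(mu, phi)
    variance = shape * scale ** 2   # equals phi * mu**2
    cv = math.sqrt(variance) / mu   # equals sqrt(phi) for every mu
    print(mu, variance, cv)
```

The coefficient of variation stays constant: the standard deviation is a fixed fraction of the mean, which is often a reasonable description of spending or claim sizes.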


Canonical Links

Each GLM distribution has a natural, or canonical, link.

Distribution | Canonical Link
Normal       | identity
Bernoulli    | logit
Poisson      | log
Gamma        | inverse

Identity Link

For normal regression:

\[ g(\mu)=\mu \]

Nothing changes.

This is why ordinary regression is actually: a special case of GLM.

That is a very important insight.
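A short derivation makes the special case explicit. Assuming independent Normal responses with a common variance \(\sigma^2\) (the notation \(x_i\) for the predictor vector of observation \(i\) is mine), the log-likelihood under the identity link is a constant minus a scaled sum of squared residuals, so maximizing it is the same as least squares:

```latex
Y_i \sim \mathcal{N}(\mu_i, \sigma^2), \qquad g(\mu_i) = \mu_i = x_i^\top \beta

\ell(\beta) = -\frac{n}{2}\log(2\pi\sigma^2)
              - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2

\hat{\beta} = \arg\max_{\beta}\, \ell(\beta)
            = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2
```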


Overdispersion — When Poisson Breaks

Real count data often violates:
\[ Var(Y)=\mu \]

Instead:

\[ Var(Y)>\mu \]

This is called: overdispersion.


Why Overdispersion Happens

Because real-world systems are heterogeneous.

Different parts of the data behave differently.

Example:

Retailer Type | Average Sales
Weak          | 1
Medium        | 5
Strong        | 20

Combining them creates:

  • extra variability.

That’s why overdispersion often indicates:

  • hidden groups,
  • omitted variables,
  • clustering,
  • latent heterogeneity.
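The retailer table can be turned into a quick calculation. This sketch assumes the three retailer types are equally common and that sales within each type are Poisson. Both are assumptions added for illustration. It then applies the law of total variance: overall variance = average within-group variance + variance of the group means.

```python
from statistics import mean, pvariance

# Group means from the table; equal group shares are an assumption.
group_means = [1.0, 5.0, 20.0]

overall_mean = mean(group_means)

# Within a Poisson group the variance equals the group mean,
# so the average within-group variance is just the overall mean.
within_variance = overall_mean

# Extra variability that comes from the groups having different means.
between_variance = pvariance(group_means)

total_variance = within_variance + between_variance
dispersion_ratio = total_variance / overall_mean

print(round(overall_mean, 3), round(total_variance, 3), round(dispersion_ratio, 3))
```

Under these assumptions the pooled data have a mean of about 8.7 but a variance of about 75.6, far above what a single Poisson distribution allows.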

Negative Binomial Regression

Negative binomial extends Poisson by adding extra variance.

Variance becomes:

\[ Var(Y)=\mu+\alpha\mu^2 \]

where:

  • \(\alpha\) controls extra dispersion.

This is one of the most practical count models in real business analytics.
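A tiny sketch shows how the extra term behaves. The value of α is hypothetical. Setting it to zero recovers the Poisson variance.

```python
def negbin_variance(mu, alpha):
    """Negative binomial variance: mu + alpha * mu**2."""
    if mu < 0 or alpha < 0:
        raise ValueError("mu and alpha must be non-negative")
    return mu + alpha * mu ** 2

alpha = 0.5  # hypothetical dispersion parameter
for mu in (2.0, 100.0):
    variance = negbin_variance(mu, alpha)
    # The last column is the variance-to-mean ratio (1.0 under Poisson).
    print(mu, variance, variance / mu)
```

The variance-to-mean ratio is 1 + αμ, so the departure from Poisson grows with the mean.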


Real Sales Example

Suppose you model:

  • monthly sales counts,
  • by category,
  • over time.

You may include:

  • month,
  • customer count,
  • year,
  • seasonality.

A realistic model might look like:

\[ \log(\mu)=\beta_0+\beta_1\,\text{Month}+\beta_2\,f(\text{CustomerCount})+\beta_3\,\text{Year} \]

where:

  • \(f(\cdot)\) is a spline.

Now we are entering: Generalized Additive Models (GAMs).


Splines and Nonlinearity

Not all relationships are linear.

Example:

  • adding customers may help sales strongly initially,
  • then level off later.

A spline allows:

  • smooth nonlinear effects.

Instead of:

\[ \beta X \]

we use:

\[ f(X) \]

This becomes a GAM.
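As a deliberately simple illustration, here is a linear spline with a single knot, which lets the slope change once. Real GAM software typically uses smoother, penalized bases. The knot location and coefficients here are invented to mimic the "strong at first, then leveling off" pattern.

```python
def hinge(x, knot):
    """Truncated linear basis function: zero up to the knot, linear after it."""
    return max(0.0, x - knot)

def linear_spline(x, b1, b2, knot):
    """f(x) = b1*x + b2*max(0, x - knot): slope b1 before the knot, b1 + b2 after."""
    return b1 * x + b2 * hinge(x, knot)

# Hypothetical values: a steep effect up to 50 customers, much flatter beyond.
b1, b2, knot = 0.04, -0.035, 50.0

for customers in (0, 25, 50, 75, 100):
    print(customers, round(linear_spline(customers, b1, b2, knot), 3))
```

In a count model with a log link, f(x) would enter the linear predictor, so the leveling off applies to the log of the expected count.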


Forecasting Inventory

One fascinating business application discussed was: ideal inventory forecasting.

Using:

  • negative binomial counts,
  • seasonality,
  • customer count,
  • splines,
  • weighted recent data,
  • lead times,
  • safety stock.

This is exactly how advanced operational forecasting systems work.


Weighted Forecasting

You explored:

  • 70% weight on recent 6 months,
  • 30% weight on older 18 months.

This is actually very practical.

Why?

Because:

  • markets evolve,
  • trends shift,
  • recent data often matters more.
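One simple reading of the 70/30 split is a blend of two averages: the mean of the most recent 6 months and the mean of the 18 months before them. That interpretation, the function name, and the toy history below are assumptions. The same idea could instead be implemented as observation weights when fitting the model.

```python
from statistics import mean

def blended_monthly_rate(history, recent_n=6, recent_weight=0.7):
    """Blend the recent and older average; history is ordered oldest to newest."""
    if len(history) <= recent_n:
        raise ValueError("history must be longer than the recent window")
    recent = history[-recent_n:]
    older = history[:-recent_n]
    return recent_weight * mean(recent) + (1.0 - recent_weight) * mean(older)

# Toy data: 18 older months at 10 units, then 6 recent months at 20 units.
history = [10] * 18 + [20] * 6
rate = blended_monthly_rate(history)

print(mean(history))  # plain 24-month average
print(rate)           # blended rate, pulled toward the recent level
```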

Safety Stock — Beyond Normality

Traditional safety stock formulas assume:

  • normal demand.

But negative binomial demand is:

  • discrete,
  • skewed,
  • overdispersed.

So instead of:

  • normal approximations,

you can use:

percentiles from the fitted distribution.

For example:

  • forecast the next 12 months,
  • scale to 4-month lead time,
  • calculate 90th percentile demand.

That becomes: a probabilistic inventory target.

This is a far more modern approach.
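A percentile-based target can be sketched directly from the negative binomial distribution. The monthly mean and dispersion below are hypothetical. The lead-time scaling assumes the four months are independent with the same mean and dispersion, in which case total demand is again negative binomial with mean 4μ and dispersion α/4.

```python
import math

def negbin_pmf(k, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    p = r / (r + mu)
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def negbin_quantile(q, mu, alpha):
    """Smallest k with P(Y <= k) >= q, found by accumulating the pmf."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    k, cdf = 0, negbin_pmf(0, mu, alpha)
    while cdf < q:
        k += 1
        cdf += negbin_pmf(k, mu, alpha)
    return k

# Hypothetical fitted values for one product.
monthly_mu, alpha, lead_months = 10.0, 0.5, 4

# Assumption: independent, identically distributed months over the lead time.
lead_mu = monthly_mu * lead_months
lead_alpha = alpha / lead_months

target = negbin_quantile(0.90, lead_mu, lead_alpha)
safety_stock = target - lead_mu

print("expected lead-time demand:", lead_mu)
print("90th percentile target:", target)
print("implied safety stock:", safety_stock)
```

The safety stock here is simply the gap between the chosen percentile and the expected demand, with no Normal approximation involved.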


The Deep Philosophy of GLMs

Chapter 3 teaches something profound:

data types matter.

You cannot force:

  • counts,
  • probabilities,
  • skewed revenue,
  • overdispersed demand,

into one simplistic framework.

GLMs adapt the model to the behavior of reality.

That is why they are so powerful.


Final Thought

Ordinary regression is only one small corner of statistical modeling.

GLMs generalize regression into a flexible system capable of modeling:

  • counts,
  • probabilities,
  • skewed positive data,
  • overdispersion,
  • seasonality,
  • nonlinear effects,
  • real business processes.

And once you truly understand GLMs,
you begin to realize:

the distribution is not just mathematics —
it is a description of how reality behaves.
