In Chapter 2, we saw something important:
ordinary linear regression becomes unreliable when the data violate its assumptions of normality and constant variance.
That naturally leads to a question:
If ordinary regression fails, what replaces it?
The answer is: Generalized Linear Models (GLMs).
And Chapter 3 is where everything comes together.
This chapter is the foundation of:
- logistic regression,
- Poisson regression,
- Gamma regression,
- negative binomial models,
- survival models,
- even many machine learning frameworks.
If Chapter 2 explains why ordinary regression breaks,
Chapter 3 explains: how we fix it systematically.
The Big Idea of GLMs
GLMs are not “one model.”
They are: a framework.
A framework that allows us to use:
- different distributions,
- different variance structures,
- different link functions,
while still keeping the spirit of regression.
Core GLM structure:
\[
g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \qquad \mu = E[Y]
\]
That single equation powers a huge portion of modern statistics.
The Three Components of a GLM
A GLM has three pieces.
1. Random Component
This specifies:
the probability distribution of the response variable.
Examples:
| Type of Data | Distribution |
|---|---|
| Continuous symmetric | Normal |
| Counts | Poisson |
| Binary outcomes | Bernoulli |
| Positive skewed data | Gamma |
This is the part where we respect:
- the structure of the data.
2. Systematic Component
This is the linear predictor:
\[
\eta = X\beta
\]
which means:
\[
\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\]
This is still the “regression” part.
Example predictors:
- deployment,
- customer count,
- month,
- discount,
- seasonality.
All influence the expected outcome.
3. Link Function
The link function connects:
- the expected value of the response,
to:
- the linear predictor.
This is the heart of GLMs.
Why Do We Need a Link Function?
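Because many outcomes cannot directly behave like:
\[
\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\]
A linear predictor can take any real value, but the mean often cannot.
Examples:
- probabilities must stay between 0 and 1,
- expected counts must stay positive,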
- Gamma values must stay positive.
The link function transforms the mean into something that can safely behave linearly.
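A minimal Python sketch of this idea, using only the standard library. The names `logit`, `inv_logit`, and `LINKS` are illustrative choices for this example, not the API of any particular package. Each link maps a mean onto the real line, and its inverse maps any value of the linear predictor back to a valid mean.

```python
import math

def logit(p):
    """Map a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map any real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# (link, inverse link) pairs for three common GLM families.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),  # Normal
    "logit": (logit, inv_logit),                   # Bernoulli
    "log": (math.log, math.exp),                   # Poisson
}

# The linear predictor can be any real number ...
for eta in (-3.0, 0.0, 3.0):
    p = LINKS["logit"][1](eta)   # ... but the probability stays in (0, 1)
    mu = LINKS["log"][1](eta)    # ... and the count mean stays positive
    print(f"eta={eta:+.1f}  probability={p:.3f}  count mean={mu:.3f}")
```

Whatever value the linear predictor takes, the inverse link returns a mean that the chosen distribution can actually have.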
Example — Logistic Regression
Probabilities cannot exceed:
- 1,
or go below:
- 0.
So instead of modeling probability directly,
we model: log-odds.
The logit link:
\[
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\]
transforms:
- probabilities,
into:
- real numbers.
Now regression becomes possible.
Example — Poisson Regression
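Counts cannot be negative, so the expected count must remain positive.
So Poisson regression uses: the log link.
\[
\log(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\]
Exponentiating gives:
\[
\mu = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}
\]
which guarantees:
- positive predicted counts.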
Very elegant.
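To make the log link concrete, here is a small sketch that fits a one-predictor Poisson regression in pure Python. It uses Newton's method on the Poisson log-likelihood, which for the log link coincides with the usual iteratively reweighted least squares fit. The function name `fit_poisson_log_link` and the data are hypothetical, chosen only for illustration; in practice a statistics library would do this fitting.

```python
import math

def fit_poisson_log_link(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton's method. Returns (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2 x 2); the weights are the fitted means.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: counts that grow roughly exponentially with x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 2, 3, 4, 5, 7, 10, 13, 18, 25]

b0, b1 = fit_poisson_log_link(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(f"intercept={b0:.3f}  slope={b1:.3f}")
print("all fitted means positive:", all(m > 0 for m in fitted))
```

Because every fitted mean is an exponential, it is positive no matter what values the coefficients take.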
Bernoulli Distribution — The Foundation of Logistic Regression
A Bernoulli random variable models: one trial with two outcomes.
Examples:
- buy/not buy,
- churn/no churn,
- sold/not sold.
Possible outcomes:
| Outcome | Meaning |
|---|---|
| 1 | success |
| 0 | failure |
Mean and Variance of Bernoulli
Mean:
\[
E[Y] = p
\]
Variance:
\[
Var(Y) = p(1-p)
\]
This is extremely important.
Notice:
- variance depends on the mean.
That is already different from ordinary regression.
A Beautiful Insight About Bernoulli Variance
The variance peaks at:
\[
p = 0.5
\]
Why?
Because uncertainty is highest when:
- both outcomes are equally likely.
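If:
- \(p=0\),
or:
- \(p=1\),
there is no uncertainty at all, and near those values there is very little.
So variance shrinks again.
This explains why Bernoulli behaves differently from Poisson: Poisson variance keeps growing with the mean, while Bernoulli variance rises and then falls.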
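A quick numerical check of this shape, as a short Python sketch. It assumes the standard Bernoulli variance formula p(1 - p); the function name `bernoulli_variance` is just a label for this example.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli(p) variable: p * (1 - p)."""
    return p * (1.0 - p)

grid = (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0)
for p in grid:
    print(f"p={p:.2f}  variance={bernoulli_variance(p):.4f}")

# The largest variance on the grid occurs at p = 0.5.
peak = max(grid, key=bernoulli_variance)
print("variance is largest at p =", peak)
```

The printed values rise toward p = 0.5 and fall symmetrically on either side.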
Odds — The Key to Logistic Regression
Suppose:
- probability of winning = 0.75.
Then:
- probability of losing = 0.25.
Odds are:
\[
\frac{0.75}{0.25} = 3
\]
Meaning:
- winning is three times as likely as losing.
Logistic regression models: the log of the odds.
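The same arithmetic as a short Python sketch, using the 0.75 probability from the example. The helper names `odds` and `log_odds` are illustrative.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def log_odds(p):
    """Log-odds (logit) of an event with probability p."""
    return math.log(odds(p))

p_win = 0.75
print("odds:", odds(p_win))          # 3.0
print("log-odds:", log_odds(p_win))  # log(3), roughly 1.0986
print("log-odds at p=0.5:", log_odds(0.5))  # even odds give 0.0
```

Probabilities above 0.5 give positive log-odds and probabilities below 0.5 give negative log-odds, which is why the scale has no floor or ceiling.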
Poisson Regression — Modeling Counts
Poisson regression is used for:
- sales counts,
- purchases,
- website clicks,
- arrivals,
- support tickets.
Its defining property:
\[
Var(Y) = E[Y] = \mu
\]
Variance grows naturally with the mean.
This is exactly what count data often does.
Real Business Example
Suppose:
- small retailers sell 2 items/month,
- large retailers sell 100 items/month.
Large retailers naturally fluctuate more.
Poisson understands this.
Ordinary regression assumes:
- constant variability.
That becomes unrealistic.
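The retailer example can be checked directly. The sketch below builds the Poisson probabilities for a mean of 2 and a mean of 100 and computes the mean, variance, and standard deviation from them. The helper names and the truncation point of 400 are choices made for this illustration; the tail beyond 400 is negligible for both means.

```python
import math

def poisson_pmf_table(mean, k_max):
    """Return [P(Y=0), ..., P(Y=k_max)] via P(Y=k) = P(Y=k-1) * mean / k."""
    probs = [math.exp(-mean)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * mean / k)
    return probs

def mean_and_variance(probs):
    """Mean and variance of a distribution given as a list of probabilities."""
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return m, v

for monthly_mean in (2, 100):  # small vs. large retailer
    m, v = mean_and_variance(poisson_pmf_table(monthly_mean, 400))
    print(f"mean={m:.2f}  variance={v:.2f}  std dev={math.sqrt(v):.2f}")
```

The variance matches the mean in both cases, so the large retailer's typical month-to-month swing (about 10 items) is far wider than the small retailer's (about 1.4 items).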
Gamma Regression — Positive Skewed Data
Gamma regression handles:
- positive continuous,
- right-skewed data.
Examples:
- revenue per transaction,
- insurance claims,
- customer spending,
- waiting times.
Variance behaves like:
\[
Var(Y)=\phi\mu^2
\]
As the mean grows:
- variance grows with the square of the mean, so the standard deviation grows in proportion to the mean.
Very common in business.
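One way to see what this variance function implies is to convert a mean and dispersion into the Gamma shape and scale parameters. The sketch below assumes the variance form Var(Y) = phi * mu^2; the helper name `gamma_shape_scale` and the dispersion value 0.25 are hypothetical.

```python
import math

def gamma_shape_scale(mu, phi):
    """Shape and scale of a Gamma with mean mu and variance phi * mu**2."""
    shape = 1.0 / phi
    scale = mu * phi
    return shape, scale

phi = 0.25  # hypothetical dispersion
for mu in (10.0, 100.0, 1000.0):
    shape, scale = gamma_shape_scale(mu, phi)
    variance = shape * scale ** 2   # Gamma variance; equals phi * mu**2
    std_dev = math.sqrt(variance)
    cv = std_dev / mu               # coefficient of variation
    print(f"mean={mu:7.1f}  std dev={std_dev:6.1f}  CV={cv:.2f}")
```

The coefficient of variation stays fixed at sqrt(phi) while the standard deviation scales with the mean, which is the pattern often seen in spending and claim amounts.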
Canonical Links
Each distribution in the GLM family has a natural, or canonical, link.
| Distribution | Canonical Link |
|---|---|
| Normal | identity |
| Bernoulli | logit |
| Poisson | log |
| Gamma | inverse |
Identity Link
For normal regression:
\[
g(\mu) = \mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\]
Nothing changes.
This is why ordinary regression is actually: a special case of GLM.
That is a very important insight.
Overdispersion — When Poisson Breaks
Real count data often violates:
\[
Var(Y)=\mu
\]
Instead:
\[
Var(Y)>\mu
\]
This is called: overdispersion.
Why Overdispersion Happens
Because real-world systems are heterogeneous.
Different parts of the data behave differently.
Example:
| Retailer Type | Average Sales |
|---|---|
| Weak | 1 |
| Medium | 5 |
| Strong | 20 |
Combining them creates:
- extra variability.
That’s why overdispersion often indicates:
- hidden groups,
- omitted variables,
- clustering,
- latent heterogeneity.
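The retailer table can be turned into a small calculation. The sketch below assumes each group is Poisson with the mean shown and that the three groups are equally common (the equal weights are an assumption of this example). It applies the law of total variance to the pooled data.

```python
group_means = [1.0, 5.0, 20.0]  # weak, medium, strong retailers
weights = [1 / 3, 1 / 3, 1 / 3]  # assumption: equal group sizes

overall_mean = sum(w * m for w, m in zip(weights, group_means))

# Law of total variance: Var(Y) = E[Var(Y | group)] + Var(E[Y | group]).
# Within each Poisson group the variance equals the group mean.
within = sum(w * m for w, m in zip(weights, group_means))
between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, group_means))
overall_variance = within + between

print(f"pooled mean     = {overall_mean:.2f}")
print(f"pooled variance = {overall_variance:.2f}")
print(f"variance / mean = {overall_variance / overall_mean:.2f}")
```

Each group on its own satisfies variance = mean, yet the pooled variance is several times the pooled mean. The extra part is the between-group term, which is what an omitted grouping variable leaves behind.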
Negative Binomial Regression
Negative binomial extends Poisson by adding extra variance.
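In the common NB2 parameterization, variance becomes:
\[
Var(Y) = \mu + \alpha\mu^2
\]
where:
- \(\alpha\) controls extra dispersion (\(\alpha = 0\) recovers Poisson).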
This is one of the most practical count models in real business analytics.
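A rough sketch of how the extra dispersion can be gauged from data, assuming the variance form Var(Y) = mu + alpha * mu^2 (a common parameterization; the symbol alpha, the helper names, and the counts are choices made for this example). It uses a simple method-of-moments estimate rather than the maximum likelihood fit a statistics library would perform.

```python
from statistics import mean, variance

def nb_variance(mu, alpha):
    """Negative binomial variance under Var(Y) = mu + alpha * mu**2."""
    return mu + alpha * mu ** 2

def moment_estimate_alpha(counts):
    """Method-of-moments dispersion: (s^2 - mean) / mean^2, floored at 0."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance
    return max(0.0, (s2 - m) / m ** 2)

counts = [0, 3, 1, 12, 4, 0, 7, 25, 2, 6]  # hypothetical monthly sales
alpha_hat = moment_estimate_alpha(counts)

print(f"sample mean     = {mean(counts):.2f}")
print(f"sample variance = {variance(counts):.2f}")
print(f"alpha estimate  = {alpha_hat:.3f}")
print(f"Poisson would imply variance = {mean(counts):.2f}")
```

An estimate near zero suggests Poisson may be adequate; a clearly positive value is a sign that a negative binomial model is worth trying.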
Real Sales Example
Suppose you model:
- monthly sales counts,
- by category,
- over time.
You may include:
- month,
- customer count,
- year,
- seasonality.
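A realistic model might look like:
\[
\log(\mu) = \beta_0 + \beta_1\,\text{year} + \text{month effect} + f(\text{customer count})
\]
where:
- \(f(\cdot)\) is a spline.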
Now we are entering: Generalized Additive Models (GAMs).
Splines and Nonlinearity
Not all relationships are linear.
Example:
- adding customers may help sales strongly initially,
- then level off later.
A spline allows:
- smooth nonlinear effects.
Instead of:
\[
\beta_1 x
\]
we use:
\[
f(x)
\]
This becomes a GAM.
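To show what a spline term looks like in code, here is a deliberately simple sketch using a linear spline (a truncated-linear basis). Real GAM software typically uses smoother, penalized bases, so treat this as an illustration of the idea only. The knots and coefficients are hypothetical and are chosen to mimic the "strong at first, then levels off" pattern.

```python
def linear_spline_basis(x, knots):
    """Truncated-linear basis: [x, max(0, x - k1), max(0, x - k2), ...]."""
    return [x] + [max(0.0, x - k) for k in knots]

def spline_effect(x, coefs, knots):
    """Evaluate f(x) as a weighted sum of the basis functions."""
    return sum(c * b for c, b in zip(coefs, linear_spline_basis(x, knots)))

knots = [50.0, 150.0]            # hypothetical knot locations (customer count)
coefs = [0.020, -0.012, -0.007]  # slopes: 0.020, then 0.008, then 0.001

for customers in (0, 50, 100, 150, 300):
    effect = spline_effect(customers, coefs, knots)
    print(f"customers={customers:3d}  f(customers)={effect:.3f}")
```

The effect climbs quickly up to the first knot, more slowly up to the second, and is nearly flat afterwards. A single straight-line coefficient could not produce that shape.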
Forecasting Inventory
One business application is: inventory forecasting.
A forecasting model of this kind can combine:
- negative binomial counts,
- seasonality,
- customer count,
- splines,
- weighted recent data,
- lead times,
- safety stock.
Many operational forecasting systems combine ingredients like these.
Weighted Forecasting
One weighting scheme is:
- 70% weight on recent 6 months,
- 30% weight on older 18 months.
This is actually very practical.
Why?
Because:
- markets evolve,
- trends shift,
- recent data often matters more.
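One simple reading of the 70/30 scheme is a blend of two averages: the mean of the most recent 6 months and the mean of the older months. The sketch below implements that reading; the function name `weighted_monthly_rate` and the demand history are hypothetical, and other interpretations (for example, using the weights as observation weights when fitting a model) are equally possible.

```python
def weighted_monthly_rate(history, recent_months=6, recent_weight=0.7):
    """Blend the recent and older average monthly demand.

    history: monthly counts, oldest first.
    """
    recent = history[-recent_months:]
    older = history[:-recent_months]
    recent_avg = sum(recent) / len(recent)
    if not older:  # not enough history to blend
        return recent_avg
    older_avg = sum(older) / len(older)
    return recent_weight * recent_avg + (1.0 - recent_weight) * older_avg

# Hypothetical history: 18 older months at 10 units, 6 recent months at 20.
history = [10] * 18 + [20] * 6

blended = weighted_monthly_rate(history)
plain = sum(history) / len(history)
print(f"weighted rate   = {blended:.2f}")  # 0.7 * 20 + 0.3 * 10
print(f"unweighted rate = {plain:.2f}")
```

With this history the weighted rate sits much closer to the recent level than the plain 24-month average does, which is the point of weighting recent data more heavily.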
Safety Stock — Beyond Normality
Traditional safety stock formulas assume:
- normal demand.
But negative binomial demand is:
- discrete,
- skewed,
- overdispersed.
So instead of:
- normal approximations,
you can use:
percentiles from the fitted distribution.
For example:
- forecast the next 12 months,
- scale to 4-month lead time,
- calculate 90th percentile demand.
That becomes: a probabilistic inventory target.
Unlike a normal approximation, this respects the discreteness and skew of the fitted demand distribution.
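The percentile step can be sketched in pure Python. The example assumes a negative binomial with mean mu and variance mu + alpha * mu^2, and it treats lead-time demand as a single negative binomial with the same alpha, which is a simplifying assumption. The annual forecast of 1200 units, the alpha of 0.2, and the function name `nb_quantile` are all hypothetical.

```python
def nb_quantile(mu, alpha, q):
    """Smallest k with P(Y <= k) >= q for a negative binomial with
    mean mu and variance mu + alpha * mu**2."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    r = 1.0 / alpha   # "size" parameter
    p = r / (r + mu)  # success probability
    pmf = p ** r      # P(Y = 0)
    cdf = pmf
    k = 0
    while cdf < q:
        # Recursion: P(Y = k + 1) = P(Y = k) * (k + r) / (k + 1) * (1 - p)
        pmf *= (k + r) / (k + 1) * (1.0 - p)
        k += 1
        cdf += pmf
    return k

annual_forecast = 1200.0  # hypothetical 12-month forecast (units)
lead_time_months = 4
alpha = 0.2               # hypothetical dispersion

lead_time_mean = annual_forecast * lead_time_months / 12
target = nb_quantile(lead_time_mean, alpha, 0.90)
safety_stock = target - lead_time_mean

print(f"expected lead-time demand = {lead_time_mean:.0f}")
print(f"90th percentile target    = {target}")
print(f"implied safety stock      = {safety_stock:.0f}")
```

The inventory target is the 90th percentile itself, and the safety stock is whatever that percentile adds on top of expected demand.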
The Deep Philosophy of GLMs
Chapter 3 teaches something profound:
data types matter.
You cannot force:
- counts,
- probabilities,
- skewed revenue,
- overdispersed demand,
into one simplistic framework.
GLMs adapt the model to the behavior of reality.
That is why they are so powerful.
Final Thought
Ordinary regression is only one small corner of statistical modeling.
GLMs generalize regression into a flexible system capable of modeling:
- counts,
- probabilities,
- skewed positive data,
- overdispersion,
- seasonality,
- nonlinear effects,
- real business processes.
And once you truly understand GLMs,
you begin to realize:
the distribution is not just mathematics —
it is a description of how reality behaves.





