GLM Chapter 2 — Why Ordinary Linear Regression Breaks Down

pj316


When most people first learn regression, they learn ordinary linear regression.

You know the equation:

Y=\beta_0+\beta_1X+\epsilon

At first glance, it feels incredibly powerful.

You take some predictor:

deployment,
advertising spend,
customer count,
square footage,

and try to predict an outcome:

sales,
revenue,
profit,
growth.

Simple.

Clean.

Elegant.

But there’s a problem.

Real life is messy.

And ordinary linear regression quietly makes assumptions that reality often violates.

The Hidden Assumptions of Ordinary Linear Regression

Ordinary linear regression works beautifully only when certain assumptions hold.

The big ones are:

The outcome is continuous
The errors are normally distributed
The error variance is constant
The mean of the outcome is a linear function of the predictors
Observations are independent

At first, these seem harmless.

Until you actually start looking at real-world data.

Problem 1 — Outcomes Aren’t Always Continuous

Suppose I want to predict:

number of monthly sales,
number of customer purchases,
number of support tickets,
number of diamonds sold.

These are counts.

Counts are non-negative whole numbers.

You cannot sell:

1.4 diamonds,
2.7 customers,
0.3 purchases.

But ordinary linear regression does not know that.

It can happily predict:

negative values,
decimals,
impossible counts.

That’s already a warning sign.
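A minimal sketch of this warning sign, assuming a small invented data set (the numbers below are hypothetical, not real sales figures). It fits an ordinary least-squares line to counts by hand and shows that the fitted line produces a negative prediction and a fractional one:

```python
# Hypothetical toy data: x is a predictor (say, months since launch),
# y is a count of units sold. The numbers are invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 1, 3, 7]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(xs, ys)


def predict(x):
    """Prediction from the fitted straight line."""
    return intercept + slope * x


print(round(predict(1), 3))  # a negative "count"
print(round(predict(4), 3))  # a fractional "count"
```

Nothing in the straight-line model knows that the outcome is a count, so nothing stops it from leaving the set of possible values.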

“Why Not Just Round?”

This is one of the first ideas people suggest.

“Why not predict 2.7 and just round it to 3?”

Because the problem is not only that counts are integers.

The deeper issue is the variance structure.

The Real Problem — Variance Changes With the Mean

Ordinary regression assumes:

variability stays roughly constant.

But count data rarely behaves that way.

Suppose:

Average Monthly Sales	Variance
2	2
20	20
200	200

Notice something?

As the mean increases,
the variability increases too.

This violates the constant-variance assumption of ordinary regression.

And this is exactly where the Poisson model comes in.

Poisson Distribution — Modeling Counts Properly

The Poisson distribution is designed for count data.

Its defining property is:

Var(Y)=E(Y)=\mu

Meaning:

variance equals the mean.

If average sales rise,
variability naturally rises too.

That is far more realistic for count processes.
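The variance-equals-mean property can be checked by simulation. The sketch below is an illustration under stated assumptions: it draws Poisson samples with Knuth's multiplication method (a hand-rolled sampler chosen here so the example needs only the standard library), uses the means 2, 20 and 200 from the table above, and fixes an arbitrary seed and sample size.

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """One Poisson draw via Knuth's multiplication method (fine for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)  # arbitrary seed, only for reproducibility
ratios = {}
for mean in (2, 20, 200):
    draws = [poisson_draw(rng, mean) for _ in range(5000)]
    sample_mean = statistics.mean(draws)
    sample_var = statistics.variance(draws)
    ratios[mean] = sample_var / sample_mean
    print(mean, round(sample_mean, 2), round(sample_var, 2))
```

For each mean, the sample variance should land close to the sample mean, so the ratio stored in `ratios` should be close to 1.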

Why This Matters in Business

Suppose you are analyzing:

retailer sales,
website clicks,
customer purchases,
monthly inquiries.

Small retailers:

may sell 1–2 pieces.

Large retailers:

may sell 50–100 pieces.

Naturally:

large retailers fluctuate more in absolute terms.

Poisson models understand this.

Ordinary regression does not.

Problem 2 — Binary Outcomes

Now suppose the question changes.

Instead of:

“How many?”

you ask:

“Will it happen?”

Examples:

Will the customer churn?
Will the customer buy?
Will this diamond sell?
Will the campaign succeed?

Now the outcome is:

yes/no,
0/1.

Ordinary regression struggles again.

Why?

Because probabilities must stay between:

0 and 1.

But ordinary regression can predict:

-0.3,
1.7,
2.1.

Impossible probabilities.
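Here is a small sketch of that failure, assuming invented 0/1 data (hypothetical, for illustration only). A straight line is fitted to the binary outcome by ordinary least squares, and the fitted values at the ends of the observed range fall outside the interval from 0 to 1:

```python
# Hypothetical toy data: x is a predictor, y is 1 if the customer bought.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Ordinary least-squares slope and intercept for a single predictor.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Fitted "probabilities" at the observed x values.
fitted = [intercept + slope * x for x in xs]
print([round(p, 3) for p in fitted])
```

The first fitted value is below 0 and the last is above 1, even though both belong to points the line was fitted on.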

Logistic Regression Solves This

Logistic regression models probabilities properly.

It uses the logit link, which is the inverse of the logistic function:

\log\left(\frac{p}{1-p}\right)=X\beta

The left-hand side is the log-odds, which can range from negative infinity to positive infinity.

So the linear predictor X\beta can take any real value, and the implied probability p still stays between 0 and 1.
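A short sketch of the two directions of this transformation. The function names `logit` and `inv_logit` are chosen here for illustration; the values passed in are arbitrary.

```python
import math


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Logistic function: maps any real linear predictor into (0, 1)."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)  # avoids overflow for very negative eta
    return z / (1 + z)


# Any real value of the linear predictor gives a valid probability.
for eta in (-5.0, 0.0, 1.7, 5.0):
    print(eta, round(inv_logit(eta), 4))
```

Values such as 1.7, which would be impossible as probabilities, are perfectly valid on the log-odds scale.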

Odds — The Thing That Confuses Everyone

Suppose a football team has:

3 to 1 odds in favor of winning.

That means:

winning is three times as likely as losing.

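Mathematically:

\frac{P(\text{win})}{P(\text{lose})}=3

Since:

P(\text{win})+P(\text{lose})=1,

we can substitute P(\text{lose})=1-P(\text{win}) and solve:

P(\text{win})=\frac{3}{1+3}=0.75

So the probability of winning is 75%.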
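The same conversion, sketched as two small helper functions (the names are chosen here for illustration). Exact fractions are used so the arithmetic is easy to check by hand:

```python
from fractions import Fraction


def odds_to_probability(odds):
    """Convert odds in favor (p / (1 - p)) back to the probability p."""
    return odds / (1 + odds)


def probability_to_odds(p):
    """Convert a probability p to odds in favor."""
    return p / (1 - p)


p_win = odds_to_probability(Fraction(3, 1))
print(p_win)                       # 3/4
print(probability_to_odds(p_win))  # 3
```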

Logistic regression models the logarithm of these odds, the log-odds.

That’s why it works so naturally for binary decisions.

Problem 3 — Positive Skewed Data

Now suppose we model:

revenue per transaction,
insurance claims,
waiting times,
customer spending.

These are:

continuous,
positive,
heavily right-skewed.

Ordinary regression again struggles.

Why?

Because:

larger expected values usually come with larger variability.

Gamma Distribution

Gamma models are excellent for:

positive, right-skewed continuous data.

Their variance behaves like:

Var(Y)=\phi\mu^2

Here \phi is the dispersion parameter. Meaning:

the variance grows with the square of the mean, so the standard deviation is proportional to the mean.
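A simulation sketch of this variance behavior, under stated assumptions: the dispersion value 0.25, the three means, the seed and the sample size are all arbitrary choices for illustration. It uses the standard library's `random.gammavariate(alpha, beta)`, whose mean is alpha * beta and whose variance is alpha * beta squared, with alpha = 1/\phi and beta = \mu\phi.

```python
import random
import statistics

PHI = 0.25  # assumed dispersion parameter, chosen only for illustration

rng = random.Random(1)  # arbitrary seed, only for reproducibility
ratios = {}
for mu in (10.0, 100.0, 1000.0):
    # shape = 1 / PHI and scale = mu * PHI give mean mu and variance PHI * mu**2
    draws = [rng.gammavariate(1 / PHI, mu * PHI) for _ in range(20000)]
    sample_mean = statistics.mean(draws)
    sample_var = statistics.variance(draws)
    ratios[mu] = sample_var / sample_mean ** 2
    print(mu, round(sample_mean, 1), round(sample_var, 1))
```

The variance itself changes by orders of magnitude across the three means, while the ratio of variance to squared mean stored in `ratios` should stay close to the assumed \phi.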

This is extremely common in:

revenue,
transaction values,
insurance,
business forecasting.

The Big Insight of Chapter 2

Different types of data have different variance structures.

That’s the heart of the problem.

Ordinary regression assumes:

one fixed structure.

Reality does not.

This Leads to GLMs

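Generalized Linear Models solve this by combining three components:

Component	Purpose
Distribution	describes the outcome type and how its variance depends on the mean
Link function	connects the mean of the outcome to the linear predictor
Linear predictor	combines the predictors as X\beta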

Core structure, where \mu=E(Y) and g is the link function:

g(\mu)=X\beta

Different Data → Different GLMs

Data Type	Distribution	Typical Link
Continuous, roughly normal	Gaussian	identity
Counts	Poisson	log
Binary	Bernoulli/Binomial	logit
Positive, right-skewed	Gamma	inverse (canonical) or log
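To make the "Counts, Poisson, log" row concrete, here is a minimal sketch of fitting a Poisson GLM with a log link and a single predictor. It is a hand-written iteratively reweighted least squares loop, not the routine any particular library uses, and the data are invented for illustration. In practice you would normally reach for a statistics package instead.

```python
import math

# Hypothetical toy data, invented for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 1, 2, 2, 4, 5, 9, 12]


def fit_poisson_glm(xs, ys, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        etas = [b0 + b1 * x for x in xs]
        mus = [math.exp(eta) for eta in etas]
        # Working response and weights for the Poisson family with log link.
        zs = [eta + (y - mu) / mu for eta, y, mu in zip(etas, ys, mus)]
        ws = mus
        # Weighted least squares of z on (1, x), solved in closed form.
        sw = sum(ws)
        swx = sum(w * x for w, x in zip(ws, xs))
        swxx = sum(w * x * x for w, x in zip(ws, xs))
        swz = sum(w * z for w, z in zip(ws, zs))
        swxz = sum(w * x * z for w, x, z in zip(ws, xs, zs))
        new_b1 = (sw * swxz - swx * swz) / (sw * swxx - swx ** 2)
        new_b0 = (swz - new_b1 * swx) / sw
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


b0, b1 = fit_poisson_glm(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]
print(round(b0, 3), round(b1, 3))
print([round(m, 2) for m in fitted])
```

Because the mean is modeled as exp(b0 + b1 * x), every fitted value is positive, and the weights inside the loop are the fitted means themselves, which is how the variance-equals-mean structure enters the fit.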

What Chapter 2 Really Teaches

Chapter 2 is not just about formulas.

It teaches something deeper:

statistical models must respect how data behaves.

That means:

respecting boundaries,
respecting variance,
respecting skewness,
respecting discreteness,
respecting probability structure.

And once you understand that,
GLMs stop feeling like “different models.”

Instead, they become natural extensions of regression for real life.

Final Thought

Ordinary regression is not “wrong.”

It’s just specialized.

GLMs generalize regression into a framework capable of handling:

counts,
probabilities,
skewed positive data,
changing variance structures,
real-world uncertainty.

And once you start working with real business data,
you quickly realize:

the world is rarely normal.
