GLM Chapter 2 — Why Ordinary Linear Regression Breaks Down

When most people first learn regression, they learn ordinary linear regression.

You know the equation:

Y=\beta_0+\beta_1X+\epsilon

At first glance, it feels incredibly powerful.

You take some predictor:

  • deployment,
  • advertising spend,
  • customer count,
  • square footage,

and try to predict an outcome:

  • sales,
  • revenue,
  • profit,
  • growth.

Simple.

Clean.

Elegant.

But there’s a problem.

Real life is messy.

And ordinary linear regression quietly makes assumptions that reality often violates.


The Hidden Assumptions of Ordinary Linear Regression

Ordinary linear regression works beautifully only when certain assumptions hold.

The big ones are:

  1. The outcome is continuous
  2. The errors are normally distributed
  3. The variance is constant
  4. Relationships are linear
  5. Observations are independent

At first, these seem harmless.

Until you actually start looking at real-world data.


Problem 1 — Outcomes Aren’t Always Continuous

Suppose I want to predict:

  • number of monthly sales,
  • number of customer purchases,
  • number of support tickets,
  • number of diamonds sold.

These are counts.

Counts are:

  • 0,
  • 1,
  • 2,
  • 3, and so on.

You cannot sell:

  • 1.4 diamonds,
  • 2.7 customers,
  • 0.3 purchases.

But ordinary linear regression does not know that.

It can happily predict:

  • negative values,
  • decimals,
  • impossible counts.

That’s already a warning sign.
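To make this concrete, here is a minimal sketch of a straight-line fit to count data. The toy numbers and the helper name `fit_line` are made up for illustration and do not come from any real dataset; the fit is plain closed-form least squares.

```python
# Hypothetical toy data: x could be ad spend, y the number of units sold.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 1, 3, 6]


def fit_line(xs, ys):
    """Closed-form least squares for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(xs, ys)
predictions = [b0 + b1 * x for x in xs]

for x, pred in zip(xs, predictions):
    # The fitted line is free to go below zero and between integers.
    print(f"x={x}  predicted count={pred:.2f}")
```

On this toy data the prediction at x = 1 comes out below zero, and none of the predictions is a whole number, even though every observed value is a non-negative integer.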


“Why Not Just Round?”

This is one of the first ideas people suggest.

“Why not predict 2.7 and just round it to 3?”

Because the issue is not just integers.

The deeper issue is the variance structure.


The Real Problem — Variance Changes With the Mean

Ordinary regression assumes:

variability stays roughly constant.

But count data rarely behaves that way.

Suppose:

Average Monthly Sales    Variance
2                        2
20                       20
200                      200

Notice something?

As the mean increases,
the variability increases too.

This violates ordinary regression assumptions.

And this is exactly where the Poisson model comes in.


Poisson Distribution — Modeling Counts Properly

The Poisson distribution is designed for count data.

Its defining property is:

\mathrm{Var}(Y)=E(Y)=\mu

Meaning:

  • variance equals the mean.

If average sales rise,
variability naturally rises too.

That is far more realistic for count processes.
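The variance-equals-mean property can be checked with a small simulation. This is only a sketch: the sampler `sample_poisson` is a hand-written helper using Knuth's multiplication method (Python's standard library has no built-in Poisson sampler), and the means 2, 20, and 200 along with the sample size of 5000 are illustrative choices.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(42)  # fixed seed so the sketch is repeatable
results = {}
for lam in (2, 20, 200):
    draws = [sample_poisson(lam, rng) for _ in range(5000)]
    results[lam] = (statistics.mean(draws), statistics.variance(draws))

for lam, (mean, var) in results.items():
    # For Poisson data the sample variance should sit close to the sample mean.
    print(f"target mean={lam:>3}  sample mean={mean:.2f}  sample variance={var:.2f}")
```

The sample variance should track the sample mean at every level, which is exactly the pattern a constant-variance model cannot represent.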


Why This Matters in Business

Suppose you are analyzing:

  • retailer sales,
  • website clicks,
  • customer purchases,
  • monthly inquiries.

Small retailers:

  • may sell 1–2 pieces.

Large retailers:

  • may sell 50–100 pieces.

Naturally:

  • large retailers fluctuate more in absolute terms.

Poisson models understand this.

Ordinary regression does not.


Problem 2 — Binary Outcomes

Now suppose the question changes.

Instead of:

“How many?”

you ask:

“Will it happen?”

Examples:

  • Will the customer churn?
  • Will the customer buy?
  • Will this diamond sell?
  • Will the campaign succeed?

Now the outcome is:

  • yes/no,
  • 0/1.

Ordinary regression struggles again.

Why?

Because probabilities must stay between:

  • 0 and 1.

But ordinary regression can predict:

  • -0.3,
  • 1.7,
  • 2.1.

Impossible probabilities.


Logistic Regression Solves This

Logistic regression models probabilities properly.

It uses the logit link, which is the inverse of the logistic function:

\log\left(\frac{p}{1-p}\right)=X\beta

This transforms probabilities into log-odds, which can range from negative infinity to positive infinity.

That allows us to model probabilities safely.
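A short sketch of the two directions of this transformation may help. The function names `logit` and `inv_logit` are chosen here for illustration. The point is that any real-valued linear predictor, including values such as -0.3, 1.7, or 2.1 that would be impossible as probabilities, maps back to a number strictly between 0 and 1.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Map any real-valued linear predictor back to a probability."""
    # Branch on the sign to avoid overflow in exp() for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


for eta in (-10.0, -0.3, 0.0, 1.7, 2.1, 10.0):
    p = inv_logit(eta)
    # Applying logit() to the result recovers the original linear predictor.
    print(f"linear predictor={eta:>5}  probability={p:.4f}  round trip={logit(p):.4f}")
```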


Odds — The Thing That Confuses Everyone

Suppose a football team has:

  • 3 to 1 odds in favor of winning.

That means:

winning is three times as likely as losing.

Mathematically:

\frac{P(\text{win})}{P(\text{lose})}=3

Since:

  • P(win) + P(lose) = 1,

the probability of winning becomes:

  • 75%.

Logistic regression models the log-odds as a linear function of the predictors.

That’s why it works so naturally for binary decisions.
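The football arithmetic can be written as two one-line conversions. Solving p / (1 - p) = odds for p gives p = odds / (1 + odds). The helper names below are illustrative only.

```python
import math


def odds_to_probability(odds):
    """Solve p / (1 - p) = odds for p."""
    return odds / (1.0 + odds)


def probability_to_odds(p):
    """Odds in favor: probability of the event over probability of its complement."""
    return p / (1.0 - p)


p_win = odds_to_probability(3.0)
print(p_win)                       # 0.75
print(probability_to_odds(p_win))  # 3.0
print(math.log(3.0))               # the same odds on the log-odds scale
```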


Problem 3 — Positive Skewed Data

Now suppose we model:

  • revenue per transaction,
  • insurance claims,
  • waiting times,
  • customer spending.

These are:

  • continuous,
  • positive,
  • heavily right-skewed.

Ordinary regression again struggles.

Why?

Because:

  • large values often have larger variability.

Gamma Distribution

Gamma models are excellent for:

  • positive skewed continuous data.

Their variance behaves like:

\mathrm{Var}(Y)=\phi\mu^2

Meaning:

  • variance grows with the square of the mean, so the standard deviation grows in proportion to the mean.

This is extremely common in:

  • revenue,
  • transaction values,
  • insurance,
  • business forecasting.
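A quick simulation illustrates the quadratic variance relationship. This sketch assumes a dispersion of 0.5 purely for illustration. It uses the standard shape/scale parameterization of the Gamma distribution, where mean = shape × scale and variance = shape × scale², so the dispersion equals 1 / shape.

```python
import random
import statistics

PHI = 0.5          # assumed dispersion, chosen only for illustration
SHAPE = 1.0 / PHI  # gamma shape k, because Var(Y) = mu^2 / k

rng = random.Random(7)  # fixed seed so the sketch is repeatable
ratios = {}
for mu in (10.0, 100.0, 1000.0):
    scale = mu / SHAPE  # mean = shape * scale
    draws = [rng.gammavariate(SHAPE, scale) for _ in range(20000)]
    mean = statistics.fmean(draws)
    var = statistics.variance(draws)
    ratios[mu] = var / mean ** 2
    # The variance explodes with the mean, but variance / mean^2 stays near PHI.
    print(f"mean={mean:9.2f}  variance={var:12.2f}  variance/mean^2={ratios[mu]:.3f}")
```

The raw variance differs by orders of magnitude across the three groups, while the ratio of variance to squared mean should stay roughly constant.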

The Big Insight of Chapter 2

Different types of data have different variance structures.

That’s the heart of the problem.

Ordinary regression assumes:

  • one fixed structure.

Reality does not.


This Leads to GLMs

Generalized Linear Models solve this by combining three components:

Component           Purpose
Distribution        model the correct data type
Link function       connect the mean to the predictors
Linear predictor    model relationships

Core structure:

g(\mu)=X\beta

Different Data → Different GLMs

Data Type            Distribution          Typical Link
Continuous normal    Gaussian              identity
Counts               Poisson               log
Binary               Bernoulli/Binomial    logit
Positive skewed      Gamma                 inverse/log
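To show how the three components fit together, here is a minimal sketch of the "Counts / Poisson / log" row: a one-predictor Poisson regression fitted by Newton-Raphson on the log-likelihood. The toy data and the function name `fit_poisson_log_link` are hypothetical, and in practice a statistics library would do this fitting for you. The sketch only exposes the mechanics.

```python
import math

# Hypothetical toy data (not from any real dataset): x is a predictor, y a count.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1, 2, 4, 7, 12, 20]


def fit_poisson_log_link(xs, ys, iterations=25):
    """Newton-Raphson for log(mu) = b0 + b1 * x under a Poisson likelihood."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        # Linear predictor -> inverse link (exp) -> mean
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: X^T (y - mu)
        s0 = sum(y - m for y, m in zip(ys, mus))
        s1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Information matrix: X^T diag(mu) X (Poisson variance equals the mean)
        h00 = sum(mus)
        h01 = sum(x * m for x, m in zip(xs, mus))
        h11 = sum(x * x * m for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1


b0, b1 = fit_poisson_log_link(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]

for x, y, m in zip(xs, ys, fitted):
    # Fitted means are always positive because they pass through exp().
    print(f"x={x:.0f}  observed={y:>2}  fitted mean={m:.2f}")
```

Swapping the distribution and link (for example Bernoulli with logit, or Gamma with log) changes the variance weights and the inverse link, but the linear predictor and the overall fitting loop keep the same shape.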

What Chapter 2 Really Teaches

Chapter 2 is not just about formulas.

It teaches something deeper:

statistical models must respect how data behaves.

That means:

  • respecting boundaries,
  • respecting variance,
  • respecting skewness,
  • respecting discreteness,
  • respecting probability structure.

And once you understand that,
GLMs stop feeling like “different models.”

Instead, they become: natural extensions of regression for real life.


Final Thought

Ordinary regression is not “wrong.”

It’s just specialized.

GLMs generalize regression into a framework capable of handling:

  • counts,
  • probabilities,
  • skewed positive data,
  • changing variance structures,
  • real-world uncertainty.

And once you start working with real business data,
you quickly realize:

the world is rarely normal.
