GLM Chapter 1 - Why Ordinary Linear Regression Is Not Enough: The Beginning of Generalized Linear Models (GLMs)

Introduction

When most people first learn statistics, they usually begin with ordinary linear regression:

$$Y=\beta_0+\beta_1 X+\epsilon$$

This model is elegant, powerful, and foundational to modern statistics.

It assumes:

  • a linear relationship between variables,
  • normally distributed errors,
  • constant variance,
  • and outcomes that can theoretically take any value from negative infinity to positive infinity.

For many datasets, this works beautifully.

But very quickly, real-world data begins breaking these assumptions.

What if:

  • the outcome is the number of customers visiting a store?
  • the response is whether a patient survives or not?
  • the data can never become negative?
  • the variance increases as the mean increases?
  • the outcome is a probability between 0 and 1?

Ordinary regression begins to fail.

This is where Generalized Linear Models (GLMs) enter the picture.

GLMs extend ordinary regression to handle:

  • counts,
  • probabilities,
  • rates,
  • waiting times,
  • positive continuous outcomes,
  • and many other real-world data structures.

This blog introduces the core intuition behind why GLMs were created and why they revolutionized statistics.


The Hidden Assumption of Ordinary Regression

One of the most important assumptions in ordinary least squares (OLS) regression is constant variance:

$$\mathrm{Var}(Y)=\sigma^2$$

This means the variance stays the same at every level of the predictors.

In other words, small values fluctuate about as much as large values.

But real life often does not behave that way.


Example — Store Sales

Suppose we observe monthly sales counts.

| Store | Average Monthly Sales |
|---|---|
| Small Store | 2 |
| Large Store | 500 |

Would both stores fluctuate equally?

Of course not.

The larger store naturally experiences:

  • larger fluctuations,
  • larger variability,
  • and larger uncertainty.

This leads to one of the deepest ideas in generalized linear models:

Variance often depends on the mean.


Why Variance Depends on the Mean

Suppose a small store averages:

  • 2 sales per month.

Typical observations might be:

  • 1, 2, 3, 4

Now consider a large store averaging:

  • 500 sales per month.

Typical observations might be:

  • 470, 510, 530, 495

Notice something important:

The large store fluctuates much more in absolute terms.

This is not a problem.
This is natural behavior.

Many real-world systems behave this way:

  • larger systems fluctuate more,
  • larger averages produce larger variance.

Ordinary regression cannot naturally accommodate this.
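
To make the comparison concrete, here is a minimal sketch that summarizes the two sets of observations quoted above. The numbers are the illustrative ones from the text, not real data, and the helper name `summarize` is just an assumption for this example.

```python
import statistics

# Illustrative observations quoted in the text (not real data).
small_store = [1, 2, 3, 4]
large_store = [470, 510, 530, 495]


def summarize(sales):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(sales)
    sd = statistics.stdev(sales)
    return mean, sd, sd / mean


small_mean, small_sd, small_cv = summarize(small_store)
large_mean, large_sd, large_cv = summarize(large_store)

print(f"small store: mean={small_mean:.2f} sd={small_sd:.2f} cv={small_cv:.3f}")
print(f"large store: mean={large_mean:.2f} sd={large_sd:.2f} cv={large_cv:.3f}")
```

The large store has the bigger standard deviation, even though its fluctuations are smaller relative to its mean. A constant-variance model has no way to express that the spread grows with the average.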


Count Data Creates Problems for Ordinary Regression

Suppose we model:

  • number of defects,
  • number of hospital visits,
  • number of website clicks,
  • number of daily sales.

These are count data.

Counts have several important properties:

  • they cannot be negative,
  • they are discrete,
  • and their variance often increases with the mean.

But ordinary regression may predict a negative count, say -3.

Impossible.
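
A small sketch shows how easily this happens. The data below are hypothetical counts invented for illustration, and the straight line is fitted with the usual closed-form least-squares formulas.

```python
# Hypothetical count data, invented for illustration:
# x could be an advertising level, y the number of sales observed.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 2, 5, 9]


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(x, y)
predictions = [intercept + slope * xi for xi in x]

for xi, yi, pred in zip(x, y, predictions):
    print(f"x={xi} observed={yi} predicted={pred:.2f}")
```

For these numbers the fitted line dips below zero at the low end of x, so the model predicts a negative number of sales for an observation that is actually zero.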

This is one of the earliest motivations for Poisson regression.


The Poisson Distribution

The Poisson distribution is one of the most important distributions in statistics.

It is designed for counts.

One remarkable property of the Poisson distribution is:

$$\mathrm{Var}(Y)=E(Y)=\mu$$

This means the variance equals the mean. As the average count increases, the variability naturally increases too. This matches what we observe in many real-world count processes.
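
The mean-variance property can be checked by simulation. The sketch below is only an illustration: the Python standard library has no Poisson sampler, so the hypothetical helper `draw_poisson` counts how many exponential inter-arrival gaps fit into one unit of time, which is one standard way to generate Poisson counts. The rates and the seed are arbitrary choices.

```python
import random
import statistics


def draw_poisson(rate, rng):
    """Draw one Poisson(rate) count by counting arrivals in one time unit."""
    count = 0
    elapsed = rng.expovariate(rate)  # time of the first arrival
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)  # gap until the next arrival
    return count


rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for rate in (2, 50, 500):
    draws = [draw_poisson(rate, rng) for _ in range(2000)]
    results[rate] = (statistics.mean(draws), statistics.variance(draws))

for rate, (mean, variance) in results.items():
    print(f"rate={rate}: sample mean={mean:.2f}, sample variance={variance:.2f}")
```

In each case the sample variance should land close to the sample mean, and both grow together as the rate increases.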


Poisson Regression

Instead of using ordinary regression, GLMs use Poisson regression for count data.

The model becomes:

$$\log(\mu)=\beta_0+\beta_1 X$$

where:

  • $\mu$ represents the expected count.
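
To give a feel for how such a model can be fitted, here is a minimal sketch of Newton's method for a Poisson regression with one predictor. It is not how any particular library implements it. The data are hypothetical, and `fit_poisson` is a name made up for this example. Real software typically uses iteratively reweighted least squares with convergence checks.

```python
import math

# Hypothetical count data, invented for illustration.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 2, 5, 9]


def fit_poisson(xs, ys, iterations=25):
    """Maximize the Poisson log-likelihood for log(mu) = b0 + b1 * x."""
    b0 = math.log(sum(ys) / len(ys))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in xs]
        # Score vector: derivatives of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(ys, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(xs, ys, mu))
        # Information matrix [[a, b], [b, c]] with weights mu.
        a = sum(mu)
        b = sum(mi * xi for xi, mi in zip(xs, mu))
        c = sum(mi * xi * xi for xi, mi in zip(xs, mu))
        det = a * c - b * b
        # Newton step: solve the 2x2 system by hand.
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(f"b0={b0:.3f}, b1={b1:.3f}")
print("fitted means:", [round(m, 2) for m in fitted])
```

Every fitted mean is positive by construction, and at the solution the fitted means add up to the observed total, which is a property of the Poisson likelihood with a log link and an intercept.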

Why Use the Logarithm?

This is one of the most important conceptual ideas in GLMs.

Suppose we modeled counts directly:

$$\mu=\beta_0+\beta_1 X$$

The prediction could become negative.

That would make no sense.

Instead, GLMs use:

$$\mu=\exp(\beta_0+\beta_1 X)$$

Exponentials are always positive.

Therefore, predicted counts remain positive automatically.

This is the power of link functions.
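
The difference between the two formulas is easy to see numerically. In this sketch the coefficients are hypothetical values chosen only so that the straight line crosses zero.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = 1.0, -0.8
xs = [0, 1, 2, 3, 4]

# Identity link: the mean is the linear predictor itself.
identity_mean = [beta0 + beta1 * x for x in xs]
# Log link: the mean is the exponential of the linear predictor.
log_link_mean = [math.exp(beta0 + beta1 * x) for x in xs]

for x, m_id, m_log in zip(xs, identity_mean, log_link_mean):
    print(f"x={x}: identity mean={m_id:+.2f}, log-link mean={m_log:.3f}")
```

With the identity link the predicted mean goes negative once x is large enough. With the log link it shrinks toward zero but never crosses it.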


Binary Outcomes Create Another Problem

Suppose the outcome is:

  • sold/not sold,
  • disease/no disease,
  • churn/no churn.

Now the response is binary.

Probabilities must satisfy:

$$0\leq p\leq 1$$

But ordinary regression might predict:

  • 1.4,
  • or -0.2.

Again, impossible.
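
Here is a minimal sketch of the problem, using hypothetical binary outcomes invented for illustration. A straight line is fitted to 0/1 data with the closed-form least-squares formulas.

```python
# Hypothetical binary data: x is a predictor, y is 1 for "sold", 0 for "not sold".
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
predictions = [intercept + slope * xi for xi in x]

for xi, pred in zip(x, predictions):
    print(f"x={xi}: predicted 'probability' = {pred:.3f}")
```

For these numbers the fitted line gives a value below 0 at the smallest x and a value above 1 at the largest x, neither of which can be a probability.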


Logistic Regression

GLMs solve this using logistic regression.

The model becomes:

$$\log\left(\frac{p}{1-p}\right)=\beta_0+\beta_1 X$$

The left-hand side is the log-odds.


Why Use Log-Odds?

Probabilities are constrained between 0 and 1.

But log-odds can range from negative infinity to positive infinity.

That makes them suitable for linear modeling.

This is one of the elegant mathematical tricks behind logistic regression.
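
The mapping in both directions can be sketched in a few lines. The function names `logit` and `inverse_logit` are conventional, but they are defined here by hand rather than taken from a library.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to log-odds on the whole real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map any real-valued linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p={p:.2f} -> log-odds={logit(p):+.3f}")

for eta in (-10, -2, 0, 2, 10):
    print(f"eta={eta:+d} -> p={inverse_logit(eta):.5f}")
```

However large or small the linear predictor gets, the inverse logit always returns a value strictly between 0 and 1.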


Example — Discounts and Sales

Suppose $X$ indicates whether a discount is offered, and the fitted coefficient is:

$$\beta_1=0.693$$

Then:

$$e^{0.693}\approx 2$$

Interpretation:

Discounts double the odds of a sale.

This introduces another major idea in GLMs:

coefficients often become multiplicative after exponentiation.
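
The arithmetic can be worked through directly. The coefficient 0.693 comes from the example above, while the baseline probability of 0.20 is a hypothetical value added only to show what doubling the odds does to a probability.

```python
import math

beta1 = 0.693            # coefficient from the example in the text
baseline_p = 0.20        # hypothetical probability of a sale without a discount

odds_ratio = math.exp(beta1)
baseline_odds = baseline_p / (1.0 - baseline_p)
discount_odds = baseline_odds * odds_ratio
discount_p = discount_odds / (1.0 + discount_odds)

print(f"odds ratio        = {odds_ratio:.4f}")
print(f"odds, no discount = {baseline_odds:.4f}")
print(f"odds, discount    = {discount_odds:.4f}")
print(f"probability, discount = {discount_p:.4f}")
```

Note that the odds double but the probability does not: under these assumptions it moves from 0.20 to roughly one in three. The multiplicative reading applies to odds, not to probabilities.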


Positive Continuous Data and Gamma Models

Not all data is:

  • counts,
  • or probabilities.

Sometimes outcomes are:

  • positive continuous variables.

Examples:

  • customer spending,
  • insurance claims,
  • waiting times,
  • service durations.

These are often modeled with Gamma distributions.


Gamma Variance Structure

Gamma distributions have another fascinating property:

$$\mathrm{Var}(Y)=\phi\mu^2$$

where $\phi$ is the dispersion parameter. Variance increases with the square of the mean.

Large outcomes fluctuate dramatically more.

Again, ordinary regression assumptions fail.
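
A short simulation illustrates this variance structure. It is a sketch under stated assumptions: the shape parameter of 4 (so the dispersion is 1/4) and the two means are arbitrary illustrative choices, and `simulate_gamma` is a helper defined only for this example.

```python
import random
import statistics


def simulate_gamma(mean, shape, n, rng):
    """Draw n Gamma values with the given mean; the dispersion is 1 / shape."""
    scale = mean / shape  # Gamma mean = shape * scale
    return [rng.gammavariate(shape, scale) for _ in range(n)]


rng = random.Random(7)   # fixed seed so the run is repeatable
shape = 4.0              # assumed shape, so Var(Y) = mean**2 / 4
summary = {}
for mean in (100.0, 20000.0):
    draws = simulate_gamma(mean, shape, 20000, rng)
    sd = statistics.stdev(draws)
    summary[mean] = (statistics.mean(draws), sd, sd / statistics.mean(draws))

for mean, (m, sd, cv) in summary.items():
    print(f"target mean={mean:.0f}: sample mean={m:.1f}, sd={sd:.1f}, sd/mean={cv:.3f}")
```

The standard deviation scales in proportion to the mean, so the ratio of the two stays roughly constant (about 0.5 for this shape). That constant ratio is what variance proportional to the squared mean looks like in practice.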


Example — Customer Spending

Suppose:

| Customer Type | Average Spending |
|---|---|
| Small customers | $100 |
| Large customers | $20,000 |

Large spenders fluctuate enormously more than small spenders.

This is perfectly natural for Gamma-like processes.


Gamma Regression

Gamma GLMs often use the log link:

$$\log(\mu)=X\beta$$

Again:

  • positivity is preserved,
  • the mean-variance relationship is respected.

Ordinary Regression Is Actually a GLM

One of the most beautiful realizations in statistics is this:

Ordinary regression itself is a special case of GLMs.

For ordinary regression:

  • distribution = Gaussian,
  • link function = identity,
  • variance = constant.

So ordinary regression is simply a Gaussian GLM with an identity link.

This unifies:

  • linear regression,
  • logistic regression,
  • Poisson regression,
  • Gamma regression,

under one mathematical framework.


The Three Components of a GLM

Every generalized linear model contains three pieces.


1. Random Component

The probability distribution of the outcome.

Examples:

| Outcome Type | Distribution |
|---|---|
| Continuous, normal | Gaussian |
| Counts | Poisson |
| Binary | Binomial |
| Positive, skewed | Gamma |

2. Systematic Component

The linear predictor:

$$\eta=X\beta$$

This combines predictors linearly.


3. Link Function

Connects the mean to the linear predictor through a function $g$:

$$g(\mu)=\eta$$

Examples:

| Distribution | Link Function |
|---|---|
| Gaussian | Identity |
| Poisson | Log |
| Binomial | Logit |
| Gamma | Log or inverse |
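
The three components can be laid out side by side in code. This is only an organizational sketch: the dictionary `FAMILIES` and the helper `describe` are hypothetical names, the Gamma entry uses the log link rather than the inverse link, and the variance functions are the ones discussed earlier (constant, mean, mean times one minus mean, and mean squared).

```python
import math

# Each family pairs a link g, its inverse, and a variance function V(mu).
FAMILIES = {
    "gaussian": {
        "link": lambda mu: mu,
        "inverse_link": lambda eta: eta,
        "variance": lambda mu: 1.0,
    },
    "poisson": {
        "link": math.log,
        "inverse_link": math.exp,
        "variance": lambda mu: mu,
    },
    "binomial": {
        "link": lambda mu: math.log(mu / (1.0 - mu)),
        "inverse_link": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        "variance": lambda mu: mu * (1.0 - mu),
    },
    "gamma": {
        "link": math.log,
        "inverse_link": math.exp,
        "variance": lambda mu: mu ** 2,
    },
}


def describe(family, eta, dispersion=1.0):
    """Turn a linear predictor into (mean, variance) for the chosen family."""
    parts = FAMILIES[family]
    mu = parts["inverse_link"](eta)
    return mu, dispersion * parts["variance"](mu)


eta = 0.5  # the same linear predictor value fed to every family
summary = {name: describe(name, eta) for name in FAMILIES}
for name, (mu, var) in summary.items():
    print(f"{name:9s} mean={mu:.4f} variance={var:.4f}")
```

The same linear predictor produces a different mean and a different variance depending on the family and link, which is the whole point of separating the three components.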

The Exponential Family

Many important distributions belong to the exponential family.

These include:

  • Normal,
  • Poisson,
  • Binomial,
  • Gamma.

This shared structure allows GLMs to unify many statistical models into one framework.
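
As one concrete instance, the Poisson probability mass function can be rewritten in the exponential-family form exp(yθ − b(θ) + c(y)), with natural parameter θ = log μ, b(θ) = exp(θ), and c(y) = −log(y!). The sketch below checks numerically that the two expressions agree. The function names and the value of μ are assumptions made for this example.

```python
import math


def poisson_pmf(y, mu):
    """Textbook form: exp(-mu) * mu**y / y!"""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def exponential_family_form(y, mu):
    """Same probability written as exp(y * theta - b(theta) + c(y))."""
    theta = math.log(mu)        # natural parameter
    b = math.exp(theta)         # b(theta) = exp(theta), which equals mu
    c = -math.lgamma(y + 1)     # c(y) = -log(y!)
    return math.exp(y * theta - b + c)


mu = 3.5  # arbitrary illustrative mean
pairs = [(poisson_pmf(y, mu), exponential_family_form(y, mu)) for y in range(10)]
for y, (direct, rewritten) in enumerate(pairs):
    print(f"y={y}: direct={direct:.6f}, exponential-family form={rewritten:.6f}")
```

The natural parameter here is log μ, which is why the log link is the canonical link for Poisson regression.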


A Major Misconception

A common question is:

“Why not just use ordinary regression for everything?”

Technically, you can.

But statistically:

  • assumptions are violated,
  • standard errors can be wrong,
  • predictions can fall outside the possible range,
  • and inference becomes unreliable.

GLMs are not just “ordinary regression with tweaks.” They are models designed specifically for different data-generating mechanisms.


Real-World Applications of GLMs

GLMs are everywhere.


Medicine

  • disease prediction,
  • treatment outcomes,
  • hospital visits,
  • mortality risk.

Business Analytics

  • sales counts,
  • customer churn,
  • app engagement,
  • inventory movement.

Insurance

  • claim frequencies,
  • claim sizes,
  • risk modeling.

Manufacturing

  • defect counts,
  • waiting times,
  • production rates.

Why GLMs Matter Today

Many people assume modern machine learning replaced classical statistics.

But generalized linear models remain foundational because they are:

  • interpretable,
  • mathematically principled,
  • computationally efficient,
  • and deeply connected to probability theory.

Even many AI systems still rely on:

  • logistic outputs,
  • probabilistic modeling,
  • and generalized linear concepts.

Final Thoughts

Generalized Linear Models were revolutionary because they recognized something profound:

Different types of data behave differently.

Counts behave differently from probabilities.

Probabilities behave differently from waiting times.

Waiting times behave differently from positive continuous spending data.

Instead of forcing all data into one rigid framework, GLMs adapt the statistical model to the natural structure of the data itself.

That is why GLMs became one of the most important developments in modern statistics.


Final Summary

Generalized Linear Models (GLMs) extend ordinary regression to handle non-normal outcomes such as counts, probabilities, and positive skewed data. By combining probability distributions, linear predictors, and link functions, GLMs provide statistically principled methods for modeling real-world data where variance depends on the mean, outcomes are constrained, and ordinary regression assumptions fail.
