Chapter 5 — Loglinear Models: Understanding Relationships Between Categories

pj316

By Chapter 5, something interesting happens.

Until now, we modeled:

continuous outcomes,
count outcomes,
probabilities.

But what if there is no obvious “response variable”? What if we simply want to understand: Are variables related?

Examples:

Does shape affect sales?
Does color affect purchase?
Does clarity influence conversion?
Does region affect product preference?

Now we move into: Loglinear Models

One of the most underrated parts of GLMs.

The Main Question of Chapter 5

Suppose we observe:

Round sold 120,
Oval sold 60.

Question: Is shape related to sales? Or: Are they independent?

That’s what loglinear models solve.

What Type of Data Are We Modeling?

Counts.

But not ordinary counts.

We model: counts inside categories.

Example:

Shape	Sold	Not Sold
Round	120	80
Oval	60	140

Each cell contains: a count.

Why Ordinary Regression Does Not Work

Suppose we predict:

\text{Sales}=\beta_0+\beta_1\,\text{Shape}

Problem:

These are:

frequencies,
discrete counts.

Also:

expected values must remain positive, but a straight line can predict negative counts.

So ordinary regression becomes awkward.

Instead:

Loglinear models assume:

Y_{ij}\sim \text{Poisson}(\mu_{ij})

for cell counts.

The Core Loglinear Model

We model:

\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B

where:

$\mu_{ij}$ = expected count in cell $(i, j)$,
$\lambda$ = overall level on the log scale (the intercept),
$\lambda_i^A$ = effect of level $i$ of factor A,
$\lambda_j^B$ = effect of level $j$ of factor B.

At first this looks scary.

But the idea is simple.

We model: expected cell counts.

Example — Diamond Shape × Sales

Suppose:

Shape	Count
Round	100
Oval	60
Cushion	40

Then:

\log(\mu_i) = \lambda + \lambda_i^{\text{Shape}}

The model estimates: expected counts for each category.
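As a minimal sketch of the one-factor model above: with one parameter per shape, the fitted counts simply reproduce the observed counts. The sum-to-zero constraint on the shape effects is an assumption made here for identifiability; the text does not say which constraint is used.

```python
import math

# Counts from the Shape table above.
counts = {"Round": 100, "Oval": 60, "Cushion": 40}

# With one parameter per category, fitted counts equal observed counts,
# so log(mu_i) is just the log of each observed count.
log_mu = {shape: math.log(n) for shape, n in counts.items()}

# Assumed constraint: shape effects sum to zero, so lambda is the mean log count.
lam = sum(log_mu.values()) / len(log_mu)
lam_shape = {shape: value - lam for shape, value in log_mu.items()}

# Rebuild the expected counts from the parameters: mu_i = exp(lambda + lambda_i).
fitted = {shape: math.exp(lam + lam_shape[shape]) for shape in counts}

for shape in counts:
    print(f"{shape}: lambda_i = {lam_shape[shape]:+.3f}, fitted = {fitted[shape]:.1f}")
```

Positive effects mark categories above the geometric-mean count, negative effects mark those below it.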

Independence — The Most Important Idea

Suppose we want to know:

Does shape affect whether something sells?

If independent: Expected counts become:

\text{Expected} = \frac{\text{Row} \times \text{Column}}{\text{Total}}

Shape	Sold	Not Sold	Total
Round	60	40	100
Oval	20	80	100
Total	80	120	200

Expected Round Sold:

100\times80/200=40

Observed: 60

Expected: 40.

Difference suggests: dependence.
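The row-times-column rule above can be sketched in a few lines. The helper name `expected_under_independence` is made up for this illustration; the counts are the ones in the table.

```python
# Observed counts from the Shape x Sold table above.
observed = {
    "Round": {"Sold": 60, "Not Sold": 40},
    "Oval": {"Sold": 20, "Not Sold": 80},
}

def expected_under_independence(table):
    """Expected cell count = row total * column total / grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / total for col in col_totals}
        for row in table
    }

expected = expected_under_independence(observed)
for row, cells in expected.items():
    print(row, cells)
```

Every row gets the same expected split (40 sold, 60 not sold) because both shapes have the same row total.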

Independence Model

Model:

\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B

Interpretation:

Shape influences counts (through the row totals).
Sale status influences counts (through the column totals).
But no interaction: shape and sale status are independent.
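To see that the independence model really is additive on the log scale, here is a small sketch using the margins of the earlier Round/Oval table (row totals 100 and 100, column totals 80 and 120). The sum-to-zero constraints are an assumption of this sketch, not something the text fixes.

```python
import math

row_totals = [100, 100]   # Round, Oval
col_totals = [80, 120]    # Sold, Not Sold
n = 200

# Expected counts under independence, then their logs.
mu = [[r * c / n for c in col_totals] for r in row_totals]
log_mu = [[math.log(m) for m in row] for row in mu]

I, J = len(row_totals), len(col_totals)

# Assumed sum-to-zero parameterization.
lam = sum(sum(row) for row in log_mu) / (I * J)
lam_A = [sum(log_mu[i]) / J - lam for i in range(I)]
lam_B = [sum(log_mu[i][j] for i in range(I)) / I - lam for j in range(J)]

# If the model is purely additive, nothing is left over for an interaction term.
max_gap = max(
    abs(log_mu[i][j] - (lam + lam_A[i] + lam_B[j]))
    for i in range(I)
    for j in range(J)
)
print("lambda:", lam, "lambda_A:", lam_A, "lambda_B:", lam_B)
print("largest leftover:", max_gap)
```

The shape effects come out as zero here only because both shapes have the same total; the leftover is zero (up to rounding) for any table of independence-fitted counts.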

Interaction — The Heart of Chapter 5

Now suppose shape changes sales.

Add interaction:

\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_{ij}^{AB}

This term means: the effect of one variable depends on another.

Numerical Example

Observed:

Shape	Sold	Not Sold
Round	90	10
Oval	10	90

Expected under independence:

Shape	Sold	Not Sold
Round	50	50
Oval	50	50

Huge mismatch.

Interaction becomes necessary.
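One way to put a number on that mismatch is to compute the interaction terms directly from the observed table. This sketch assumes sum-to-zero constraints and the saturated model (fitted counts equal observed counts); under those assumptions the interaction term in a 2 × 2 table is one quarter of the log odds ratio.

```python
import math

# Observed counts: rows are Round, Oval; columns are Sold, Not Sold.
observed = [[90, 10], [10, 90]]
log_n = [[math.log(x) for x in row] for row in observed]

# Double-center the log counts: what remains after removing the
# overall level, the row effects, and the column effects.
grand = sum(map(sum, log_n)) / 4
row_mean = [sum(row) / 2 for row in log_n]
col_mean = [(log_n[0][j] + log_n[1][j]) / 2 for j in range(2)]
lam_AB = [
    [log_n[i][j] - row_mean[i] - col_mean[j] + grand for j in range(2)]
    for i in range(2)
]

log_odds_ratio = math.log((90 * 90) / (10 * 10))

print("interaction terms:", lam_AB)
print("log odds ratio / 4:", log_odds_ratio / 4)
```

Under independence every interaction term would be zero; here they are about ±1.10, which is far from zero.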

Three-Way Tables

Chapter 5 becomes powerful when adding more variables.

Example:

| Shape | Color | Sold |

Now the simplest model (all three variables mutually independent) is:

\log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C

Possible interactions:

Shape × Color
Shape × Sale
Color × Sale
Shape × Color × Sale
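As a rough sketch of the no-interaction model for three variables: under mutual independence, each expected count is the product of the three one-way totals divided by the squared grand total. The counts below are hypothetical, invented only for this illustration; they do not come from the text.

```python
from collections import defaultdict

# Hypothetical counts (made up for illustration): (shape, color, sale status).
counts = {
    ("Round", "DEF", "Sold"): 30, ("Round", "DEF", "Not Sold"): 10,
    ("Round", "GHI", "Sold"): 10, ("Round", "GHI", "Not Sold"): 10,
    ("Oval", "DEF", "Sold"): 5,   ("Oval", "DEF", "Not Sold"): 15,
    ("Oval", "GHI", "Sold"): 15,  ("Oval", "GHI", "Not Sold"): 5,
}
n = sum(counts.values())

# One-way totals for each of the three variables.
margins = [defaultdict(int) for _ in range(3)]
for cell, count in counts.items():
    for axis, level in enumerate(cell):
        margins[axis][level] += count

# Mutual independence: mu_ijk = n_i++ * n_+j+ * n_++k / n^2.
fitted = {
    cell: margins[0][cell[0]] * margins[1][cell[1]] * margins[2][cell[2]] / n**2
    for cell in counts
}

for cell in counts:
    print(cell, "observed", counts[cell], "expected", round(fitted[cell], 1))
```

Cells where observed and expected differ a lot are the ones that point toward one of the interactions listed above.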

Interpretation of Interaction

Suppose:

Round diamonds:

sell well in DEF colors.

Ovals:

sell well in GHI colors.

Now:

The effect of shape on sales changes by color. That is an interaction.

Hierarchical Principle

One of the most important ideas.

If you include:

Shape × Color

you must include:

Shape
Color

Do not include interaction alone.
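The rule can be written as a small helper that expands any interaction into all of its lower-order terms. The function name `hierarchical_closure` is hypothetical, chosen for this sketch.

```python
from itertools import combinations

def hierarchical_closure(terms):
    """Return the given terms plus every lower-order term they imply."""
    closed = set()
    for term in terms:
        factors = tuple(sorted(term))
        # Every non-empty subset of an interaction must also be in the model.
        for size in range(1, len(factors) + 1):
            closed.update(combinations(factors, size))
    return sorted(closed, key=lambda t: (len(t), t))

# Asking for Shape x Color pulls in Shape and Color as main effects.
print(hierarchical_closure([("Shape", "Color")]))

# A three-way interaction pulls in all three pairs and all three main effects.
print(len(hierarchical_closure([("Shape", "Color", "Sale")])))
```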

Likelihood Ratio Tests

How do we know if interaction matters?

Compare:

Model 1: independence.

Model 2: interaction.

Statistic:

G^2 = 2\sum O \log\left(\frac{O}{E}\right)

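where:

O = observed,
E = expected under the independence model.

Compare G^2 to a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom.

Large values: evidence of interaction.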
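Here is a minimal sketch of the statistic for the earlier 90/10 table, where independence predicts 50 in every cell. A 2 × 2 table has one degree of freedom, and for that case only the chi-square tail probability can be written with `math.erfc`; larger tables would need a general chi-square routine.

```python
import math

# Observed counts (Round/Oval by Sold/Not Sold) and independence expectations.
observed = [90, 10, 10, 90]
expected = [50, 50, 50, 50]

# G^2 = 2 * sum(O * log(O / E)); a cell with O = 0 would contribute 0.
g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Chi-square tail probability, valid for 1 degree of freedom only.
p_value = math.erfc(math.sqrt(g2 / 2))

print(f"G^2 = {g2:.2f}, p-value = {p_value:.3g}")
```

G^2 comes out near 147, far beyond anything a chi-square with one degree of freedom would produce by chance.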

Pearson Chi-Square

Alternative measure:

\chi^2=\sum\frac{(O-E)^2}{E}

Measures:

difference between:

observed,
expected.

Large values:
→ poor fit.
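A short sketch of the same idea with Pearson's statistic, using the earlier table where Round sold 60 of 100 and Oval sold 20 of 100. The expected counts are the independence values (row total × column total / grand total).

```python
# Cell order: Round Sold, Round Not Sold, Oval Sold, Oval Not Sold.
observed = [60, 40, 20, 80]
expected = [40, 60, 40, 60]

# Pearson chi-square: squared gaps, each scaled by its expected count.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

print("per-cell contributions:", [round(c, 2) for c in contributions])
print("chi-square:", round(chi2, 2))
```

The per-cell contributions are worth a look on their own, since they show which cells drive the poor fit.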

Connection to Logistic Regression

Interesting fact:

Logistic regression is actually connected to loglinear models.

Example:

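If one variable is treated as the response:

| Shape | Sold |

Logistic and loglinear models often become equivalent: the shape effect in the logistic model corresponds to the Shape × Sold interaction in the loglinear model.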

That’s why these chapters sit next to each other.
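A small sketch of the connection, using the 90/10 table from the numerical example. Treating Sold as the response, the log-odds of selling for each shape is what a logistic model would fit, and the gap between the two shapes equals the log odds ratio of the table, which is the quantity the loglinear interaction captures.

```python
import math

# (sold, not sold) counts per shape, from the earlier numerical example.
table = {"Round": (90, 10), "Oval": (10, 90)}

# Logistic view: log-odds of a sale within each shape.
logit = {
    shape: math.log(sold / not_sold)
    for shape, (sold, not_sold) in table.items()
}
shape_effect = logit["Round"] - logit["Oval"]

# Loglinear view: association in the table, measured by the log odds ratio.
log_odds_ratio = math.log((90 * 90) / (10 * 10))

print("logistic shape effect:", shape_effect)
print("log odds ratio:", log_odds_ratio)
```

Both views report the same number, about 4.39, just reached from different directions.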

Shape × Weight × Sold

Question:

Which combinations perform best?

Director × Customer Segment × Sale

Question:

Are some directors better in certain segments?

Deployment × Region × Conversion

Question:

Does deployment work differently by region?

Customer Type × Product Type × Churn

Question:

Are some combinations unstable?

Inventory Example

Suppose:

Category	Sold
LGRD 0.23–0.27	20
LGRD 0.28–0.32	40

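Expected, if both size bands sell equally: 30 for each.

Observed differs.

The bands may sell at different rates. (With only one variable there is no interaction yet; that needs a second variable, such as color.)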
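The comparison against 30 and 30 can be sketched as a one-way goodness-of-fit check. Equal expected sales across the two categories is the assumption being tested. With two categories there is one degree of freedom, so the tail probability can again be written with `math.erfc`.

```python
import math

# Sold counts per category, from the table above.
observed = {"LGRD 0.23–0.27": 20, "LGRD 0.28–0.32": 40}

# Assumption under test: both categories sell equally.
total = sum(observed.values())
expected = total / len(observed)

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())

# Chi-square tail probability, valid for 1 degree of freedom only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected per category = {expected}, chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value says the equal-sales assumption fits poorly. It does not say why the categories differ.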

Why Chapter 5 Matters

Before Chapter 5: you modeled outcomes.

After Chapter 5: you start modeling: relationships between categories.

That is a major shift.

The Deep Lesson

Chapter 5 teaches: counts are not enough.

What matters is: how categories combine.

And interactions often explain reality better than averages.

Final Thought

Loglinear models are easy to overlook.

But they quietly power:

market basket analysis,
contingency analysis,
segmentation,
categorical analytics,
association discovery,
business intelligence.

And once you learn them,

you stop asking:

“How many?”

and start asking:

“What combinations matter?”

nerd-ish
