Think Deeply. Learn Clearly.


By Chapter 5, something interesting happens.

Until now, we modeled:

  • continuous outcomes,
  • count outcomes,
  • probabilities.

But what if there is no obvious “response variable”? What if we simply want to understand: Are variables related?

Examples:

  • Does shape affect sales?
  • Does color affect purchase?
  • Does clarity influence conversion?
  • Does region affect product preference?

Now we move into: Loglinear Models

One of the most underrated parts of GLMs.


The Main Question of Chapter 5

Suppose we observe:

Round sold 120,
Oval sold 60.

Question: Is shape related to sales? Or: Are they independent?

That’s what loglinear models solve.


What Type of Data Are We Modeling?

Counts.

But not ordinary counts.

We model: counts inside categories.

Example:

| Shape | Sold | Not Sold |
|-------|------|----------|
| Round | 120  | 80       |
| Oval  | 60   | 140      |

Each cell contains: a count.


Why Ordinary Regression Does Not Work

Suppose we predict:

$$\text{Sales}=\beta_0+\beta_1\,\text{Shape}$$

Problem:

These are:

  • frequencies,
  • discrete counts.

Also:

  • expected values must remain positive.

So ordinary regression becomes awkward.

Instead:

Loglinear models assume:

$$Y\sim\text{Poisson}$$

for cell counts.


The Core Loglinear Model

We model:

$$\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B$$

where:

  • $\mu_{ij}$ = expected count in cell $(i, j)$,
  • $\lambda$ = overall effect (the intercept, on the log scale),
  • $\lambda_i^A$ = effect of level $i$ of factor A,
  • $\lambda_j^B$ = effect of level $j$ of factor B.

At first this looks scary.

But the idea is simple.

We model: expected cell counts.


Example — Diamond Shape × Sales

Suppose:

| Shape   | Count |
|---------|-------|
| Round   | 100   |
| Oval    | 60    |
| Cushion | 40    |

Then:

$$\log(\mu)=\lambda+\lambda_{\text{Shape}}$$

The model estimates: expected counts for each category.
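
To make "expected counts for each category" concrete, here is a minimal sketch of the one-way model. It assumes reference-cell coding with Round as the baseline, which is a choice made for this illustration and not something stated above. With one parameter per category, the fitted counts simply reproduce the observed ones.

```python
import math

# Counts from the table above.
counts = {"Round": 100, "Oval": 60, "Cushion": 40}

# Assumption: reference-cell coding with "Round" as the baseline level.
baseline = "Round"
lam = math.log(counts[baseline])                 # intercept: log count of the baseline
lam_shape = {s: math.log(c / counts[baseline])   # shape effect relative to the baseline
             for s, c in counts.items()}

# Expected counts recovered from the log scale: mu = exp(lambda + lambda_shape).
fitted = {s: math.exp(lam + lam_shape[s]) for s in counts}

for shape in counts:
    print(f"{shape:8s} lambda_shape={lam_shape[shape]:+.3f}  fitted={fitted[shape]:.1f}")
```

A negative shape effect means that category has fewer expected sales than the baseline. Other coding schemes (for example sum-to-zero) give different parameter values but the same fitted counts.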


Independence — The Most Important Idea

Suppose we want to know:

Does shape affect whether something sells?

If independent: Expected counts become:

$$\text{Expected}=\frac{\text{Row total}\times\text{Column total}}{\text{Grand total}}$$

| Shape | Sold | Not Sold | Total |
|-------|------|----------|-------|
| Round | 60   | 40       | 100   |
| Oval  | 20   | 80       | 100   |
| Total | 80   | 120      | 200   |

Expected Round Sold:

$$100\times 80/200=40$$

Observed: 60

Expected 40.

    Difference suggests: dependence.
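
The same arithmetic can be applied to every cell at once. The sketch below is one way to do it in plain Python. The function name `expected_under_independence` is a hypothetical helper introduced for this example.

```python
def expected_under_independence(table):
    """Return expected counts (row total * column total / grand total) for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: Round, Oval. Columns: Sold, Not Sold (counts from the table above).
observed = [[60, 40],
            [20, 80]]
expected = expected_under_independence(observed)
print(expected)  # [[40.0, 60.0], [40.0, 60.0]]
```

Both shapes get the same expected split because their row totals are equal. The observed splits (60/40 and 20/80) pull away from it in opposite directions.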


    Independence Model

    Model:

    $$\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B$$

    Interpretation:

    • Shape influences counts.
    • Sales influences counts.
    • But no interaction.
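
A short derivation shows why independence leads to an additive model on the log scale. The symbols $n$ (grand total), $\pi_{i+}$ (row probability), and $\pi_{+j}$ (column probability) are notation introduced here for illustration. Matching each term to a $\lambda$ holds only up to the identifiability constraint chosen.

```latex
\mu_{ij} = n\,\pi_{i+}\,\pi_{+j}
\quad\Longrightarrow\quad
\log(\mu_{ij}) = \underbrace{\log n}_{\lambda}
              + \underbrace{\log \pi_{i+}}_{\lambda_i^A}
              + \underbrace{\log \pi_{+j}}_{\lambda_j^B}
```

The product of margins becomes a sum of main effects, which is the same rule as "row total times column total over grand total" written on the log scale.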

    Interaction — The Heart of Chapter 5

    Now suppose shape changes sales.

    Add interaction:

    $$\log(\mu_{ij})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_{ij}^{AB}$$

    This term means: the effect of one variable depends on another.


    Numerical Example

    Observed:

    | Shape | Sold | Not Sold |
    |-------|------|----------|
    | Round | 90   | 10       |
    | Oval  | 10   | 90       |

    Expected under independence:

    | Shape | Sold | Not Sold |
    |-------|------|----------|
    | Round | 50   | 50       |
    | Oval  | 50   | 50       |

    Huge mismatch.

    Interaction becomes necessary.
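
One way to put a number on the mismatch is the odds ratio. In a 2×2 table fitted with the interaction term, and assuming reference-cell coding (an assumption of this sketch), the single interaction parameter equals the log odds ratio.

```python
import math

# Rows: Round, Oval. Columns: Sold, Not Sold (observed counts from the example above).
observed = [[90, 10],
            [10, 90]]

(a, b), (c, d) = observed
odds_ratio = (a * d) / (b * c)       # odds of selling for Round relative to Oval
log_odds_ratio = math.log(odds_ratio)

print(odds_ratio)                    # 81.0
print(round(log_odds_ratio, 3))      # 4.394
```

For the independence table of 50s, the odds ratio is 1 and its log is 0, which is what "no interaction" means on this scale.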


    Three-Way Tables

    Chapter 5 becomes powerful when adding more variables.

    Example:

    | Shape | Color | Sold |

    Now model:

    $$\log(\mu_{ijk})=\lambda+\lambda_i^A+\lambda_j^B+\lambda_k^C$$

    Possible interactions:

    • Shape × Color
    • Shape × Sale
    • Color × Sale
    • Shape × Color × Sale
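
For the main-effects-only model above, the expected count in each cell has a closed form: the product of the three one-way totals divided by the squared grand total. The sketch below applies it to a small table. The counts are hypothetical, invented purely for illustration, and `margin` is a helper name introduced here.

```python
from itertools import product

# Hypothetical counts, invented for illustration: key = (shape, color, sold).
counts = {
    ("Round", "DEF", "Sold"): 30, ("Round", "DEF", "Not Sold"): 10,
    ("Round", "GHI", "Sold"): 15, ("Round", "GHI", "Not Sold"): 25,
    ("Oval",  "DEF", "Sold"): 10, ("Oval",  "DEF", "Not Sold"): 30,
    ("Oval",  "GHI", "Sold"): 25, ("Oval",  "GHI", "Not Sold"): 15,
}

def margin(counts, axis):
    """One-way marginal totals for the variable at position `axis` in the key."""
    totals = {}
    for key, n in counts.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return totals

n = sum(counts.values())
shape_tot, color_tot, sold_tot = (margin(counts, axis) for axis in range(3))

# Mutual independence: mu_ijk = (n_i++ * n_+j+ * n_++k) / n^2
expected = {
    (s, c, k): shape_tot[s] * color_tot[c] * sold_tot[k] / n**2
    for s, c, k in product(shape_tot, color_tot, sold_tot)
}

for cell, mu in expected.items():
    print(cell, counts[cell], mu)
```

In this made-up table every margin is balanced, so the model expects 20 in every cell. The observed counts range from 10 to 30, and that gap is what the interaction terms listed above would have to absorb.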

    Interpretation of Interaction

    Suppose:

    Round diamonds:

    • sell well in DEF.

    Oval:

    • sell well in GHI.

    Now:

    Shape effect changes by color. That becomes interaction.


    Hierarchical Principle

    One of the most important ideas.

    If you include:

    Shape × Color

    you must include:

    • Shape
    • Color

    Do not include interaction alone.
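
The rule can be expressed as a small helper that expands any list of interaction terms into the full hierarchical set. This is only a sketch. `hierarchical_closure` is a hypothetical name and is not taken from any library.

```python
from itertools import combinations

def hierarchical_closure(terms):
    """Add every lower-order term implied by the given interaction terms.

    Each term is a tuple of factor names, e.g. ("Shape", "Color").
    """
    closed = set()
    for term in terms:
        factors = tuple(sorted(term))
        for size in range(1, len(factors) + 1):
            closed.update(combinations(factors, size))
    # Sort by order (main effects first), then alphabetically.
    return sorted(closed, key=lambda t: (len(t), t))

print(hierarchical_closure([("Shape", "Color")]))
# [('Color',), ('Shape',), ('Color', 'Shape')]
```

Passing the three-way term `("Shape", "Color", "Sold")` returns all seven terms: three main effects, three two-way interactions, and the three-way interaction itself.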


    Likelihood Ratio Tests

    How do we know if interaction matters?

    Compare:

    Model 1: independence.

    Model 2: interaction.

    Statistic:

    $$G^2 = 2\sum O \log\left(\frac{O}{E}\right)$$

    where:

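    • O = observed count,
    • E = expected count under the independence model.

    Large values (relative to a chi-square distribution with degrees of freedom equal to the number of interaction parameters added): interaction exists.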
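
As a rough check of the formula, the sketch below computes the statistic for the 90/10/10/90 table from the numerical example, using the independence fit of 50 per cell as the expected counts. The function name `g_squared` is introduced here for illustration.

```python
import math

def g_squared(observed, expected):
    """Likelihood ratio statistic G^2 = 2 * sum(O * log(O / E)); cells with O = 0 contribute 0."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

# Flattened 2x2 tables from the numerical example: Round/Oval by Sold/Not Sold.
observed = [90, 10, 10, 90]
expected = [50, 50, 50, 50]   # fitted counts under independence

g2 = g_squared(observed, expected)
df = (2 - 1) * (2 - 1)        # (rows - 1) * (columns - 1) for a two-way table
print(round(g2, 2), df)       # 147.23 1
```

With 1 degree of freedom, the 5% critical value of the chi-square distribution is about 3.84, so a value near 147 is strong evidence against independence.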


    Pearson Chi-Square

    Alternative measure:

    $$\chi^2=\sum\frac{(O-E)^2}{E}$$

    Measures:

    difference between:

    • observed,
    • expected.

    Large values:
    → poor fit.
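
Applied to the 90/10/10/90 table from the numerical example, with 50 expected in each cell under independence, the statistic can be computed in a few lines. The function name `pearson_chi_square` is a hypothetical helper for this sketch.

```python
def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Flattened 2x2 tables: Round/Oval by Sold/Not Sold.
observed = [90, 10, 10, 90]
expected = [50, 50, 50, 50]   # fitted counts under independence

print(pearson_chi_square(observed, expected))  # 128.0
```

Each cell is off by 40, so each contributes 1600 / 50 = 32. The total of 128 is far above what a chi-square distribution with 1 degree of freedom would produce by chance.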


    Connection to Logistic Regression

    Interesting fact:

    Logistic regression is actually connected to loglinear models.

    Example:

    If one variable is treated as the outcome:

    | Shape | Sold |

    The logistic model for Sold and the loglinear model containing the Shape × Sold term then become equivalent.

    That’s why these chapters sit next to each other.
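
The link can be seen by taking the log odds of Sold within a given shape. The derivation below is a sketch under an indexing assumption introduced here: factor A is Shape, factor B is the outcome, with $j=1$ for Sold and $j=2$ for Not Sold.

```latex
\log\frac{\mu_{i1}}{\mu_{i2}}
  = \log(\mu_{i1}) - \log(\mu_{i2})
  = \underbrace{\left(\lambda_1^B - \lambda_2^B\right)}_{\text{intercept}}
  + \underbrace{\left(\lambda_{i1}^{AB} - \lambda_{i2}^{AB}\right)}_{\text{effect of shape } i}
```

The terms $\lambda$ and $\lambda_i^A$ cancel, leaving a logit model with an intercept and a shape effect. If the interaction terms are all zero, the log odds do not depend on shape, which is independence again.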


    Real Dialog Applications

    This chapter fits your work surprisingly well.

    Examples:


    Shape × Weight × Sold

    Question:

    Which combinations perform best?


    Director × Customer Segment × Sale

    Question:

    Are some directors better in certain segments?


    Deployment × Region × Conversion

    Question:

    Does deployment work differently by region?


    Customer Type × Product Type × Churn

    Question:

    Are some combinations unstable?


    Inventory Example

    Suppose:

    | Category       | Sold |
    |----------------|------|
    | LGRD 0.23–0.27 | 20   |
    | LGRD 0.28–0.32 | 40   |

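    Expected under an equal split: 30 for both.

    Observed differs.

    The size band appears to matter. Calling it an interaction needs a second variable, such as Sold vs. Not Sold.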
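
A quick way to judge whether 20 versus 40 is far from an even split is to compute both fit statistics against 30 per category. The sketch below assumes that an equal split is the right baseline, which is a modeling choice rather than a given.

```python
import math

observed = [20, 40]                      # sold counts for the two LGRD size bands
total = sum(observed)
expected = [total / len(observed)] * 2   # equal split: 30 each

pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

print(round(pearson, 2))  # 6.67
print(round(g2, 2))       # 6.8
```

With two categories there is 1 degree of freedom, and both values exceed the 5% critical value of about 3.84, so the counts are unlikely under an even split.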


      Why Chapter 5 Matters

      Before Chapter 5: you modeled outcomes.

      After Chapter 5: you start modeling: relationships between categories.

      That is a major shift.


      The Deep Lesson

      Chapter 5 teaches: counts are not enough.

      What matters is: how categories combine.

      And interactions often explain reality better than averages.


      Final Thought

      Loglinear models are easy to overlook.

      But they quietly power:

      • market basket analysis,
      • contingency analysis,
      • segmentation,
      • categorical analytics,
      • association discovery,
      • business intelligence.

      And once you learn them,

      you stop asking:

      “How many?”

      and start asking:

      “What combinations matter?”
