Chapter 7 — Conditional Likelihood: How Statistics Removes What You Don’t Care About

By Chapter 7, statistics takes a fascinating turn. Until now we’ve mostly asked:

How do predictors affect outcomes?

But Chapter 7 asks a different question:

What if there are factors affecting outcomes that we do NOT want to estimate?

These unwanted factors are called: nuisance parameters.

And Chapter 7 introduces one of the cleverest ideas in statistics: Conditional Likelihood.

This chapter feels difficult initially.

But once the intuition clicks, it becomes one of the most elegant ideas in modeling.


The Problem: Hidden Baseline Differences

Suppose we want to study: Does exposure increase disease?

You collect data.

But the people in your data differ in:

  • genetics,
  • age,
  • income,
  • baseline health.

Those baseline differences interfere.

Similarly in business:

Suppose you want to know:

Does deployment increase sales?

But retailers already differ:

  • location,
  • management,
  • reputation,
  • market size.

How do we isolate the effect we care about?


The Main Idea of Conditional Likelihood

Conditional likelihood says: Instead of modeling everything, compare subjects under similar conditions.

Remove what you don’t care about.

Estimate only what matters.


Matched Pair Example

Suppose:

Pair | Case | Control
1 | Exposed | Not exposed
2 | Not exposed | Exposed

Case:

  • outcome happened.

Control:

  • outcome did not happen.

What Does “Case Exposed, Control Not Exposed” Mean?

Suppose:

Pair 1:

Person | Disease | Exposure
A | Yes | Yes
B | No | No

Interpretation:

The person with disease was exposed. That pair supports: exposure increases disease.


Another Pair

Person | Disease | Exposure
A | Yes | No
B | No | Yes

Now the evidence points in the opposite direction.


Why Matching?

Matching removes baseline risk.

Retailer | Deployment | Sale
A | High | Sold
B | Low | Not Sold

But what if:

  • retailer A is naturally stronger?

Matching attempts to compare:

  • similar retailers.

The Surprising Result

After conditioning: baseline disappears. Only relative information remains.

This is the magic.


Conditional Logistic Regression

Ordinary logistic:

$$\log\left(\frac{p}{1-p}\right)=\beta_0+\beta X$$

Conditional logistic:

removes the baseline term $\beta_0$.

You estimate only $\beta$.

Why Does Baseline Cancel?

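This is a natural question at this point.

Suppose each pair has its own baseline. The pair-specific model is:

$$\log\left(\frac{p_{ij}}{1-p_{ij}}\right) = \alpha_i + \beta X_{ij}$$

where:

  • $\alpha_i$ = baseline of pair $i$,
  • $p_{ij}$, $X_{ij}$ = outcome probability and exposure of member $j$ in pair $i$.

Conditioning on each pair containing exactly one case mathematically removes $\alpha_i$.

Now only $\beta$ remains.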
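To make the cancellation concrete, here is a short derivation for one pair $i$ with two members. It assumes the two outcomes are independent given $\alpha_i$ and that exactly one member is the case; the outcome labels $Y_{i1}, Y_{i2}$ are notation introduced here for illustration.

```latex
\begin{aligned}
P(Y_{i1}=1,\,Y_{i2}=0 \mid Y_{i1}+Y_{i2}=1)
&= \frac{p_{i1}\,(1-p_{i2})}{p_{i1}\,(1-p_{i2}) + (1-p_{i1})\,p_{i2}} \\
&= \frac{e^{\alpha_i+\beta X_{i1}}}{e^{\alpha_i+\beta X_{i1}} + e^{\alpha_i+\beta X_{i2}}} \\
&= \frac{e^{\beta X_{i1}}}{e^{\beta X_{i1}} + e^{\beta X_{i2}}}
\end{aligned}
```

The second line follows because both terms share the denominator $(1+e^{\alpha_i+\beta X_{i1}})(1+e^{\alpha_i+\beta X_{i2}})$. The third line divides top and bottom by $e^{\alpha_i}$, which is the step where the pair baseline drops out.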


The Key Insight

You stop asking:

“Who has higher baseline risk?”

Instead ask:

“Within similar pairs, what changed?”


Concordant vs Discordant Pairs

This is the most important concept.


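Concordant:

Case | Control
Exposed | Exposed

Both members have the same exposure, so the pair says nothing about the exposure effect.

Discordant:

Case | Control
Exposed | Not exposed

Very informative.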


Why?

Only discordant pairs tell us:

  • whether the exposed or the unexposed member became the case.

Numerical Example

Suppose:

Pair | Case Exposed | Control Exposed
1 | Yes | No
2 | Yes | No
3 | No | Yes

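Estimate:

Odds ratio, as the ratio of the two kinds of discordant pairs (only the case exposed, over only the control exposed):

$$OR=\frac{2}{1}=2$$

Interpretation:

Exposure roughly doubles the odds of the outcome, although three pairs are far too few for a precise estimate.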
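The estimate above can be reproduced by counting discordant pairs. This is a minimal sketch in plain Python; the function name `matched_pair_odds_ratio` and the tuple layout `(case_exposed, control_exposed)` are choices made here for illustration, not part of any library.

```python
# Each pair is (case_exposed, control_exposed); these are the three pairs
# from the numerical example.
pairs = [(True, False), (True, False), (False, True)]

def matched_pair_odds_ratio(pairs):
    """Estimate the odds ratio for 1:1 matched pairs from discordant pairs only."""
    # Pairs where only the case was exposed.
    case_only = sum(1 for case, control in pairs if case and not control)
    # Pairs where only the control was exposed.
    control_only = sum(1 for case, control in pairs if control and not case)
    # Concordant pairs are counted but never enter the estimate.
    concordant = len(pairs) - case_only - control_only
    if control_only == 0:
        raise ValueError("no pair has only the control exposed; estimate undefined")
    return case_only / control_only, case_only, control_only, concordant

odds_ratio, case_only, control_only, concordant = matched_pair_odds_ratio(pairs)
print(odds_ratio, case_only, control_only, concordant)
```

Adding concordant pairs to `pairs` leaves the estimate unchanged, which is the sense in which only discordant pairs are informative.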


Conditional Likelihood Formula

Suppose:

Pair:

Exposure values:

$x_1$ (case), $x_2$ (control)

Conditional probability that the member with exposure $x_1$ is the case, given that exactly one member of the pair is a case:

$$P(\text{case selected}) = \frac{e^{\beta x_1}}{e^{\beta x_1}+e^{\beta x_2}}$$

Notice: baseline disappeared.

Only exposure remains.
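A small numerical check can make this formula less abstract. The sketch below, under the assumption of a 1:1 matched design with a pair-specific logistic model, first confirms that the conditional probability is the same for any baseline, then maximizes the conditional log-likelihood for the three pairs from the numerical example. All function names are hypothetical helpers written for this illustration, and the ternary search is just one simple way to maximize a concave function.

```python
import math

def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def conditional_prob_from_full_model(alpha, beta, x1, x2):
    """P(member 1 is the case | exactly one case), from the full model with baseline alpha."""
    p1 = expit(alpha + beta * x1)
    p2 = expit(alpha + beta * x2)
    only_first = p1 * (1.0 - p2)
    only_second = (1.0 - p1) * p2
    return only_first / (only_first + only_second)

def conditional_prob(beta, x1, x2):
    """Closed-form conditional probability; no baseline term appears."""
    return math.exp(beta * x1) / (math.exp(beta * x1) + math.exp(beta * x2))

def conditional_loglik(beta, pairs):
    """Sum of log conditional probabilities; each pair is (x_case, x_control)."""
    return sum(math.log(conditional_prob(beta, xc, xk)) for xc, xk in pairs)

def maximize(f, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# The baseline alpha has no effect on the conditional probability.
for alpha in (-3.0, 0.0, 5.0):
    full = conditional_prob_from_full_model(alpha, 0.7, 1, 0)
    assert abs(full - conditional_prob(0.7, 1, 0)) < 1e-9

# Three pairs: case exposed twice, control exposed once (1 = exposed).
pairs = [(1, 0), (1, 0), (0, 1)]
beta_hat = maximize(lambda b: conditional_loglik(b, pairs))
print(math.exp(beta_hat))
```

For these three pairs `math.exp(beta_hat)` should come out very close to 2, matching the ratio of discordant pairs.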


Why This Is Beautiful

Because we never estimated:

  • intercept,
  • pair risk,
  • hidden baseline.

Statistics removed them.


Marginal vs Conditional Likelihood

This distinction confuses a lot of people.


Marginal

Average across everyone.

Integrate nuisance away.


Conditional

Condition on fixed quantities.

Cancel nuisance.


Example:

Retailers:

Marginal:
overall average effect.

Conditional:
within-retailer effect.
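The two effects are generally not the same number, even when every retailer shares the same within-retailer effect. The sketch below uses two hypothetical retailers with very different baselines and an assumed within-retailer log odds ratio of 1; it also assumes both retailers contribute equally at each deployment level. None of these values come from real data.

```python
import math

def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

beta = 1.0                 # assumed within-retailer log odds ratio
baselines = [-2.0, 2.0]    # two hypothetical retailers: one weak, one strong

# Conditional (within-retailer) odds ratio: the same for every retailer.
conditional_ors = [odds(expit(a + beta)) / odds(expit(a)) for a in baselines]

# Marginal odds ratio: average the sale probabilities across retailers first,
# then compare high deployment against low deployment.
p_high = sum(expit(a + beta) for a in baselines) / len(baselines)
p_low = sum(expit(a) for a in baselines) / len(baselines)
marginal_or = odds(p_high) / odds(p_low)

print(conditional_ors, marginal_or)
```

In this setup each within-retailer odds ratio equals e (about 2.72), while the marginal odds ratio is noticeably smaller. Averaging over very different baselines pulls the odds ratio toward 1, so the two analyses answer different questions.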


Hypergeometric Connection

This often appears suddenly.

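Why? Because after conditioning: the totals of the table (how many exposed, how many cases) become fixed.

With fixed totals, assigning cases to exposure groups becomes: sampling without replacement.

That creates: hypergeometric distributions (the ordinary one when exposure has no effect).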
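As a rough illustration, the sketch below computes the hypergeometric probabilities for a 2x2 table whose totals are fixed, assuming exposure has no effect. The totals (5 exposed, 5 unexposed, 4 cases) are made up for the example, and `hypergeom_pmf` is a helper defined here using only `math.comb` from the standard library.

```python
from math import comb

def hypergeom_pmf(k, exposed_total, unexposed_total, case_total):
    """P(k of the cases are exposed), given fixed totals and no exposure effect."""
    ways_exposed = comb(exposed_total, k)                    # choose exposed cases
    ways_unexposed = comb(unexposed_total, case_total - k)   # choose unexposed cases
    ways_all = comb(exposed_total + unexposed_total, case_total)
    return ways_exposed * ways_unexposed / ways_all

# Hypothetical totals: 5 exposed, 5 unexposed, 4 cases overall.
exposed_total, unexposed_total, case_total = 5, 5, 4
pmf = {k: hypergeom_pmf(k, exposed_total, unexposed_total, case_total)
       for k in range(case_total + 1)}

for k, prob in pmf.items():
    print(k, round(prob, 4))
```

No baseline risk appears anywhere in the calculation: once the totals are fixed, the only randomness left is which subjects end up as cases, which is drawing without replacement.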


Real Business Example

Question:

Does deployment improve sales?

Retailers differ massively.

Match retailers by:

  • size,
  • region,
  • customer count.

Then compare: higher deployment vs lower deployment. Conditional analysis removes retailer baseline.

Very powerful.


Why This Chapter Feels Hard

Because for the first time: statistics stops estimating everything.

Instead it says: some information is unnecessary.

That feels strange initially. But it is powerful.


Inventory Example

Suppose:

You want to know: Does replenishment improve sales?

Different categories behave differently. Match categories.

Condition away category baseline.

Estimate only replenishment effect.


Chapter 7’s Big Lesson

This chapter teaches: not every parameter deserves estimation.

Some factors should be removed.

And conditioning gives cleaner inference.


Final Thought

Before Chapter 7, you ask:

“How do I estimate everything?”

After Chapter 7, you ask:

“What can I safely eliminate?”

That shift changes how advanced statistical models work.

Conditional likelihood becomes the bridge into:

  • mixed models,
  • Bayesian methods,
  • Cox models,
  • survival analysis,
  • modern inference.
