Chapter 4 — Logistic Regression: Modeling Decisions, Probabilities, and Yes/No Outcomes

In the previous chapter, we learned something powerful: not all outcomes behave like continuous normal data.

Some outcomes are counts.
Some are positively skewed values.

But there is another extremely common type of outcome: “decisions.”

Questions like:

  • Will this customer buy?
  • Will this retailer churn?
  • Will this diamond sell?
  • Will this campaign succeed?
  • Should we replenish this SKU?

These are not count problems.

These are yes/no problems.

And ordinary linear regression breaks down here.

Chapter 4 introduces one of the most important models in statistics: Logistic Regression.


Why Ordinary Regression Breaks for Binary Data

Suppose we try to predict:

$$Y = \begin{cases} 1 & \text{customer buys} \\ 0 & \text{customer does not buy} \end{cases}$$

Ordinary regression would do:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

But immediately problems appear.

Predictions might become:

  • −0.4 probability
  • 1.7 probability
  • 2.3 probability

Impossible. Probabilities must stay within:

$$0 \le p \le 1$$

So ordinary regression breaks.
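To make the failure concrete, here is a minimal sketch that fits a straight line to a tiny 0/1 dataset and then predicts at a few values of X. The data and the helper name `fit_line` are made up for illustration; only the behavior matters.

```python
# Hypothetical toy data: x is some predictor, y is 1 if the customer bought.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = fit_line(xs, ys)

# Predict inside and outside the observed range of x.
for x in (-2, 4, 12):
    print(x, round(b0 + b1 * x, 2))
```

On this toy data the fitted line gives a negative "probability" at x = -2 and a value above 1 at x = 12, which is exactly the problem described above.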


Bernoulli Distribution — The Foundation

Logistic regression assumes:

$$Y \sim \text{Bernoulli}(p)$$

Meaning: Y = 1 (success) with probability p, and Y = 0 (failure) with probability 1 - p.

Examples:

  • customer buys,
  • retailer leaves,
  • diamond sells.

Mean and Variance

Mean:

$$E(Y) = p$$

Variance:

$$\text{Var}(Y) = p(1-p)$$

Notice:

The variance depends on the mean. That already violates the constant-variance assumption of ordinary regression.


Why Variance Peaks at 50%

Look carefully:

$$p(1-p)$$

If p = 0, the variance is 0.
If p = 1, the variance is 0.

The maximum occurs at p = 0.5, where the variance is 0.25.

Why?

Because uncertainty is highest there.

At 50%, both outcomes are equally likely.
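A quick numerical check of where p(1 - p) peaks, as a small sketch. The grid of 101 probabilities and the function name `bernoulli_variance` are arbitrary choices for illustration.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli(p) outcome."""
    return p * (1 - p)


# Evaluate the variance on a grid of probabilities from 0.00 to 1.00.
grid = [i / 100 for i in range(101)]
p_at_max = max(grid, key=bernoulli_variance)

print(p_at_max, bernoulli_variance(p_at_max))
print(bernoulli_variance(0.0), bernoulli_variance(1.0))
```

The variance is zero at both ends and largest in the middle of the grid.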

The Problem With Modeling Probability Directly

Suppose:

$$p = \beta_0 + \beta_1 X$$

Push X far enough in either direction and, eventually:

  • probabilities exceed 1,
  • probabilities become negative.

We need a transformation.


Enter Odds

Odds answer: how much more likely success is than failure.

Formula:

$$\text{Odds} = \frac{p}{1-p}$$

Example:

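Suppose: $p = 0.75$

Then:

$$\text{Odds} = \frac{0.75}{0.25} = 3$$

Meaning: success is three times as likely as failure. This mirrors betting language: odds of “3 to 1” in favor of success. (Bookmakers usually quote odds against an event, so a bookmaker's “3 to 1” would correspond to p = 0.25.)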
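The conversion between probability and odds is short enough to sketch directly. The function names `odds` and `prob_from_odds` are illustrative, not from any library.

```python
def odds(p):
    """Convert a probability into odds in favor of success."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 (odds are infinite at p = 1)")
    return p / (1 - p)


def prob_from_odds(o):
    """Convert odds in favor of success back into a probability."""
    if o < 0:
        raise ValueError("odds cannot be negative")
    return o / (1 + o)


print(odds(0.75))         # the worked example above
print(prob_from_odds(3))  # and back again
```

Note that p = 0.5 corresponds to odds of 1: success and failure are equally likely.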


But Odds Still Have a Problem

Odds range: $0 \rightarrow \infty$

Regression prefers: $-\infty \rightarrow \infty$

So we take logs. The log of a number between 0 and ∞ can be any real number.


The Logit Function

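The canonical link for the Bernoulli distribution is the logit, the log of the odds:

$$\log\left(\frac{p}{1-p}\right) = X\beta$$

A model of this form is called logistic regression.

Left side:

  • transformed probability (the log-odds).

Right side:

  • an ordinary linear regression.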

Recovering Probability

After computing the linear predictor $X\beta$, we convert back:

$$p = \frac{e^{X\beta}}{1+e^{X\beta}}$$

This is the logistic function.

Now probabilities always stay strictly inside the unit interval: $0 < p < 1$

Beautiful.
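Here is a small sketch of the logit and its inverse, showing that the two undo each other. The function names are illustrative; the two-branch form of `logistic` is a common trick to avoid overflow in `math.exp` for large negative inputs, not something the math requires.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def logistic(z):
    """Inverse of the logit: maps any real number to a probability."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return ez / (1 + ez)


# Round trip: probability -> log-odds -> probability.
print(logit(0.75), logistic(logit(0.75)))

# Very negative and very positive linear predictors still give valid probabilities.
for z in (-30, 0, 30):
    print(z, logistic(z))
```

One practical caveat: in floating-point arithmetic, an extremely large linear predictor will round to exactly 1.0, even though the mathematical value is always strictly below 1.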


Interpretation of Coefficients

This is where students usually get confused.

Suppose: $\beta = 0.7$

Exponentiate:

$$e^{0.7} \approx 2.01$$

Interpretation: a one-unit increase in X roughly doubles the odds, holding the other predictors fixed. Not the probability. The odds.

Very important distinction.
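The distinction is easiest to see with numbers. The sketch below applies the same coefficient, β = 0.7, to three different starting probabilities (chosen arbitrarily for illustration) and compares what happens to the odds with what happens to the probability.

```python
import math

beta = 0.7  # coefficient from the example above


def logistic(z):
    return 1 / (1 + math.exp(-z))


results = []
for p_before in (0.10, 0.50, 0.90):
    log_odds_before = math.log(p_before / (1 - p_before))
    # A one-unit increase in X adds beta on the log-odds scale.
    p_after = logistic(log_odds_before + beta)
    odds_ratio = (p_after / (1 - p_after)) / (p_before / (1 - p_before))
    results.append((p_before, p_after, odds_ratio))
    print(p_before, round(p_after, 2), round(odds_ratio, 2))
```

The odds ratio is about 2.01 in every row, but the probability moves by different amounts: roughly 0.10 to 0.18, 0.50 to 0.67, and 0.90 to 0.95. The same coefficient never means "probability doubles."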


Business Example — Diamond Wholesale provider

Suppose:

Outcome:

$$Y = \begin{cases} 1 & \text{diamond sold} \\ 0 & \text{not sold} \end{cases}$$

Predictors:

  • Days Out
  • Discount
  • Customer Count
  • Shape

Model (shown here with two of the predictors):

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{DaysOut} + \beta_2 \cdot \text{Discount}$$

Suppose the Discount coefficient is:

$$\beta_2 = 0.5$$

Then:

$$e^{0.5} \approx 1.65$$

Interpretation: each one-unit increase in Discount multiplies the odds of a sale by about 1.65, a 65% increase, holding DaysOut fixed.
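As a rough sketch of how such a model would be used, the code below plugs numbers into the two-predictor equation. Only the Discount coefficient of 0.5 comes from the example; the intercept, the DaysOut coefficient, and the units of both predictors are assumptions invented purely for illustration.

```python
import math

# Hypothetical coefficients. Only BETA_DISCOUNT matches the example above.
BETA_0 = -1.0          # intercept (assumed)
BETA_DAYS_OUT = -0.02  # change in log-odds per day out (assumed)
BETA_DISCOUNT = 0.5    # change in log-odds per unit of Discount


def sale_log_odds(days_out, discount):
    return BETA_0 + BETA_DAYS_OUT * days_out + BETA_DISCOUNT * discount


def sale_probability(days_out, discount):
    return 1 / (1 + math.exp(-sale_log_odds(days_out, discount)))


def sale_odds(days_out, discount):
    p = sale_probability(days_out, discount)
    return p / (1 - p)


# Compare one extra unit of Discount at two different values of DaysOut.
ratios = []
for days_out in (10, 60):
    ratio = sale_odds(days_out, 1) / sale_odds(days_out, 0)
    ratios.append(ratio)
    print(days_out, round(sale_probability(days_out, 0), 3),
          round(sale_probability(days_out, 1), 3), round(ratio, 2))
```

Whatever DaysOut is, one more unit of Discount multiplies the odds by the same factor of about 1.65. The predicted probabilities themselves differ between the two rows.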


Classification vs Probability

People often think logistic regression predicts:

Yes / No.

Not exactly.

It predicts $P(Y=1)$. Then we choose a cutoff.

Example:

With the commonly used cutoff of 0.5, a predicted probability of 0.82 becomes Yes and a predicted probability of 0.31 becomes No. The cutoff is a choice made after modeling; 0.5 is only a convention.
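The decision step is separate from the model, and a minimal sketch makes that visible. The function name `classify` and the alternative cutoff of 0.3 are illustrative choices.

```python
def classify(probability, cutoff=0.5):
    """Turn a predicted probability into a Yes/No decision."""
    return "Yes" if probability >= cutoff else "No"


print(classify(0.82))              # above the default cutoff
print(classify(0.31))              # below the default cutoff
print(classify(0.31, cutoff=0.3))  # same probability, lower cutoff
```

The same predicted probability of 0.31 flips from No to Yes when the cutoff drops to 0.3, which is why the probability is usually the more informative output.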


Overdispersion

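The Bernoulli model implies:

$$\text{Var}(Y) = p(1-p)$$

For grouped data, with n trials per group, the count of successes should then have variance $np(1-p)$. But reality may show more variability, meaning: observed variance > expected. (A single 0/1 outcome cannot show this on its own, since its variance is always p(1 - p).)

This suggests: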

  • hidden groups,
  • clustering,
  • omitted variables.
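Hidden groups are the easiest of these to illustrate. The sketch below assumes a made-up scenario: each store sees 20 customers, and each store secretly belongs to one of two equally common groups, with purchase probability 0.2 or 0.8. It compares the variance a single binomial would predict with the actual variance of the store-level counts, using the law of total variance rather than simulation.

```python
def binomial_variance(n, p):
    """Variance of the number of successes in n independent Bernoulli(p) trials."""
    return n * p * (1 - p)


def mixture_count_variance(n, group_probs):
    """Variance of the success count when each unit belongs, with equal
    chance, to one of several hidden groups with its own probability."""
    k = len(group_probs)
    mean_p = sum(group_probs) / k
    within = sum(binomial_variance(n, p) for p in group_probs) / k
    between = sum((n * p - n * mean_p) ** 2 for p in group_probs) / k
    return within + between


n = 20                    # customers per store (assumed)
group_probs = [0.2, 0.8]  # hidden group probabilities (assumed)

mean_p = sum(group_probs) / len(group_probs)
expected = binomial_variance(n, mean_p)           # what one binomial predicts
actual = mixture_count_variance(n, group_probs)   # what the mixture produces

print(expected, actual, actual / expected)
```

The overall success rate is 0.5 in both views, but the counts vary far more than a single binomial allows, because most of the spread comes from differences between the hidden groups.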

Retrospective Sampling

Sometimes data is collected by outcome.

Example:

Start with:

  • 100 buyers,
  • 100 non-buyers.

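Then study their predictors. This is case-control sampling.

Surprisingly, logistic regression still works, because odds ratios remain identifiable: the slope coefficients are unaffected, and only the intercept shifts with the sampling fractions.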
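A short calculation shows why. The sketch below assumes a hypothetical population model with one binary predictor, then keeps every buyer but only 5% of non-buyers, and recomputes the log-odds among the sampled records. All coefficients and sampling fractions are invented for illustration.

```python
import math


def logistic(z):
    return 1 / (1 + math.exp(-z))


def sampled_log_odds(p, keep_cases, keep_controls):
    """Log-odds of Y=1 among sampled records, when cases (Y=1) and controls
    (Y=0) are kept with different probabilities."""
    sampled_p = keep_cases * p / (keep_cases * p + keep_controls * (1 - p))
    return math.log(sampled_p / (1 - sampled_p))


# Hypothetical population model: log-odds = b0 + b1 * x, with x in {0, 1}.
b0, b1 = -3.0, 0.7
keep_cases, keep_controls = 1.0, 0.05  # assumed sampling fractions

population = {x: b0 + b1 * x for x in (0, 1)}
sample = {
    x: sampled_log_odds(logistic(population[x]), keep_cases, keep_controls)
    for x in (0, 1)
}

slope_in_sample = sample[1] - sample[0]
intercept_shift = sample[0] - population[0]

print(slope_in_sample, intercept_shift, math.log(keep_cases / keep_controls))
```

The slope in the sample equals the population slope of 0.7, while the intercept moves by the log of the ratio of sampling fractions. That is why odds ratios from such data can be trusted but predicted probabilities cannot, at least not without correcting the intercept.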


Chapter 4’s Big Lesson

Chapter 4 teaches: probabilities cannot be treated like ordinary numbers.

Binary outcomes have:

  • boundaries,
  • variance behavior,
  • nonlinear structure.

Logistic regression handles all three.


Final Thought

Logistic regression looks simple:

0 or 1

But underneath it lies:

  • Bernoulli distributions,
  • odds,
  • log transformations,
  • nonlinear probability geometry,
  • likelihood theory.

And once you understand logistic regression, you stop asking: “Will it happen?” and start asking: “How likely is it?”
