In the previous chapter, we learned something powerful: not all outcomes behave like continuous normal data.
Some outcomes are counts.
Some are positively skewed values.
But there is another extremely common type of outcome: “decisions.”
Questions like:
- Will this customer buy?
- Will this retailer churn?
- Will this diamond sell?
- Will this campaign succeed?
- Should we replenish this SKU?
These are not count problems.
These are: yes/no problems.
And ordinary regression completely breaks here.
Chapter 4 introduces one of the most important models in statistics: Logistic Regression.
Why Ordinary Regression Breaks for Binary Data
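Suppose we try to predict a yes/no outcome coded as a number: $Y = 1$ if the customer buys, $Y = 0$ if not.
Ordinary regression would fit a straight line: $\hat{Y} = \beta_0 + \beta_1 x$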
But immediately problems appear.
Predictions might become:
- −0.4 probability
- 1.7 probability
- 2.3 probability
Impossible. Probabilities must stay between 0 and 1: $0 \le p \le 1$
So ordinary regression breaks.
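A minimal sketch of the problem, assuming a made-up predictor `x` and a made-up 0/1 outcome `y` (neither comes from real data). It fits an ordinary least-squares line by hand and evaluates it at a few points; the helper name `linear_prediction` is only for illustration.

```python
# Hypothetical toy data: x is an invented predictor, y is a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form least-squares slope and intercept for a single predictor.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean


def linear_prediction(x_new):
    """Predicted 'probability' from the straight-line fit."""
    return intercept + slope * x_new


# Inside the observed range the line looks harmless; outside it, the
# predictions fall below 0 or rise above 1.
for x_new in (-2, 4, 12):
    print(x_new, round(linear_prediction(x_new), 3))
```

Nothing in the straight line knows that the outcome is a probability, so nothing stops it from leaving the 0 to 1 range.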
Bernoulli Distribution — The Foundation
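Logistic regression assumes each outcome follows a Bernoulli distribution: $Y \sim \text{Bernoulli}(p)$
Meaning: $Y = 1$ is success (with probability $p$) and $Y = 0$ is failure (with probability $1 - p$).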
Examples:
- customer buys,
- retailer leaves,
- diamond sells.
Mean and Variance
Mean: $E[Y] = p$
Variance: $\text{Var}(Y) = p(1 - p)$
Notice:
Variance depends on the mean. That already violates the constant-variance assumption of ordinary regression.
Why Variance Peaks at 50%
Look carefully:
If $p = 0$, the variance $p(1 - p)$ is 0.
If $p = 1$, the variance is 0.
The maximum occurs at $p = 0.5$, where the variance is $0.5 \times 0.5 = 0.25$.
Why?
Because uncertainty is highest.
At 50%:
- both outcomes are equally possible.
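To check this numerically, here is a small sketch that evaluates the Bernoulli variance $p(1 - p)$ on a grid of probabilities and finds where it is largest. The function name `bernoulli_variance` and the grid spacing are assumptions made for the example.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli outcome with success probability p."""
    return p * (1 - p)


# Probabilities 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]

# The grid point where the variance is largest.
p_at_max = max(grid, key=bernoulli_variance)

print(p_at_max, bernoulli_variance(p_at_max))
print(bernoulli_variance(0.0), bernoulli_variance(1.0))
```

The variance is zero at both ends, where the outcome is certain, and largest in the middle, where it is a coin flip.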
The Problem With Modeling Probability Directly
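Suppose we model the probability directly as a straight line: $p = \beta_0 + \beta_1 x$
As $x$ moves far enough in either direction, eventually: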
- probabilities exceed 1,
- probabilities become negative.
We need a transformation.
Enter Odds
Odds answer: how much more likely success is than failure.
Formula: $\text{odds} = \frac{p}{1 - p}$
Example:
Suppose $p = 0.75$.
Then: $\text{odds} = \frac{0.75}{0.25} = 3$
Meaning: success is three times as likely as failure. This explains sports betting language: “3 to 1 odds.”
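The conversion from probability to odds is a one-line calculation. The sketch below uses a hypothetical helper named `odds` and reproduces the $p = 0.75$ case.

```python
def odds(p):
    """Odds of success for a probability p, defined for 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


print(odds(0.75))  # success is three times as likely as failure
print(odds(0.5))   # even odds
```

Odds of 1 mean success and failure are equally likely; odds above 1 favor success, and odds below 1 favor failure.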
But Odds Still Have a Problem
Odds range: $[0, \infty)$
Regression prefers: $(-\infty, \infty)$
So we take logs: the log of the odds can be any real number.
The Logit Function
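The canonical link is the log of the odds, called the logit: $\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
A model of this form is called logistic regression.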
Left side:
- transformed probability.
Right side:
- regression.
Recovering Probability
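After solving the logit equation $\log\left(\frac{p}{1 - p}\right) = \eta$ for $p$, where $\eta$ is the regression side,
we convert back: $p = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}$
This is the logistic function.
Now probabilities always stay strictly between 0 and 1: $0 < p < 1$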
Beautiful.
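The logit and the logistic function are inverses of each other, which a short sketch can confirm. The function names `logit` and `logistic` and the range of test values are choices made for this example.

```python
import math


def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def logistic(eta):
    """Map any real number eta back to a probability."""
    return 1 / (1 + math.exp(-eta))


# Whatever value the regression side takes, the result is a valid probability.
for eta in (-10, -2, 0, 2, 10):
    print(eta, round(logistic(eta), 5))

# Round trip: probability -> log-odds -> probability.
print(round(logistic(logit(0.3)), 10))
```

A regression side of 0 corresponds to a probability of exactly 0.5, large negative values approach 0, and large positive values approach 1.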
Interpretation of Coefficients
This is where students usually get confused.
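Suppose a coefficient is $\beta_1 = 0.693$.
Exponentiate: $e^{0.693} \approx 2$
Interpretation: for each one-unit increase in that predictor, with the others held fixed, the odds double. Not probability. Odds.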
Very important distinction.
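To see why the distinction matters, the sketch below doubles the odds from three different starting probabilities and converts back. The starting values and function names are assumptions chosen for illustration.

```python
def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(o):
    return o / (1 + o)


def prob_after_doubling_odds(p):
    """New probability after the odds of success are multiplied by 2."""
    return odds_to_prob(2 * prob_to_odds(p))


for start in (0.1, 0.5, 0.9):
    end = prob_after_doubling_odds(start)
    print(f"start={start:.2f}  end={end:.3f}  change={end - start:+.3f}")
```

The same doubling of the odds moves the probability by a different amount depending on where it starts, and in none of these cases does the probability itself double.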
Business Example — Diamond Wholesale provider
Suppose we want to model whether a diamond sells.
Outcome: $Y = 1$ if the diamond sells, $Y = 0$ if it does not.
Predictors:
- Days Out
- Discount
- Customer Count
- Shape
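Model: $\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 \text{DaysOut} + \beta_2 \text{Discount} + \beta_3 \text{CustomerCount} + \beta_4 \text{Shape}$, where $p = P(Y = 1)$.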
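Suppose the Discount coefficient is about 0.5.
Then: $e^{0.5} \approx 1.65$
Interpretation: each one-unit increase in discount multiplies the odds of sale by about 1.65, an increase of 65%, holding the other predictors fixed.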
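Here is a sketch of how such a model turns predictor values into a sale probability. Every coefficient below is hypothetical and not taken from any fitted model; the discount coefficient is set to $\log(1.65)$ only so that it matches a 65% increase in odds, and the 0/1 coding of shape is an assumption.

```python
import math

# Hypothetical coefficients for illustration only; none come from real data.
COEFFICIENTS = {
    "intercept": -1.0,
    "days_out": -0.02,
    "discount": math.log(1.65),  # chosen so that exp(coefficient) = 1.65
    "customer_count": 0.10,
    "shape_round": 0.30,         # assumed coding: 1 if round, 0 otherwise
}


def sale_probability(days_out, discount, customer_count, shape_round):
    """Probability of sale from the linear predictor and the logistic function."""
    eta = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["days_out"] * days_out
        + COEFFICIENTS["discount"] * discount
        + COEFFICIENTS["customer_count"] * customer_count
        + COEFFICIENTS["shape_round"] * shape_round
    )
    return 1 / (1 + math.exp(-eta))


def odds(p):
    return p / (1 - p)


# Same diamond, one extra unit of discount.
base = sale_probability(days_out=30, discount=1, customer_count=5, shape_round=1)
more = sale_probability(days_out=30, discount=2, customer_count=5, shape_round=1)

odds_ratio = odds(more) / odds(base)
print(round(base, 3), round(more, 3), round(odds_ratio, 2))
```

The odds ratio for one extra unit of discount equals the exponentiated coefficient regardless of the other predictor values, while the change in probability depends on the starting point.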
Classification vs Probability
People often think logistic regression predicts:
Yes / No.
Not exactly.
It predicts a probability, $P(Y = 1)$. Then we choose a cutoff.
Example, using the commonly used cutoff of 0.5:
If the probability is 0.82, the decision is Yes. If the probability is 0.31, the decision is No.
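The decision step is separate from the model and can be sketched in a few lines. The function name `classify` and its default cutoff are assumptions for illustration; the cutoff is a choice made by the analyst, not something the model estimates.

```python
def classify(probability, cutoff=0.5):
    """Turn a predicted probability into a Yes/No decision."""
    return "Yes" if probability >= cutoff else "No"


print(classify(0.82))              # above the default cutoff
print(classify(0.31))              # below the default cutoff
print(classify(0.82, cutoff=0.9))  # a stricter cutoff changes the decision
```

Raising or lowering the cutoff trades one kind of mistake for another, so the same predicted probability can lead to different decisions.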
Overdispersion
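Bernoulli assumes the variance is fixed by the mean: $\text{Var}(Y) = p(1 - p)$
But reality may show more variability than that. Meaning: observed variance > expected variance. This is called overdispersion.
This suggests: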
- hidden groups,
- clustering,
- omitted variables.
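One rough way to look for overdispersion is to group the outcomes and compare the spread of the group counts with what a single shared probability would predict. The sketch below assumes invented data: six hypothetical retailers, each shown 20 offers. With ungrouped 0/1 outcomes this comparison cannot be made, because the variance is fully determined by the mean.

```python
import statistics

# Hypothetical data: purchases out of 20 offers for each of 6 retailers.
trials_per_group = 20
successes = [2, 18, 3, 17, 4, 16]

# Pooled success probability across all retailers.
p_hat = sum(successes) / (trials_per_group * len(successes))

# Variance of a group count if every offer shared the same probability.
expected_variance = trials_per_group * p_hat * (1 - p_hat)

# Sample variance of the observed group counts.
observed_variance = statistics.variance(successes)

dispersion_ratio = observed_variance / expected_variance
print(p_hat, expected_variance, observed_variance, round(dispersion_ratio, 2))
```

A ratio well above 1 hints that the retailers do not share one probability, which is the hidden-groups situation described above. This is an informal check, not a formal test.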
Retrospective Sampling
Sometimes data is collected by outcome.
Example:
Start with:
- 100 buyers,
- 100 non-buyers.
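Then study the predictors in each group. This is case-control sampling.
Surprisingly, logistic regression still works, because odds ratios remain identifiable: sampling on the outcome leaves the slope coefficients unchanged and shifts only the intercept, so predicted probabilities must be adjusted for the true buyer rate.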
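The invariance of the odds ratio can be checked with exact arithmetic. The sketch below assumes an invented population split by a single yes/no predictor (whether a discount was offered), then computes the expected counts if 100 buyers and 100 non-buyers were sampled from it. All counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical population counts: (buyers, non-buyers) by discount status.
population = {
    "discount": (300, 700),
    "no_discount": (100, 900),
}


def odds_ratio(table):
    """Odds ratio of buying, discount versus no discount."""
    buy_d, no_buy_d = table["discount"]
    buy_n, no_buy_n = table["no_discount"]
    return Fraction(buy_d * no_buy_n) / Fraction(no_buy_d * buy_n)


total_buyers = sum(buyers for buyers, _ in population.values())
total_non_buyers = sum(non_buyers for _, non_buyers in population.values())

# Expected counts when sampling 100 buyers and 100 non-buyers by outcome.
sample = {
    group: (
        Fraction(100 * buyers, total_buyers),
        Fraction(100 * non_buyers, total_non_buyers),
    )
    for group, (buyers, non_buyers) in population.items()
}

population_or = odds_ratio(population)
sample_or = odds_ratio(sample)

# Share of buyers: the true rate versus the rate forced by the design.
population_buy_rate = Fraction(total_buyers, total_buyers + total_non_buyers)
sample_buy_rate = Fraction(100, 200)

print(population_or, sample_or)
print(population_buy_rate, sample_buy_rate)
```

The odds ratio is identical in the population and in the outcome-based sample, while the share of buyers is not. That is why the slopes carry over but raw predicted probabilities from such a sample do not.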
Chapter 4’s Big Lesson
Chapter 4 teaches: probabilities cannot be treated like ordinary numbers.
Binary outcomes have:
- boundaries,
- variance behavior,
- nonlinear structure.
Logistic regression solves all of this.
Final Thought
Logistic regression looks simple:
0 or 1
But underneath it lies:
- Bernoulli distributions,
- odds,
- log transformations,
- nonlinear probability geometry,
- likelihood theory.
And once you understand logistic regression, you stop asking: “Will it happen?” and start asking: “How likely is it?”
