Chapter 4 — Logistic Regression: Modeling Decisions, Probabilities, and Yes/No Outcomes

pj316

courses, GLM, Statistics

In the previous chapter, we learned something powerful: not all outcomes behave like continuous normal data.

Some outcomes are counts.
Some are positively skewed values.

But there is another extremely common type of outcome: “decisions.”

Questions like:

Will this customer buy?
Will this retailer churn?
Will this diamond sell?
Will this campaign succeed?
Should we replenish this SKU?

These are not count problems.

These are: yes/no problems.

And ordinary regression completely breaks here.

Chapter 4 introduces one of the most important models in statistics: Logistic Regression.

Why Ordinary Regression Breaks for Binary Data

Suppose we try to predict:

Y = \begin{cases} 1 & \text{customer buys} \\ 0 & \text{customer does not buy} \end{cases}

Ordinary regression would do:

Y=\beta_0+\beta_1X+\epsilon

But immediately problems appear.

Predictions might become:

−0.4 probability
1.7 probability
2.3 probability

Impossible. Probabilities must stay between:

0\le p\le1

So ordinary regression breaks.
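To see the problem concretely, here is a minimal sketch that fits an ordinary least-squares line to a small, made-up binary data set. The data values and the helper name `fit_line` are assumptions for illustration only; they do not come from a real study.

```python
# Hypothetical data: x is a predictor (say, discount in percent),
# y is 1 if the customer bought and 0 otherwise.
xs = [0, 5, 10, 15, 20, 25, 30, 35, 40]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

b0, b1 = fit_line(xs, ys)

# The fitted line is unbounded, so its "probabilities" leave [0, 1].
for x in (-10, 0, 20, 60):
    print(x, round(b0 + b1 * x, 3))
```

On this toy data the fitted value is already negative at x = 0 and exceeds 1 at x = 60, which is exactly the failure described above.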

Bernoulli Distribution — The Foundation

Logistic regression assumes:

Y\sim Bernoulli(p)

Meaning

1 is success, 0 is failure

Examples:

customer buys,
retailer leaves,
diamond sells.

Mean and Variance

Mean:

E(Y)=p

Variance:

Var(Y)=p(1-p)

Notice:

Variance depends on the mean. That already violates the constant-variance assumption of ordinary regression.

Why Variance Peaks at 50%

Look carefully:

p(1-p)

If p=0, the variance is 0.
If p=1, the variance is 0.

The maximum occurs at p=0.5, where the variance is 0.25.

Why?

Because uncertainty is highest.

At 50%:

both outcomes are equally possible.
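A quick numerical check of this claim, as a sketch: evaluate p(1-p) on a grid of probabilities and look for the largest value. The function name `bernoulli_variance` and the grid spacing are choices made here for illustration.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli(p) outcome."""
    return p * (1 - p)

# Probabilities 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]
variances = [bernoulli_variance(p) for p in grid]

p_at_max = grid[variances.index(max(variances))]
print("variance at p=0:", bernoulli_variance(0))
print("variance at p=1:", bernoulli_variance(1))
print("largest variance:", max(variances), "at p =", p_at_max)
```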

The Problem With Modeling Probability Directly

Suppose:

p=\beta_0+\beta_1X

Eventually:

probabilities exceed 1,
probabilities become negative.

We need a transformation.

Enter Odds

Odds answer: how much more likely success is than failure.

Formula:

Odds=\frac{p}{1-p}

Example:

Suppose: p=0.75

Then:

Odds=\frac{0.75}{0.25}=3

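Meaning: success is three times as likely as failure. This is the ratio language behind betting phrases like “3 to 1”, although bookmakers usually quote the odds against an event rather than for it.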
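The conversion between probability and odds can be sketched in a few lines. The helper names `odds` and `probability_from_odds` are illustrative, not part of any library.

```python
def odds(p):
    """Odds of success for a probability p in [0, 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def probability_from_odds(o):
    """Inverse conversion: probability for non-negative odds o."""
    return o / (1 + o)

print(odds(0.75))                # the worked example from the text
print(odds(0.5))                 # even odds
print(probability_from_odds(3))  # back to the original probability
```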

But Odds Still Have a Problem

Odds range: $0\rightarrow\infty$

Regression prefers: $-\infty\rightarrow\infty$

So we take logs.

The Logit Function

The canonical link:

\log\left(\frac{p}{1-p}\right)=X\beta

A model of this form is called logistic regression; the function on the left is the logit, or log-odds.

Left side:

transformed probability.

Right side:

regression.
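As a small sketch of the left-hand side, the logit maps probabilities in (0, 1) onto the whole real line, and it is symmetric around p = 0.5. The function name `logit` is a choice made here; only Python's standard `math.log` is used.

```python
import math

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    return math.log(p / (1 - p))

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(p, round(logit(p), 3))
```

Probabilities below 0.5 give negative log-odds, 0.5 gives exactly 0, and probabilities above 0.5 give positive log-odds, so a linear predictor can take any value without producing an invalid probability.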

Recovering Probability

After estimating $\beta$ and computing the linear predictor $X\beta$, we convert back to a probability:

p=\frac{e^{X\beta}}{1+e^{X\beta}}

This is the logistic function.

Now probabilities always stay strictly inside the valid range: $0<p<1$

Beautiful.
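Here is a minimal sketch of the logistic function. It is written in two algebraically equivalent branches so that `math.exp` never receives a large positive argument; that branching is an implementation choice made here, not something the formula requires.

```python
import math

def sigmoid(z):
    """Logistic function e^z / (1 + e^z), arranged to avoid overflow."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1 + ez)

# A linear predictor of log(3) corresponds to odds of 3, i.e. p = 0.75.
print(round(sigmoid(math.log(3)), 6))

# Any real-valued linear predictor maps to a valid probability.
for z in (-1000, -2, 0, 2, 1000):
    print(z, sigmoid(z))
```

In exact arithmetic the output never reaches 0 or 1; in floating point, extreme inputs such as -1000 and 1000 round to 0.0 and 1.0.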

Interpretation of Coefficients

This is where students usually get confused.

Suppose: $\beta=0.7$

Exponentiate:

e^{0.7}=2.01

Interpretation: for each one-unit increase in X, the odds roughly double. Not the probability. The odds.

Very important distinction.
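The distinction is easier to see with numbers. This sketch applies the same coefficient, 0.7, to three different starting probabilities (the starting values are arbitrary choices for illustration). The odds are multiplied by the same factor each time, but the change in probability depends on where you start.

```python
import math

beta = 0.7  # coefficient from the text

def shift_probability(p, beta):
    """Probability after a one-unit increase in X, starting from probability p."""
    new_odds = (p / (1 - p)) * math.exp(beta)
    return new_odds / (1 + new_odds)

for p in (0.1, 0.5, 0.9):
    new_p = shift_probability(p, beta)
    ratio = (new_p / (1 - new_p)) / (p / (1 - p))
    print(f"p={p:.2f} -> {new_p:.3f}  change={new_p - p:+.3f}  odds ratio={ratio:.2f}")
```

The odds ratio column is the same in every row, while the probability change is largest near p = 0.5 and much smaller near 0.9.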

Business Example — Diamond Wholesale provider

Suppose:

Outcome:

Y = \begin{cases} 1 & \text{diamond sold} \\ 0 & \text{not sold} \end{cases}

Predictors:

Days Out
Discount
Customer Count
Shape

Model:

\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{DaysOut} + \beta_2 \cdot \text{Discount}

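Suppose the Discount coefficient is:

\beta_2=0.5

Then:

e^{0.5}=1.65

Interpretation: holding DaysOut fixed, each one-unit increase in Discount multiplies the odds of a sale by about 1.65, a 65% increase in the odds (not in the probability).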
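To make the diamond model tangible, here is a sketch that plugs numbers into it. Only the Discount coefficient of 0.5 comes from the text; the intercept, the DaysOut coefficient, and the units of both predictors are hypothetical values chosen for illustration.

```python
import math

# beta_2 = 0.5 is from the text; B0 and B_DAYS_OUT are hypothetical.
B0, B_DAYS_OUT, B_DISCOUNT = -1.0, -0.02, 0.5

def sale_probability(days_out, discount):
    """P(diamond sold) under the two-predictor logistic model."""
    z = B0 + B_DAYS_OUT * days_out + B_DISCOUNT * discount
    return 1 / (1 + math.exp(-z))

def odds(p):
    return p / (1 - p)

# Compare no discount with one unit of discount at several values of DaysOut.
for days_out in (10, 30, 90):
    p0 = sale_probability(days_out, discount=0)
    p1 = sale_probability(days_out, discount=1)
    print(days_out, round(p0, 3), round(p1, 3), round(odds(p1) / odds(p0), 2))
```

Whatever value DaysOut takes, the last column stays at about 1.65, which is what "holding the other predictors fixed" means for an odds ratio.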

Classification vs Probability

People often think logistic regression predicts:

Yes / No.

Not exactly.

It predicts $P(Y=1)$. We then choose a cutoff to turn that probability into a decision.

Example:

With the commonly used cutoff of 0.5, a probability of 0.82 becomes Yes and a probability of 0.31 becomes No.
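The cutoff step is separate from the model and can be sketched as a one-line rule. The function name `classify` and the alternative cutoff of 0.9 are illustrative choices.

```python
def classify(probability, cutoff=0.5):
    """Turn a predicted probability into a Yes/No decision."""
    return "Yes" if probability >= cutoff else "No"

print(classify(0.82))              # default cutoff of 0.5
print(classify(0.31))
print(classify(0.82, cutoff=0.9))  # a stricter cutoff changes the decision
```

The model's output (0.82) is the same in the first and third calls; only the decision rule changed, which is why it helps to keep the probability and the classification apart.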

Overdispersion

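Bernoulli assumes:

Var(Y)=p(1-p)

A single 0/1 outcome always satisfies this, so overdispersion shows up when outcomes are grouped into counts of successes out of n trials, where the binomial model expects variance np(1-p). Real data may show more variability than that: observed variance > expected.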

This suggests:

hidden groups,
clustering,
omitted variables.
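Here is a sketch of how hidden groups create extra variability. It assumes the outcomes are grouped into counts of successes out of n = 10 trials, and that the population is an equal mix of two hidden groups with success probabilities 0.2 and 0.8. All of these numbers are hypothetical. The variance of the counts is computed exactly from the mixture distribution, then compared with what a single binomial with the same overall success rate would predict.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10
group_probs = [0.2, 0.8]  # hypothetical hidden groups, equally common

def mixture_pmf(k):
    """Distribution of the count when the group is not observed."""
    return sum(binomial_pmf(k, n, p) for p in group_probs) / len(group_probs)

mean = sum(k * mixture_pmf(k) for k in range(n + 1))
observed_var = sum((k - mean) ** 2 * mixture_pmf(k) for k in range(n + 1))

p_bar = mean / n                        # overall success rate
expected_var = n * p_bar * (1 - p_bar)  # what a single binomial would predict

print("overall success rate:", round(p_bar, 3))
print("binomial variance:   ", round(expected_var, 3))
print("actual variance:     ", round(observed_var, 3))
print("dispersion ratio:    ", round(observed_var / expected_var, 2))
```

A dispersion ratio well above 1 is the signature of overdispersion; here it comes entirely from ignoring which group each observation belongs to.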

Retrospective Sampling

Sometimes data is collected by outcome.

Example:

Start with:

100 buyers,
100 non-buyers.

Then study predictors. This is case-control sampling.

Surprisingly, logistic regression still works, because odds ratios remain identifiable: the slope coefficients are unaffected, and only the intercept shifts with the sampling fractions.
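A 2x2 table makes this invariance visible. The sketch below uses a hypothetical population of 400 buyers and 4,000 non-buyers, split by whether they were offered a discount, and then keeps 1 in 4 buyers and 1 in 40 non-buyers to get the 100 versus 100 design. All counts are invented, and the sample uses expected counts rather than a random draw.

```python
def odds_ratio(buy_exposed, buy_unexposed, nobuy_exposed, nobuy_unexposed):
    """Odds ratio of buying, exposed versus unexposed, from a 2x2 table."""
    return (buy_exposed * nobuy_unexposed) / (buy_unexposed * nobuy_exposed)

# Hypothetical population counts: (discount offered, no discount)
buyers = (300, 100)
non_buyers = (1000, 3000)

# Case-control sample: keep 1 in 4 buyers and 1 in 40 non-buyers.
sampled_buyers = tuple(count // 4 for count in buyers)
sampled_non_buyers = tuple(count // 40 for count in non_buyers)

population_or = odds_ratio(*buyers, *non_buyers)
sample_or = odds_ratio(*sampled_buyers, *sampled_non_buyers)
print("odds ratio, population:", population_or)
print("odds ratio, sample:    ", sample_or)

# The raw purchase rate among discounted customers is NOT preserved.
population_rate = buyers[0] / (buyers[0] + non_buyers[0])
sample_rate = sampled_buyers[0] / (sampled_buyers[0] + sampled_non_buyers[0])
print("P(buy | discount), population:", round(population_rate, 3))
print("P(buy | discount), sample:    ", round(sample_rate, 3))
```

The odds ratio survives the sampling scheme because each row of the table is scaled by its own constant, and those constants cancel in the ratio. The purchase rate does not survive, which is why predicted probabilities from a case-control fit need an intercept correction before use.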

Chapter 4’s Big Lesson

Chapter 4 teaches: probabilities cannot be treated like ordinary numbers.

Binary outcomes have:

boundaries,
variance behavior,
nonlinear structure.

Logistic regression solves all of this.

Final Thought

Logistic regression looks simple:

0 or 1

But underneath it lies:

Bernoulli distributions,
odds,
log transformations,
nonlinear probability geometry,
likelihood theory.

And once you understand logistic regression, you stop asking: “Will it happen?” and start asking: “How likely is it?”
