Introduction

In the previous lesson, we studied Logistic Regression.

Logistic Regression is used when the outcome is binary:

0 or 1

Examples:

Sold vs Not Sold
Disease vs No Disease
Readmitted vs Not Readmitted

However, many business and healthcare problems involve counts.

Examples:

Healthcare

Number of hospital visits
Number of emergency admissions
Number of infections

Supply Chain

Number of sales per SKU
Number of orders
Number of returns

Retail

Number of purchases
Number of transactions
Number of customer visits

Count data behaves differently from continuous data.

A count is a non-negative whole number (0, 1, 2, 3, ...). It can never be negative:

-3

or fractional:

2.7

Poisson Regression was specifically designed for modeling count outcomes.
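When data arrive from a spreadsheet or database, it helps to check that the outcome really is count data before modeling it. Here is a minimal numpy sketch. `is_count_data` is a hypothetical helper written for this lesson, not a library function.

```python
import numpy as np

def is_count_data(values):
    """Hypothetical helper: True if every value is a non-negative whole number."""
    arr = np.asarray(values, dtype=float)
    non_negative = np.all(arr >= 0)        # counts cannot be negative (rules out -3)
    whole = np.all(arr == np.floor(arr))   # counts cannot be fractional (rules out 2.7)
    return bool(non_negative and whole)

print(is_count_data([4, 7, 5, 6]))   # True
print(is_count_data([4, -3, 5]))     # False
print(is_count_data([4, 2.7, 5]))    # False
```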

When to Use Poisson Regression

Use Poisson Regression when:

Y is a count

Examples:

Outcome	Poisson?
Number of Sales	Yes
Number of Visits	Yes
Number of Claims	Yes
Revenue	No
Inventory Value	No
Length of Stay	Usually No

The Poisson Distribution

Poisson Regression assumes:

Y ~ Poisson(λ)

where:

λ

represents the expected number of events.

Example:

λ = 5

means:

Expected count = 5
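To make λ concrete, here is a small pure-Python sketch of the Poisson probability mass function, P(Y = k) = e^(−λ) λ^k / k!. With λ = 5 it confirms that the mean of the distribution is λ. It also shows that the variance equals λ, an assumption this lesson returns to later. The helper name `poisson_pmf` is our own, and summing the first 60 terms is an arbitrary cutoff that captures essentially all of the probability for λ = 5.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam): exp(-lam) * lam**k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 5
probs = [poisson_pmf(k, lam) for k in range(60)]  # tail beyond k = 59 is negligible

mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print(round(sum(probs), 6))        # total probability
print(round(mean, 6))              # expected count
print(round(variance, 6))          # variance
print(round(poisson_pmf(5, lam), 4))  # probability of exactly 5 events
```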

Example

Suppose a SKU sells:

Month	Sales Count
Jan	4
Feb	7
Mar	5
Apr	6

The outcome is:

Number of Sales

This is a count variable.

Why Not Linear Regression?

Suppose we use:

SalesCount = β₀ + β₁ × Inventory

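Predictions might become:

-2.5

3.7

A negative prediction such as -2.5 is an impossible count. A fractional prediction such as 3.7 is acceptable only as an average: an expected count need not be a whole number, but it can never be negative.

Poisson Regression avoids negative predictions.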
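A quick way to see the problem is to fit an ordinary least-squares line to the small Inventory / SalesCount dataset used later in this lesson and extrapolate to low inventory levels. This numpy sketch uses `np.polyfit` for the straight-line fit. The extrapolation points (0, 10 and 20 units) are arbitrary choices for illustration.

```python
import numpy as np

# Inventory / SalesCount values from the lesson's example dataset
inventory = np.array([50, 100, 150, 200, 250], dtype=float)
sales = np.array([2, 5, 8, 12, 15], dtype=float)

# Ordinary least squares: SalesCount = b0 + b1 * Inventory
b1, b0 = np.polyfit(inventory, sales, deg=1)  # polyfit returns the highest power first

# Extrapolate the straight line to low inventory levels
for inv in [0, 10, 20]:
    print(inv, round(b0 + b1 * inv, 2))
```

The fitted line has intercept -1.5 and slope 0.066, so it crosses zero at roughly 23 units of inventory, and any smaller inventory level yields a negative predicted count.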

The Poisson Regression Model

Poisson Regression models:

\log(\lambda)=\beta_0+\beta_1x_1+\cdots+\beta_kx_k

where:

λ

is the expected count.

Notice:

log(λ)

instead of:

λ

Because λ = exp(β₀ + β₁x₁ + ⋯ + βₖxₖ), and the exponential function is always positive, this guarantees:

λ > 0

which is required for an expected count.

Understanding the Log Link

Suppose:

β₀ = 1

Then:

log(λ) = 1

Taking the exponential:

λ = exp(1)

which is approximately:

2.718

Expected count:

Approximately 2.7 events
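The same calculation in Python, extended to a few other values of the linear predictor, shows why the log link keeps expected counts positive: `math.exp` maps every real number, including negative ones, to a positive value. The loop values are arbitrary.

```python
import math

# Inverse of the log link: expected count = exp(linear predictor)
beta0 = 1.0
print(math.exp(beta0))  # 2.718281828459045

# Any linear predictor, even a negative one, gives a positive expected count
for eta in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(eta, round(math.exp(eta), 3))
```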

Example Dataset

Suppose we want to predict annual SKU sales.

```python
import pandas as pd

data = pd.DataFrame({
    "Inventory": [50, 100, 150, 200, 250],
    "SalesCount": [2, 5, 8, 12, 15]
})
print(data)
```

Visualizing the Data

```python
import matplotlib.pyplot as plt

plt.scatter(data["Inventory"], data["SalesCount"])
plt.xlabel("Inventory")
plt.ylabel("Sales Count")
plt.title("Inventory vs Sales Count")
plt.show()
```

Question:

Does higher inventory lead to more sales?

Fitting a Poisson Regression

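Import statsmodels:

```python
import statsmodels.api as sm
```

Create the predictor:

```python
X = data["Inventory"]
```

Add an intercept (a column of ones named const):

```python
X = sm.add_constant(X)
```

Outcome:

```python
y = data["SalesCount"]
```

Fit the model as a generalized linear model (GLM) with a Poisson family, whose default link is the log:

```python
model = sm.GLM(
    y,
    X,
    family=sm.families.Poisson()
).fit()
print(model.summary())
```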

Understanding the Output

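Focus on two columns:

coef (the estimated coefficients)

and

P>|z| (the p-values)

These tell us:

Direction of relationship (the sign of coef)
Statistical significance (small p-values, commonly below 0.05)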
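Under the hood, statsmodels fits a GLM by iteratively reweighted least squares (IRLS). The numpy sketch below is a minimal from-scratch version for the Poisson/log-link case, written only to show the idea. `fit_poisson_irls` is a hypothetical helper, not part of statsmodels, and it omits the safeguards a production fitter needs. Its estimates should agree with the coef column of `model.summary()` to several decimal places.

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: Poisson regression (log link) fitted by IRLS."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start: intercept = log(mean count), slopes = 0
    for _ in range(max_iter):
        eta = X @ beta                   # linear predictor, log(lambda)
        mu = np.exp(eta)                 # expected counts, always positive
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # X'W with W = diag(mu) for Poisson + log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least-squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

inventory = np.array([50, 100, 150, 200, 250], dtype=float)
sales = np.array([2, 5, 8, 12, 15], dtype=float)
X = np.column_stack([np.ones_like(inventory), inventory])  # column of ones = intercept

beta = fit_poisson_irls(X, sales)
mu_hat = np.exp(X @ beta)

print("coefficients:", beta)
print("sum of fitted counts:", mu_hat.sum())  # matches the observed total (42)
print("expected count at Inventory = 180:", np.exp(beta[0] + beta[1] * 180))
```

With these data the Inventory slope comes out at about 0.0084, and the expected count at Inventory = 180 is roughly 9.1. The sum line illustrates a general property of Poisson fits that include an intercept: the fitted values add up to the observed total.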

Interpreting Coefficients

Suppose the coefficient for Inventory is:

0.01

Interpretation:

Each additional unit of Inventory increases:

log(Expected Sales Count)

by 0.01, which is not very intuitive.

Instead, we exponentiate.

Incidence Rate Ratios

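Exponentiate the coefficients to obtain incidence rate ratios (IRRs):

```python
import numpy as np

# Each IRR is the multiplicative change in the expected count per one-unit increase
print(np.exp(model.params))
```

Suppose:

exp(0.01) = 1.010

Interpretation:

Each additional inventory unit multiplies the expected sales count by 1.010, that is, increases it by about 1.0%.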

This is much easier to understand.
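IRRs compound multiplicatively rather than adding up. The following minimal sketch assumes the supposed Inventory coefficient of 0.01 from above. Ten extra units multiply the expected count by exp(10 × 0.01) ≈ 1.105, an increase of about 10.5% rather than exactly 10%. With a fitted statsmodels model, `np.exp(model.conf_int())` gives the matching confidence interval for each IRR.

```python
import math

beta_inventory = 0.01  # the supposed Inventory coefficient from the text

irr_one = math.exp(beta_inventory)        # one extra unit
irr_ten = math.exp(10 * beta_inventory)   # ten extra units

print(round(irr_one, 4))        # 1.0101
print(round(irr_ten, 4))        # 1.1052, about a 10.5% increase
print(round(irr_one ** 10, 4))  # 1.1052, the same: effects multiply
```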

Making Predictions

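Predict the expected sales count.

Suppose:

Inventory = 180

Prediction:

```python
# New row with the same columns as X: the intercept (const) and Inventory
new_data = pd.DataFrame({"const": [1.0], "Inventory": [180]})
prediction = model.predict(new_data)
print(prediction)
```

Output (approximately):

9.1

Interpretation:

Expected sales count ≈ 9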

Healthcare Example

Suppose we study emergency room visits.

Dataset:

```python
patients = pd.DataFrame({
    "Age": [30, 40, 50, 60, 70],
    "Visits": [1, 2, 3, 5, 7]
})
```

Fit model:

```python
X = sm.add_constant(patients["Age"])
y = patients["Visits"]
model = sm.GLM(
    y,
    X,
    family=sm.families.Poisson()
).fit()
print(model.summary())
```

Question:

Does age increase emergency visits?

Supply Chain Example

Suppose we model annual sales count.

```python
inventory = pd.DataFrame({
    "Inventory": [50, 100, 150, 200, 250],
    "SalesCount": [2, 5, 8, 12, 15]
})
```

Fit model:

```python
X = sm.add_constant(inventory["Inventory"])
y = inventory["SalesCount"]
model = sm.GLM(
    y,
    X,
    family=sm.families.Poisson()
).fit()
print(model.summary())
```

Question:

How does inventory affect sales count?

Exposure Variables

Sometimes observations have different exposure periods.

Example:

Store	Sales	Days Open
A	100	365
B	50	180

Store B had less time to generate sales.

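Poisson models can account for exposure by adding log(exposure) to the linear predictor as an offset, a term whose coefficient is fixed at 1:

log(λ) = log(exposure) + β₀ + β₁x₁ + ⋯ + βₖxₖ

The coefficients then describe the rate per unit of exposure, for example sales per day open.

Example

```python
import numpy as np
import statsmodels.api as sm

# exposure: observation period for each row of y (for example, days open)
model = sm.GLM(
    y,
    X,
    family=sm.families.Poisson(),
    offset=np.log(exposure)  # equivalently, pass exposure=exposure
).fit()
```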

This is common in:

Insurance
Healthcare
Operations research
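Using the two stores from the exposure table, a plain-Python sketch shows what adjusting for exposure changes. Store B's raw sales count is half of store A's, but its sales per day open are essentially the same (slightly higher, in fact). For an intercept-only Poisson model with offset log(days open), the maximum-likelihood estimate of exp(β₀) is total sales divided by total days open, and the last lines compute that directly.

```python
# Sales and days open for the two stores in the exposure table
stores = {"A": (100, 365), "B": (50, 180)}

# Raw counts ignore exposure; rates divide by it
rates = {name: sales / days for name, (sales, days) in stores.items()}
for name, rate in rates.items():
    print(name, round(rate, 3), "sales per day")

# Intercept-only Poisson model with offset log(days): the MLE of exp(beta0)
# is total sales divided by total days open
pooled_rate = sum(s for s, _ in stores.values()) / sum(d for _, d in stores.values())
print("pooled rate:", round(pooled_rate, 4))
```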

A Key Assumption

Poisson Regression assumes that, for given values of the predictors:

Mean = Variance

This assumption is often violated.

Example:

Mean Sales Count = 5
Variance = 40

A variance larger than the mean is called:

Overdispersion

and is extremely common.

Checking for Overdispersion

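Calculate:

```python
# Compare the outcome's mean and variance
mean_count = y.mean()
variance_count = y.var()  # pandas returns the sample variance (ddof=1)
print(mean_count)
print(variance_count)
```

If:

Variance >> Mean

Poisson Regression may not be appropriate.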
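This raw comparison ignores the predictors. Counts that rise with Inventory have a large overall variance even if they are Poisson at each inventory level. A model-based check is the Pearson dispersion statistic, the Pearson chi-square divided by the residual degrees of freedom, where values well above 1 suggest overdispersion. In statsmodels it is `model.pearson_chi2 / model.df_resid`. The numpy sketch below computes it by hand (`pearson_dispersion` is our own hypothetical helper) and shows that for an intercept-only model it reduces exactly to the variance-to-mean ratio.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params):
    """Hypothetical helper: Pearson chi-square / residual degrees of freedom."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    chi2 = np.sum((y - mu) ** 2 / mu)  # Poisson variance is mu, so each term is standardized
    return chi2 / (len(y) - n_params)

sales = np.array([2, 5, 8, 12, 15])  # SalesCount from the example dataset

# Intercept-only Poisson model: every fitted value equals the sample mean
mu_null = np.full(len(sales), sales.mean())

print(round(pearson_dispersion(sales, mu_null, n_params=1), 4))
print(round(sales.var(ddof=1) / sales.mean(), 4))
```

Both lines print 3.25. Once Inventory enters the model, the fitted values follow the upward trend, so judge overdispersion from the fitted model's dispersion statistic rather than from `y.var()` alone.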

Why Overdispersion Matters

Suppose:

Mean = 5
Variance = 50

Poisson assumes:

Variance = 5

The model therefore underestimates uncertainty: its standard errors are too small.

This leads to:

Incorrect p-values
Overconfidence
Misleading conclusions
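A rough way to see the effect is to apply the quasi-Poisson convention of inflating standard errors by the square root of the dispersion. With variance 50 and mean 5 the dispersion is 10, so Poisson standard errors are too small by a factor of about √10 ≈ 3.16. The coefficient and standard error below are hypothetical numbers chosen for illustration. In statsmodels, a comparable correction is available through `sm.GLM(...).fit(scale="X2")`.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

coef = 0.01          # hypothetical coefficient
se_poisson = 0.004   # hypothetical Poisson standard error (assumes variance = mean)
dispersion = 50 / 5  # variance / mean from the example above

se_adjusted = se_poisson * math.sqrt(dispersion)  # quasi-Poisson style inflation

p_poisson = two_sided_p(coef / se_poisson)
p_adjusted = two_sided_p(coef / se_adjusted)

print(round(p_poisson, 3))    # looks significant
print(round(p_adjusted, 3))   # no longer significant
```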

Real-World Example

SKU sales data often shows:

Many zeros
Large variation

Poisson frequently struggles here.
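Simulated data (not real sales) can reproduce both features. This sketch assumes that each SKU has its own underlying sales rate drawn from a gamma distribution, and that observed sales are Poisson given that rate. The shape 0.5 and scale 10 are arbitrary choices that give a mean of 5. This gamma-Poisson mixture is exactly the Negative Binomial distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical SKU demand: gamma-distributed rates, Poisson sales given each rate
rates = rng.gamma(shape=0.5, scale=10.0, size=10_000)  # mean rate = 0.5 * 10 = 5
sales = rng.poisson(rates)

print("mean:", round(sales.mean(), 2))
print("variance:", round(sales.var(ddof=1), 2))
print("share of zeros:", round((sales == 0).mean(), 3))

# For comparison: a plain Poisson with the same mean
poisson_sales = rng.poisson(5.0, size=10_000)
print("Poisson share of zeros:", round((poisson_sales == 0).mean(), 3))
```

In theory this mixture has variance 5 + 5²/0.5 = 55, eleven times its mean, and about 30% zeros. A Poisson distribution with mean 5 has under 1% zeros.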

Enter Negative Binomial Regression

Negative Binomial Regression extends Poisson by allowing:

Variance > Mean

This is one of the most important count models in:

Retail analytics
Healthcare analytics
Insurance
Supply chain forecasting

We will study it in the next lesson.

Analyst Workflow

When modeling counts:

Visualize:

```python
plt.hist(y, bins=20)
plt.show()
```

Check:

```python
print(y.mean())
print(y.var())
```

Fit model:

```python
model = sm.GLM(
    y,
    X,
    family=sm.families.Poisson()
).fit()
```

Interpret:

```python
print(np.exp(model.params))
```

Evaluate:

Coefficients
p-values
Overdispersion

Healthcare Exercise

Predict:

Number of Hospital Visits

using:

Age
BMI
Smoking Status

Questions:

Which variables increase visit frequency?
Which variables are statistically significant?

Supply Chain Exercise

Predict:

Annual SKU Sales Count

using:

Inventory
Price
Customer Turn
Deployment Value

Questions:

Which factors drive sales frequency?
What is the expected sales count?

Lesson Summary

In this lesson we learned:

When to use Poisson Regression
The Poisson distribution
Count data modeling
Log-link functions
Incidence rate ratios
Exposure variables
Predictions
Overdispersion

Poisson Regression is the standard starting point for count data, but real-world count data often exhibits overdispersion.

In the next lesson we will study Negative Binomial Regression, one of the most important models for SKU demand, healthcare utilization, claims frequency, and other real-world count outcomes.

nerd-ish

Lesson 9: Poisson Regression: Modeling Counts and Event Frequencies
