Lesson 5: Data Visualization: Turning Data into Insights

Introduction

Data analysis is not only about calculating numbers.

A great analyst must also communicate findings clearly.

Suppose you calculate that:

  • Average hospital stay is 5.4 days.
  • Average inventory turn is 0.72.
  • Sales increased by 18%.

Those numbers are useful.

However, a simple chart often communicates the message far more effectively.

Visualization allows us to:

  • Discover patterns
  • Identify outliers
  • Detect trends
  • Compare groups
  • Communicate results

This lesson introduces the most important charts used by data analysts.

We will use:

import matplotlib.pyplot as plt
import seaborn as sns

Matplotlib is the foundation.

Seaborn builds on top of Matplotlib and provides a higher-level interface for statistical graphics.


Loading Libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Creating a Sample Dataset

sales = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "Sales": [100, 120, 140, 130, 170, 200],
})

Line Charts

Line charts are used for trends over time.

Examples:

  • Monthly sales
  • Daily hospital admissions
  • Inventory levels
  • Revenue growth

Creating a Line Chart

plt.plot(sales["Month"], sales["Sales"])
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.show()

The x-axis represents time.

The y-axis represents the variable being measured.


Why Analysts Love Line Charts

Line charts quickly reveal:

  • Upward trends
  • Downward trends
  • Seasonal patterns
  • Sudden changes

For example:

  • Sales growth
  • Patient admissions
  • Inventory depletion

Bar Charts

Bar charts compare categories.

Examples:

  • Sales by customer
  • Revenue by region
  • Patients by diagnosis

Example

customers = pd.DataFrame({
    "Customer": ["Alpha", "Beta", "Gamma"],
    "Sales": [250, 400, 300],
})

Create a bar chart:

plt.bar(customers["Customer"], customers["Sales"])
plt.title("Sales by Customer")
plt.xlabel("Customer")
plt.ylabel("Sales")
plt.show()

Horizontal Bar Charts

Horizontal bars are often easier to read.

plt.barh(customers["Customer"], customers["Sales"])
plt.title("Sales by Customer")
plt.show()

These are especially useful when category names are long.
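
Bar charts are usually easier to compare when the bars are ordered by value. The sketch below is one way to do that, assuming the same three-customer table as above; the ordered variable and the chart title are illustrative choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

customers = pd.DataFrame({
    "Customer": ["Alpha", "Beta", "Gamma"],
    "Sales": [250, 400, 300],
})

# barh draws the first row at the bottom, so sorting ascending
# places the largest bar at the top of the chart.
ordered = customers.sort_values("Sales")

fig, ax = plt.subplots()
ax.barh(ordered["Customer"], ordered["Sales"])
ax.set_title("Sales by Customer (sorted)")
ax.set_xlabel("Sales")
```

When running this as a script rather than in a notebook, add plt.show() at the end to display the figure.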


Histograms

Histograms show distributions.

They are among the most important charts in statistics.

They help answer questions such as:

  • Are values normally distributed?
  • Is the data skewed?
  • Are there multiple peaks?
  • Are there extreme observations?

Example

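import numpy as np
# Simulate 1,000 sales values with mean 100 and standard deviation 15.
# Note: this reassigns `sales`, replacing the monthly table defined earlier.
sales = pd.DataFrame({
    "Sales": np.random.normal(100, 15, 1000)
})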

Plot histogram:

plt.hist(sales["Sales"], bins=20)
plt.title("Distribution of Sales")
plt.xlabel("Sales")
plt.ylabel("Frequency")
plt.show()

Understanding Histograms

Suppose most sales cluster around 100.

The histogram reveals:

  • Center
  • Spread
  • Shape

Before building statistical models, always examine the distribution.
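
Center, spread, and shape can be checked numerically alongside the histogram. This is a minimal sketch, assuming simulated sales like those above; the seed value and the summary dictionary are illustrative, and a seeded generator is used only so the output is reproducible.

```python
import numpy as np
import pandas as pd

# Seeded generator so the numbers are reproducible (the seed is arbitrary).
rng = np.random.default_rng(42)
sales = pd.Series(rng.normal(100, 15, 1000), name="Sales")

summary = {
    "mean": sales.mean(),      # center
    "median": sales.median(),  # center, robust to extreme values
    "std": sales.std(),        # spread
    "skew": sales.skew(),      # shape: near 0 for a symmetric distribution
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

A mean close to the median and a skewness near zero are consistent with the roughly symmetric, bell-shaped histogram.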


Density Plots

Density plots provide a smoother version of a histogram.

sns.kdeplot(sales["Sales"])
plt.title("Density Plot")
plt.show()

These are useful when comparing multiple distributions.


Boxplots

Boxplots are one of the most useful tools for analysts.

They summarize:

  • Median
  • Quartiles
  • Outliers

Example

sns.boxplot(x=sales["Sales"])
plt.show()

Why Boxplots Matter

Imagine two retailers.

Both average $100,000 in sales (the figures below are in thousands of dollars).

However:

Retailer A: 95, 100, 105, 100, 100

Retailer B: 20, 30, 100, 170, 180

Same average.

Very different variability.

Boxplots reveal these differences immediately.
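
The same comparison can be checked numerically. This is a small sketch using the two retailers' figures from above; the retailers and comparison names are illustrative.

```python
import pandas as pd

# Sales in thousands of dollars, taken from the two lists above.
retailers = pd.DataFrame({
    "Retailer A": [95, 100, 105, 100, 100],
    "Retailer B": [20, 30, 100, 170, 180],
})

# Identical means, very different spread.
comparison = retailers.agg(["mean", "median", "std", "min", "max"])
print(comparison.round(1))
```

Passing this table to sns.boxplot(data=retailers) should draw one box per retailer side by side, which makes the difference in spread visible at a glance.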


Detecting Outliers

Outliers often indicate:

  • Data entry errors
  • Fraud
  • Rare events
  • Exceptional performance

Find them visually:

sns.boxplot(x=sales["Sales"])
plt.show()

Points beyond the whiskers may be outliers.
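
By default, Matplotlib and seaborn draw the whiskers out to the most extreme points within 1.5 times the interquartile range (IQR) of the quartiles. The sketch below applies that rule by hand. The values are hypothetical, chosen so that one observation clearly stands out.

```python
import pandas as pd

# Hypothetical values with one unusually large observation.
values = pd.Series([90, 95, 98, 100, 100, 102, 105, 110, 250])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the 1.5 x IQR rule.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Limits: {lower} to {upper}")
print("Flagged:", outliers.tolist())
```

A flagged point is only a candidate for review. Whether it is an error, fraud, or a genuine rare event still has to be investigated.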


Scatterplots

Scatterplots show relationships between variables.

Examples:

  • Age versus length of stay
  • Advertising versus sales
  • Inventory versus revenue

Example

data = pd.DataFrame({
    "Inventory": [50, 75, 100, 125, 150],
    "Sales": [80, 110, 140, 180, 210],
})

Plot:

plt.scatter(data["Inventory"], data["Sales"])
plt.xlabel("Inventory")
plt.ylabel("Sales")
plt.title("Inventory vs Sales")
plt.show()

Interpreting Scatterplots

Scatterplots help identify:

  • Positive relationships
  • Negative relationships
  • Nonlinear relationships
  • Clusters
  • Outliers

This chart is often the first step before regression modeling.
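
As a bridge toward regression, the relationship in the scatterplot can be summarized with a correlation coefficient and a straight-line fit. This is a minimal sketch using the inventory and sales values above. np.polyfit is one simple way to fit a line and is not a full regression analysis.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Inventory": [50, 75, 100, 125, 150],
    "Sales": [80, 110, 140, 180, 210],
})

# Pearson correlation: strength and direction of the linear relationship.
r = data["Inventory"].corr(data["Sales"])

# Least-squares straight line: Sales ~ slope * Inventory + intercept.
slope, intercept = np.polyfit(data["Inventory"], data["Sales"], deg=1)

print(f"r = {r:.3f}")
print(f"Sales ~ {slope:.2f} * Inventory + {intercept:.2f}")
```

For this small dataset the fitted slope is 1.32, meaning each additional unit of inventory is associated with about 1.32 more units of sales. The fit describes an association and does not establish cause.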


Healthcare Example

Suppose we have:

patients = pd.DataFrame({
    "Age": [25, 40, 55, 70, 85],
    "LengthOfStay": [2, 4, 6, 9, 11],
})

Visualize:

plt.scatter(patients["Age"], patients["LengthOfStay"])
plt.xlabel("Age")
plt.ylabel("Length Of Stay")
plt.show()

Question:

Do older patients stay longer?

The scatterplot helps answer that question.


Supply Chain Example

Inventory age distribution:

# Simulate 1,000 inventory ages in days, exponentially distributed with mean 300.
inventory = pd.DataFrame({
    "DaysOut": np.random.exponential(300, 1000)
})

Plot:

plt.hist(inventory["DaysOut"], bins=30)
plt.title("Inventory Age Distribution")
plt.show()

With data like this, the histogram shows a long right tail: most values are small, and a few are very large.

Many inventory datasets follow this pattern.
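
A long right tail also shows up in summary numbers: the mean is pulled above the median. This is a minimal sketch, assuming simulated data like the example above; the seed and variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility
days_out = pd.Series(rng.exponential(300, 1000), name="DaysOut")

mean_days = days_out.mean()
median_days = days_out.median()
p95 = days_out.quantile(0.95)

# In a right-skewed distribution the mean sits above the median.
print(f"mean   = {mean_days:.0f} days")
print(f"median = {median_days:.0f} days")
print(f"95th percentile = {p95:.0f} days")
print(f"skewness = {days_out.skew():.2f}")
```

For skewed data like this, the median and upper percentiles usually describe a typical item and the slow-moving tail better than the mean alone.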


Correlation Heatmaps

Correlation measures the strength and direction of the linear relationship between two variables, on a scale from -1 to 1.

Create sample data:

# Each column is drawn independently, so the correlations will be close to zero.
data = pd.DataFrame({
    "Sales": np.random.normal(100, 15, 100),
    "Inventory": np.random.normal(150, 25, 100),
    "Margin": np.random.normal(20, 3, 100),
})

Calculate correlations:

corr = data.corr(numeric_only=True)

Plot:

sns.heatmap(corr, annot=True)
plt.show()

Why Correlation Heatmaps Matter

They quickly show:

  • Strong relationships
  • Weak relationships
  • Variables that move together

Heatmaps are useful before regression or machine learning.
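
Independent random columns produce correlations near zero, so a heatmap of them has little to show. The sketch below builds a dataset where one pair of variables is deliberately related. The 0.5 coefficient and the noise level are assumptions made up for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
n = 100

inventory = rng.normal(150, 25, n)
data = pd.DataFrame({
    "Inventory": inventory,
    # Hypothetical link: sales rise with inventory, plus random noise.
    "Sales": 0.5 * inventory + rng.normal(0, 5, n),
    # Margin is generated independently of the other two columns.
    "Margin": rng.normal(20, 3, n),
})

corr = data.corr(numeric_only=True)
print(corr.round(2))
```

Passing this corr table to sns.heatmap(corr, annot=True) should show one strongly colored Inventory and Sales cell, with the Margin cells close to zero.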


A Typical Analyst Workflow

When you receive a new dataset (here, a DataFrame named df with Sales and Inventory columns), start with summary statistics:

df.describe()

Then:

plt.hist(df["Sales"])
plt.show()

Then:

sns.boxplot(x=df["Sales"])
plt.show()

Then:

plt.scatter(df["Inventory"], df["Sales"])
plt.show()

Finally:

sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()

This sequence often reveals the most important patterns.
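
The first steps of this workflow can be bundled into one reusable function. The quick_look helper below is a hypothetical sketch, not a library function. It uses Matplotlib's own boxplot instead of seaborn and leaves out the heatmap, so that it depends on a single plotting library.

```python
import pandas as pd
import matplotlib.pyplot as plt


def quick_look(df, value_col, x_col):
    """Print summary statistics and draw a histogram, boxplot, and scatterplot."""
    print(df.describe())

    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

    axes[0].hist(df[value_col], bins=20)
    axes[0].set_title(f"Distribution of {value_col}")

    axes[1].boxplot(df[value_col])
    axes[1].set_title(f"Boxplot of {value_col}")

    axes[2].scatter(df[x_col], df[value_col])
    axes[2].set_xlabel(x_col)
    axes[2].set_ylabel(value_col)
    axes[2].set_title(f"{x_col} vs {value_col}")

    fig.tight_layout()
    return fig


# Small illustrative dataset, reusing the values from the scatterplot example.
df = pd.DataFrame({
    "Inventory": [50, 75, 100, 125, 150],
    "Sales": [80, 110, 140, 180, 210],
})
fig = quick_look(df, value_col="Sales", x_col="Inventory")
```

Call plt.show() afterwards to display the figure when running as a script. Returning the figure also makes it possible to save it with fig.savefig.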


Visualization Checklist

Before modeling data, always ask:

  1. What does the distribution look like?
  2. Are there outliers?
  3. Are there trends?
  4. Are variables correlated?
  5. Are there unusual observations?

Visualization often answers these questions faster than summary statistics alone.


Lesson Summary

In this lesson we learned:

  • Line charts
  • Bar charts
  • Histograms
  • Density plots
  • Boxplots
  • Scatterplots
  • Correlation heatmaps

Visualization is one of the most valuable skills in analytics because it transforms raw numbers into insights that decision-makers can understand.

In the next lesson we will learn Statistical Testing, where we move from describing data to making formal statistical conclusions.
