Introduction
Data analysis is not only about calculating numbers.
A great analyst must also communicate findings clearly.
Suppose you calculate that:
- Average hospital stay is 5.4 days.
- Average inventory turn is 0.72.
- Sales increased by 18%.
Those numbers are useful.
However, a simple chart often communicates the message far more effectively.
Visualization allows us to:
- Discover patterns
- Identify outliers
- Detect trends
- Compare groups
- Communicate results
This lesson introduces the most important charts used by data analysts.
We will use:
import matplotlib.pyplot as plt
import seaborn as sns
Matplotlib is the foundation.
Seaborn builds on Matplotlib and provides a higher-level interface for statistical graphics.
Loading Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Creating a Sample Dataset
sales = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "Sales": [100, 120, 140, 130, 170, 200]
})
Line Charts
Line charts are used for trends over time.
Examples:
- Monthly sales
- Daily hospital admissions
- Inventory levels
- Revenue growth
Creating a Line Chart
plt.plot(sales["Month"], sales["Sales"])
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.show()
The x-axis represents time.
The y-axis represents the variable being measured.
Why Analysts Love Line Charts
Line charts quickly reveal:
- Upward trends
- Downward trends
- Seasonal patterns
- Sudden changes
For example:
- Sales growth
- Patient admissions
- Inventory depletion
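As a rough illustration of spotting a sudden change, the month-over-month percentage change can be computed next to the line chart. This sketch reuses the sample monthly sales figures; the `PctChange` column and the `plot_sales` helper are names assumed for this example only.

```python
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "Sales": [100, 120, 140, 130, 170, 200],
})

# Month-over-month change in percent; the first month has no prior value (NaN).
sales["PctChange"] = sales["Sales"].pct_change() * 100

# Month with the largest month-over-month increase (NaN is skipped).
biggest_jump = sales.loc[sales["PctChange"].idxmax(), "Month"]


def plot_sales(df):
    """Hypothetical helper: line chart with a marker on each month."""
    fig, ax = plt.subplots()
    ax.plot(df["Month"], df["Sales"], marker="o")
    ax.set_title("Monthly Sales")
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales")
    return ax


if __name__ == "__main__":
    print(sales)
    plot_sales(sales)
    plt.show()
```

The markers make the April dip and the May rebound easier to see than a bare line would.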
Bar Charts
Bar charts compare categories.
Examples:
- Sales by customer
- Revenue by region
- Patients by diagnosis
Example
customers = pd.DataFrame({
    "Customer": ["Alpha", "Beta", "Gamma"],
    "Sales": [250, 400, 300]
})
Create a bar chart:
plt.bar(customers["Customer"], customers["Sales"])
plt.title("Sales by Customer")
plt.xlabel("Customer")
plt.ylabel("Sales")
plt.show()
Horizontal Bar Charts
Horizontal bars are often easier to read.
plt.barh(customers["Customer"], customers["Sales"])
plt.title("Sales by Customer")
plt.show()
These are especially useful when category names are long.
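Horizontal bars are usually easier to compare when they are sorted. The sketch below is one possible way to do that with the sample customer data; `ranked` and `plot_ranked` are illustrative names, not part of the lesson's code.

```python
import pandas as pd
import matplotlib.pyplot as plt

customers = pd.DataFrame({
    "Customer": ["Alpha", "Beta", "Gamma"],
    "Sales": [250, 400, 300],
})

# Sort ascending: barh draws the first row at the bottom,
# so the largest value ends up at the top of the chart.
ranked = customers.sort_values("Sales")


def plot_ranked(df):
    """Hypothetical helper: horizontal bar chart of sales per customer."""
    fig, ax = plt.subplots()
    ax.barh(df["Customer"], df["Sales"])
    ax.set_title("Sales by Customer")
    ax.set_xlabel("Sales")
    return ax


if __name__ == "__main__":
    plot_ranked(ranked)
    plt.show()
```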
Histograms
Histograms show distributions.
They are among the most important charts in statistics.
They help answer questions such as:
- Are values normally distributed?
- Is the data skewed?
- Are there multiple peaks?
- Are there extreme observations?
Example
import numpy as np

sales = pd.DataFrame({
    "Sales": np.random.normal(100, 15, 1000)
})
Plot histogram:
plt.hist(sales["Sales"], bins=20)
plt.title("Distribution of Sales")
plt.xlabel("Sales")
plt.ylabel("Frequency")
plt.show()
Understanding Histograms
Suppose most sales cluster around 100.
The histogram reveals:
- Center
- Spread
- Shape
Before building statistical models, always examine the distribution.
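Center, spread, and shape can also be checked numerically alongside the histogram. The following is a minimal sketch on simulated data; the fixed seed is an arbitrary choice made here so that the numbers are reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
sales = pd.DataFrame({"Sales": rng.normal(100, 15, 1000)})

center = sales["Sales"].mean()  # center
spread = sales["Sales"].std()   # spread
shape = sales["Sales"].skew()   # shape: close to 0 for symmetric data

# Bin counts behind a 20-bin histogram, computed without drawing it.
counts, edges = np.histogram(sales["Sales"], bins=20)

print(f"mean={center:.1f}, std={spread:.1f}, skew={shape:.2f}")
```

A skewness far from zero, or bin counts with two separate peaks, would be a reason to look more closely before modeling.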
Density Plots
Density plots provide a smoother version of a histogram.
sns.kdeplot(sales["Sales"])
plt.title("Density Plot")
plt.show()
These are useful when comparing multiple distributions.
Boxplots
Boxplots are one of the most useful tools for analysts.
They summarize:
- Median
- Quartiles
- Outliers
Example
sns.boxplot(x=sales["Sales"])
plt.show()
Why Boxplots Matter
Imagine two retailers.
Both average $100,000 in sales.
However, their individual figures (in thousands of dollars) differ:
Retailer A:
95, 100, 105, 100, 100
Retailer B:
20, 30, 100, 170, 180
Same average.
Very different variability.
Boxplots reveal these differences immediately.
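The retailer comparison can be reproduced in a few lines. This sketch reads the retailer figures as 95, 100, 105, 100, 100 and 20, 30, 100, 170, 180, assumed to be in thousands of dollars; `plot_retailers` is a hypothetical helper name.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sales figures for the two retailers (assumed to be thousands of dollars).
retailers = pd.DataFrame({
    "A": [95, 100, 105, 100, 100],
    "B": [20, 30, 100, 170, 180],
})

# Same mean, very different standard deviation.
summary = retailers.agg(["mean", "std"])


def plot_retailers(df):
    """Hypothetical helper: side-by-side boxplots of the two retailers."""
    fig, ax = plt.subplots()
    ax.boxplot([df["A"].to_numpy(), df["B"].to_numpy()])
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["Retailer A", "Retailer B"])
    ax.set_ylabel("Sales (thousands of dollars)")
    return ax


if __name__ == "__main__":
    print(summary)
    plot_retailers(retailers)
    plt.show()
```

Retailer A's box is a narrow band around 100, while Retailer B's box spans most of the axis.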
Detecting Outliers
Outliers often indicate:
- Data entry errors
- Fraud
- Rare events
- Exceptional performance
Find them visually:
sns.boxplot(x=sales["Sales"])
plt.show()
Points beyond the whiskers may be outliers.
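By default, Matplotlib and Seaborn draw whiskers out to the most extreme points that lie within 1.5 times the interquartile range (IQR) of the quartiles. The same rule can be applied directly to list the flagged values. This is a sketch on simulated data with an arbitrary seed, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
values = pd.Series(rng.normal(100, 15, 1000), name="Sales")

# Quartiles and interquartile range.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences matching the default boxplot whisker rule (1.5 * IQR).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values that a default boxplot would draw as individual points.
outliers = values[(values < lower) | (values > upper)]

print(f"{len(outliers)} of {len(values)} points fall outside "
      f"[{lower:.1f}, {upper:.1f}]")
```

Flagged points are candidates for review, not automatic deletions: even clean, normally distributed data produces a few of them.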
Scatterplots
Scatterplots show relationships between variables.
Examples:
- Age versus length of stay
- Advertising versus sales
- Inventory versus revenue
Example
data = pd.DataFrame({
    "Inventory": [50, 75, 100, 125, 150],
    "Sales": [80, 110, 140, 180, 210]
})
Plot:
plt.scatter(data["Inventory"], data["Sales"])
plt.xlabel("Inventory")
plt.ylabel("Sales")
plt.title("Inventory vs Sales")
plt.show()
Interpreting Scatterplots
Scatterplots help identify:
- Positive relationships
- Negative relationships
- Nonlinear relationships
- Clusters
- Outliers
This chart is often the first step before regression modeling.
Healthcare Example
Suppose we have:
patients = pd.DataFrame({
    "Age": [25, 40, 55, 70, 85],
    "LengthOfStay": [2, 4, 6, 9, 11]
})
Visualize:
plt.scatter(patients["Age"], patients["LengthOfStay"])
plt.xlabel("Age")
plt.ylabel("Length Of Stay")
plt.show()
Question:
Do older patients stay longer?
The scatterplot helps answer that question.
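The visual impression can be backed with two numbers: the Pearson correlation and the slope of a fitted straight line. The sketch below uses the five sample patients, so it only illustrates the mechanics and is not evidence about real patients.

```python
import numpy as np
import pandas as pd

patients = pd.DataFrame({
    "Age": [25, 40, 55, 70, 85],
    "LengthOfStay": [2, 4, 6, 9, 11],
})

# Pearson correlation between age and length of stay.
r = patients["Age"].corr(patients["LengthOfStay"])

# Least-squares line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(patients["Age"], patients["LengthOfStay"], 1)

print(f"r={r:.3f}, slope={slope:.3f} days per year of age")
```

In this toy data the slope works out to roughly 0.15 extra days of stay per additional year of age.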
Supply Chain Example
Inventory age distribution:
inventory = pd.DataFrame({
    "DaysOut": np.random.exponential(300, 1000)
})
Plot:
plt.hist(inventory["DaysOut"], bins=30)
plt.title("Inventory Age Distribution")
plt.show()
This often reveals long right tails.
Many inventory datasets follow this pattern.
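A long right tail also shows up numerically: the mean is pulled above the median and the skewness is clearly positive. A minimal sketch on simulated inventory ages follows, with an arbitrary seed chosen here for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)  # arbitrary seed, only for reproducibility
inventory = pd.DataFrame({"DaysOut": rng.exponential(300, 1000)})

mean_days = inventory["DaysOut"].mean()
median_days = inventory["DaysOut"].median()
skewness = inventory["DaysOut"].skew()  # positive for a right-skewed distribution

print(f"mean={mean_days:.0f}, median={median_days:.0f}, skew={skewness:.2f}")
```

For data like this, the median is often a more representative "typical" value than the mean.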
Correlation Heatmaps
Correlation measures the strength and direction of linear relationships between variables.
Create sample data:
data = pd.DataFrame({
    "Sales": np.random.normal(100, 15, 100),
    "Inventory": np.random.normal(150, 25, 100),
    "Margin": np.random.normal(20, 3, 100)
})
Calculate correlations:
corr = data.corr(numeric_only=True)
Plot:
sns.heatmap(corr, annot=True)
plt.show()
Why Correlation Heatmaps Matter
They quickly show:
- Strong relationships
- Weak relationships
- Variables that move together
Useful before regression or machine learning.
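Independently generated random columns are nearly uncorrelated, so a heatmap of them looks flat. The sketch below instead builds a hypothetical dataset in which `Inventory` is constructed to track `Sales`, so that a strong and a weak relationship appear side by side. Fixing the color scale to the range -1 to 1 is a design choice made here, not something the lesson prescribes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(3)  # arbitrary seed, only for reproducibility
sales = rng.normal(100, 15, 200)

data = pd.DataFrame({
    "Sales": sales,
    # Built to move with Sales, plus noise (illustrative construction).
    "Inventory": 1.5 * sales + rng.normal(0, 10, 200),
    # Generated independently of the other two columns.
    "Margin": rng.normal(20, 3, 200),
})

corr = data.corr(numeric_only=True)


def plot_heatmap(matrix):
    """Hypothetical helper: annotated heatmap on a fixed -1..1 color scale."""
    fig, ax = plt.subplots()
    sns.heatmap(matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
    ax.set_title("Correlation Heatmap")
    return ax


if __name__ == "__main__":
    plot_heatmap(corr)
    plt.show()
```

With a fixed scale, the same color always means the same correlation, which makes heatmaps from different datasets comparable.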
A Typical Analyst Workflow
When receiving a new dataset (here a DataFrame named df with Sales and Inventory columns), start with summary statistics:
df.describe()
Then:
plt.hist(df["Sales"])
plt.show()
Then:
sns.boxplot(x=df["Sales"])
plt.show()
Then:
plt.scatter(df["Inventory"], df["Sales"])
plt.show()
Finally:
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
This sequence often reveals the most important patterns.
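The first few steps of this workflow can be bundled into a single function so they run the same way on every new dataset. The sketch below is one possible arrangement; `quick_look`, its parameters, and the simulated `df` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def quick_look(df, value_col, x_col):
    """Hypothetical helper: histogram, boxplot, and scatterplot in one figure."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

    # Distribution of the variable of interest.
    axes[0].hist(df[value_col], bins=20)
    axes[0].set_title(f"Distribution of {value_col}")

    # Median, quartiles, and possible outliers.
    axes[1].boxplot(df[value_col].to_numpy())
    axes[1].set_title(f"Boxplot of {value_col}")

    # Relationship with a second variable.
    axes[2].scatter(df[x_col], df[value_col])
    axes[2].set_xlabel(x_col)
    axes[2].set_ylabel(value_col)
    axes[2].set_title(f"{x_col} vs {value_col}")

    fig.tight_layout()
    return fig


# Simulated stand-in for a newly received dataset.
rng = np.random.default_rng(5)  # arbitrary seed, only for reproducibility
df = pd.DataFrame({
    "Inventory": rng.normal(150, 25, 200),
    "Sales": rng.normal(100, 15, 200),
})

if __name__ == "__main__":
    print(df.describe())
    quick_look(df, "Sales", "Inventory")
    plt.show()
```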
Visualization Checklist
Before modeling data, always ask:
- What does the distribution look like?
- Are there outliers?
- Are there trends?
- Are variables correlated?
- Are there unusual observations?
Visualization often answers these questions faster than summary statistics alone.
Lesson Summary
In this lesson we learned:
- Line charts
- Bar charts
- Histograms
- Density plots
- Boxplots
- Scatterplots
- Correlation heatmaps
Visualization is one of the most valuable skills in analytics because it transforms raw numbers into insights that decision-makers can understand.
In the next lesson we will learn Statistical Testing, where we move from describing data to making formal statistical conclusions.
