Introduction
If Excel is the calculator of modern business, then Pandas is the spreadsheet on steroids.
Almost every data analyst using Python spends most of their time working with Pandas. Whether you are analyzing healthcare data, sales transactions, inventory movements, customer behavior, or operational performance, the first step is almost always loading data into a Pandas DataFrame.
This lesson introduces the fundamental object used throughout Python data analysis: the DataFrame.
By the end of this lesson, you will be able to:
- Import data into Python
- Understand DataFrames and Series
- Inspect datasets
- Select rows and columns
- Understand data types
- Perform basic exploratory analysis
Throughout this course we will use examples from healthcare and supply chain analytics.
What is Pandas?
Pandas is a Python library designed for working with structured data.
Think of a DataFrame as a spreadsheet inside Python.
For example:
| PatientID | Age | Diagnosis | LengthOfStay |
|---|---|---|---|
| 101 | 45 | Pneumonia | 5 |
| 102 | 67 | COPD | 8 |
| 103 | 31 | Asthma | 2 |
or
| SKU | Sales | Inventory |
|---|---|---|
| A100 | 250 | 100 |
| B200 | 125 | 150 |
| C300 | 500 | 80 |
These tables become DataFrames in Pandas.
Installing Pandas
If Pandas is not already installed:
pip install pandas
Import it using:
import pandas as pd
The abbreviation pd is the standard convention used almost everywhere.
The DataFrame
A DataFrame is a two-dimensional table consisting of rows and columns.
Example:
import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
})
print(df)
Output:
PatientID Age Diagnosis
0 101 45 Pneumonia
1 102 67 COPD
2 103 31 Asthma
Notice that Pandas automatically adds a row label to each row. Together these labels are called the index, and by default they count up from 0.
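As a minimal sketch, the supply chain table from earlier in this lesson can be built the same way. The variable name `products` is my own choice for illustration, not something the lesson defines.

```python
import pandas as pd

# Build the SKU table shown in the "What is Pandas?" section.
products = pd.DataFrame({
    "SKU": ["A100", "B200", "C300"],
    "Sales": [250, 125, 500],
    "Inventory": [100, 150, 80],
})

print(products)

# The row labels live in the .index attribute and start at 0 by default.
print(list(products.index))
```

The row labels are not a column of the data. They are stored separately in `products.index`.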
The Series
A Series is a single column of data.
Example:
df["Age"]
Output:
0 45
1 67
2 31
A DataFrame is essentially a collection of Series objects.
Think:
- Series = one column
- DataFrame = entire table
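To see the relationship for yourself, here is a small sketch using the patient table from above. The variable name `ages` is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
})

ages = df["Age"]

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series

# A Series keeps the column name and shares the DataFrame's index.
print(ages.name)
print(ages.index.equals(df.index))

# Every column in the DataFrame is a Series.
for column_name in df.columns:
    print(column_name, type(df[column_name]).__name__)
```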
Loading Data
Most real-world analysis starts by importing a file.
CSV files:
df = pd.read_csv("patients.csv")
Excel files (Pandas needs an Excel engine such as openpyxl installed to read .xlsx files):
df = pd.read_excel("patients.xlsx")
Supply chain example:
sales = pd.read_csv("sales_history.csv")
Healthcare example:
patients = pd.read_csv("hospital_admissions.csv")
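If you do not have a CSV file to hand, the sketch below is one way to try `read_csv` anyway. It assumes the file would contain the patient table from earlier in this lesson and feeds that text to Pandas through `io.StringIO`, which behaves like an open file. The variable name `csv_text` is hypothetical.

```python
import io

import pandas as pd

# Stand-in for the contents of a file such as "patients.csv".
csv_text = """PatientID,Age,Diagnosis,LengthOfStay
101,45,Pneumonia,5
102,67,COPD,8
103,31,Asthma,2
"""

# read_csv accepts a file path or any file-like object.
patients = pd.read_csv(io.StringIO(csv_text))

print(patients)
```

With a real file you would pass the path instead, exactly as in the examples above.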
Viewing the First Rows
When receiving a new dataset, the first thing analysts do is inspect it.
View first five rows:
df.head()
Example:
patients.head()
Output:
PatientID Age Diagnosis
101 45 Pneumonia
102 67 COPD
103 31 Asthma
...
View first ten rows:
df.head(10)
Viewing the Last Rows
Sometimes the end of the file is important.
df.tail()
or
df.tail(10)
This is useful for checking that a file loaded completely, since truncated rows or stray totals lines usually show up at the end.
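Here is a short sketch of `head()` and `tail()` side by side. The twenty-row table and its `RowNumber` column are made up purely so the two ends are easy to tell apart.

```python
import pandas as pd

# Twenty made-up rows, numbered 1 to 20.
df = pd.DataFrame({"RowNumber": range(1, 21)})

first_five = df.head()    # no argument means five rows
first_three = df.head(3)
last_five = df.tail()     # no argument means five rows
last_two = df.tail(2)

print(first_three)
print(last_two)
```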
Understanding Dataset Structure
One of the most useful commands:
df.info()
Example output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries
Data columns:
PatientID int64
Age int64
Diagnosis object
LengthOfStay float64
This tells us:
- Number of rows
- Number of columns
- Variable types
- Missing values (a column's non-null count is lower than the number of rows)
Always run this immediately after loading data.
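The sketch below shows how missing values appear in practice. The four patients and their gaps are invented for illustration. `isna().sum()` is used here as a companion to `info()` because it counts the gaps directly.

```python
import io

import pandas as pd

# Made-up data with one missing Diagnosis and one missing LengthOfStay.
df = pd.DataFrame({
    "PatientID": [101, 102, 103, 104],
    "Age": [45, 67, 31, 58],
    "Diagnosis": ["Pneumonia", None, "Asthma", "COPD"],
    "LengthOfStay": [5.0, 8.0, None, 3.0],
})

# info() normally prints straight to the screen; buf= captures the report as text.
buffer = io.StringIO()
df.info(buf=buffer)
print(buffer.getvalue())

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

In the `info()` report, a column with gaps shows a non-null count smaller than the total number of entries.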
Understanding Data Types
Common data types include:
| Type | Meaning |
|---|---|
| int64 | Integers |
| float64 | Decimal numbers |
| object | Text (or mixed values) |
| bool | True/False |
| datetime64 | Dates |
Example:
df.dtypes
Output:
PatientID int64
Age int64
Diagnosis object
LengthOfStay float64
Understanding data types prevents many analysis errors.
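A typical example of such an error is a number column that arrives as text. The sketch below uses made-up data where `Age` was stored as strings, and converts it with `pd.to_numeric`. The variable name `ages_as_numbers` is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up example: Age arrived as text, as can happen with messy exports.
df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": ["45", "67", "31"],
})

print(df.dtypes)

# Pandas does not treat the text column as numeric.
print(is_numeric_dtype(df["Age"]))  # False

# Convert the text to numbers before doing any arithmetic.
ages_as_numbers = pd.to_numeric(df["Age"])
print(is_numeric_dtype(ages_as_numbers))  # True
print(ages_as_numbers.mean())
```

Checking `df.dtypes` first tells you whether a conversion like this is needed before you calculate anything.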
Dataset Dimensions
How many rows and columns?
df.shape
Output:
(1000, 12)
Meaning:
- 1000 rows
- 12 columns
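Checking the shape is a quick way to confirm that the number of rows and columns matches what you expected from the source file.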
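As a small sketch, the two numbers can be unpacked into separate variables. The names `n_rows` and `n_cols` are my own, and the table is the SKU example from earlier.

```python
import pandas as pd

df = pd.DataFrame({
    "SKU": ["A100", "B200", "C300"],
    "Sales": [250, 125, 500],
    "Inventory": [100, 150, 80],
})

# shape is an attribute, not a method, so it takes no parentheses.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# len(df) gives the row count on its own.
print(len(df))
```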
Column Names
View all columns:
df.columns
Example:
PatientID
Age
Diagnosis
LengthOfStay
This is often one of the first commands analysts run.
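One reason to look at the column names early is that they must be typed exactly. The sketch below, which assumes the patient table from this lesson, turns the names into a plain list and checks whether a name exists.

```python
import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
    "LengthOfStay": [5, 8, 2],
})

# df.columns is an Index object; list() turns it into a plain Python list.
column_names = list(df.columns)
print(column_names)

# Column names are case-sensitive, so check the spelling before selecting.
print("LengthOfStay" in df.columns)
print("lengthofstay" in df.columns)
```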
Selecting Columns
Single column:
df["Age"]
Multiple columns:
df[["Age","Diagnosis"]]
Healthcare example:
patients[["Age","LengthOfStay"]]
Supply chain example:
sales[["SKU","Sales"]]
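The difference between single and double brackets is easy to miss, so here is a minimal sketch using the patient table. The variable names `age_series`, `age_frame` and `subset` are only for illustration.

```python
import pandas as pd

patients = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
    "LengthOfStay": [5, 8, 2],
})

age_series = patients["Age"]                 # single brackets give a Series
age_frame = patients[["Age"]]                # double brackets give a DataFrame
subset = patients[["Age", "LengthOfStay"]]   # a list of names selects several columns

print(type(age_series).__name__)
print(type(age_frame).__name__)
print(subset)
```

The inner brackets are an ordinary Python list of column names, which is why the result keeps the columns in the order you list them.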
Selecting Rows
Select by row position:
df.iloc[0]
Returns the first row as a Series.
Select multiple rows:
df.iloc[0:5]
Returns the first five rows (positions 0 through 4, because the end position is not included).
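The following sketch tries these selections on ten made-up patients. The negative position used for the last row follows normal Python indexing.

```python
import pandas as pd

# Ten made-up patients with IDs 101 to 110.
df = pd.DataFrame({
    "PatientID": range(101, 111),
    "Age": [45, 67, 31, 52, 74, 29, 60, 38, 81, 49],
})

first_row = df.iloc[0]      # one row comes back as a Series
first_five = df.iloc[0:5]   # a slice comes back as a DataFrame
last_row = df.iloc[-1]      # negative positions count from the end

print(first_row)
print(first_five)
print(last_row["PatientID"])
```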
Selecting Rows and Columns Together
Example:
df.iloc[0:5,0:3]
Meaning:
- First five rows
- First three columns
Very useful during exploration.
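Here is a small sketch of the same idea on a made-up table with six rows and four columns. The variable names `block` and `first_age` are hypothetical.

```python
import pandas as pd

# Six made-up patients, four columns.
df = pd.DataFrame({
    "PatientID": [101, 102, 103, 104, 105, 106],
    "Age": [45, 67, 31, 52, 74, 29],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma", "COPD", "Pneumonia", "Asthma"],
    "LengthOfStay": [5, 8, 2, 4, 9, 1],
})

# Rows come first, columns second: first five rows, first three columns.
block = df.iloc[0:5, 0:3]
print(block)

# Two single positions pick out one value: row 0, column 1.
first_age = df.iloc[0, 1]
print(first_age)
```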
Summary Statistics
Quick summary:
df.describe()
Example output:
Age
count 1000
mean 47.2
std 15.8
min 18
max 89
This provides:
- Count of non-missing values
- Mean
- Standard deviation
- Minimum
- Maximum
- Quartiles
This is a powerful first look at the data. By default, describe() summarizes only the numeric columns.
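The result of `describe()` is itself a DataFrame, so individual figures can be pulled out of it. The sketch below assumes the three-patient table from this lesson and uses `.loc` to look up a value by its row and column labels. The names `summary`, `mean_age` and `median_stay` are my own.

```python
import pandas as pd

patients = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
    "LengthOfStay": [5, 8, 2],
})

summary = patients.describe()
print(summary)

# Look up single statistics by row label and column name.
mean_age = summary.loc["mean", "Age"]
median_stay = summary.loc["50%", "LengthOfStay"]  # the 50% quartile is the median
print(mean_age, median_stay)

# include="all" adds the text columns to the summary as well.
print(patients.describe(include="all"))
```

Note that `PatientID` is summarized too, because it is stored as a number, even though its mean is not meaningful.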
Healthcare Example
Suppose we have hospital admissions:
patients = pd.read_csv("hospital_admissions.csv")
Investigate:
patients.head()
patients.info()
patients.describe()
Questions:
- Average age?
- Average length of stay?
- Missing diagnoses?
- Number of admissions?
This is the beginning of exploratory analysis.
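One possible way to answer those four questions is sketched below. The four admissions are invented stand-ins for the real file, and the sketch assumes each row is one admission.

```python
import pandas as pd

# Made-up admissions standing in for hospital_admissions.csv.
patients = pd.DataFrame({
    "PatientID": [101, 102, 103, 104],
    "Age": [45, 67, 31, 57],
    "Diagnosis": ["Pneumonia", "COPD", None, "Asthma"],
    "LengthOfStay": [5, 8, 2, 5],
})

average_age = patients["Age"].mean()
average_stay = patients["LengthOfStay"].mean()
missing_diagnoses = patients["Diagnosis"].isna().sum()
number_of_admissions = len(patients)  # assumes one row per admission

print("Average age:", average_age)
print("Average length of stay:", average_stay)
print("Missing diagnoses:", missing_diagnoses)
print("Number of admissions:", number_of_admissions)
```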
Supply Chain Example
Suppose we have sales history:
sales = pd.read_csv("sales_history.csv")
Investigate:
sales.head()
sales.info()
sales.describe()
Questions:
- Average sales?
- Top-selling SKU?
- Inventory distribution?
- Number of active products?
Again, the first stage of every analysis project.
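The sketch below shows one way these questions might be answered. The data is made up, and two interpretations are my own assumptions: the largest SKU is taken to mean the one with the highest sales, and an active product is taken to mean one with sales above zero.

```python
import pandas as pd

# Made-up sales history standing in for the real file.
sales = pd.DataFrame({
    "SKU": ["A100", "B200", "C300", "D400"],
    "Sales": [250, 125, 500, 0],
    "Inventory": [100, 150, 80, 40],
})

average_sales = sales["Sales"].mean()

# idxmax() returns the index label of the row with the highest Sales value.
top_sku = sales.loc[sales["Sales"].idxmax(), "SKU"]

# describe() on one column summarizes its distribution.
inventory_summary = sales["Inventory"].describe()

# Assumption: "active" means the product sold at least one unit.
active_products = (sales["Sales"] > 0).sum()

print("Average sales:", average_sales)
print("Top-selling SKU:", top_sku)
print(inventory_summary)
print("Active products:", active_products)
```

Your own definition of an active product may differ, for example one based on a date of last sale, so treat that line as a placeholder.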
Common Beginner Workflow
Whenever you receive a new dataset:
import pandas as pd

df = pd.read_csv("file.csv")
df.head()
df.info()
df.shape
df.columns
df.describe()
This sequence alone will quickly reveal the most important characteristics of the dataset.
Lesson Summary
In this lesson we learned:
- What Pandas is
- What a DataFrame is
- What a Series is
- How to load CSV files
- How to load Excel files
- How to inspect data
- How to view rows and columns
- How to understand data types
- How to generate summary statistics
These skills form the foundation of every Python-based data analysis project.
In the next lesson we will learn Data Cleaning, where we begin dealing with missing values, duplicate records, inconsistent data, and the messy realities of real-world datasets.
