Lesson 1: Pandas Foundations: The Most Important Tool for Data Analysts


Introduction

If Excel is the calculator of modern business, then Pandas is the spreadsheet on steroids.

Almost every data analyst using Python spends most of their time working with Pandas. Whether you are analyzing healthcare data, sales transactions, inventory movements, customer behavior, or operational performance, the first step is almost always loading data into a Pandas DataFrame.

This lesson introduces the fundamental object used throughout Python data analysis: the DataFrame.

By the end of this lesson, you will be able to:

  • Import data into Python
  • Understand DataFrames and Series
  • Inspect datasets
  • Select rows and columns
  • Understand data types
  • Perform basic exploratory analysis

Throughout this course we will use examples from healthcare and supply chain analytics.


What is Pandas?

Pandas is a Python library designed for working with structured data.

Think of a DataFrame as a spreadsheet inside Python.

For example:

PatientID  Age  Diagnosis  LengthOfStay
101        45   Pneumonia  5
102        67   COPD       8
103        31   Asthma     2

or

SKU   Sales  Inventory
A100  250    100
B200  125    150
C300  500    80

These tables become DataFrames in Pandas.


Installing Pandas

If Pandas is not already installed:

pip install pandas

Import it using:

import pandas as pd

The abbreviation pd is the standard convention used almost everywhere.


The DataFrame

A DataFrame is a two-dimensional table consisting of rows and columns.

Example:

import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
})
print(df)

Output:

   PatientID  Age  Diagnosis
0        101   45  Pneumonia
1        102   67       COPD
2        103   31     Asthma


Notice that Pandas automatically adds row labels, called the index, which by default number the rows starting from 0.
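
The index can also be inspected directly. The following is a minimal sketch that rebuilds the same small DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
})

# The default index is a RangeIndex that counts rows starting at 0.
print(df.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(df.index))  # [0, 1, 2]
```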


The Series

A Series is a single column of data.

Example:

df["Age"]

Output:

0    45
1    67
2    31
Name: Age, dtype: int64

A DataFrame is essentially a collection of Series objects that share the same index.

Think:

  • Series = one column
  • DataFrame = entire table
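
The relationship between the two objects can be checked in code. This sketch reuses the small patient table from earlier; the variable name `ages` is only an illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
})

ages = df["Age"]  # selecting one column returns a Series

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series

# The Series carries the same index as the DataFrame it came from.
print(ages.index.equals(df.index))  # True

# A Series supports calculations directly.
print(ages.max())  # 67
```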

Loading Data

Most real-world analysis starts by importing a file.

CSV files:

df = pd.read_csv("patients.csv")

Excel files (reading .xlsx files also requires the openpyxl package, installed with pip install openpyxl):

df = pd.read_excel("patients.xlsx")

Supply chain example:

sales = pd.read_csv("sales_history.csv")

Healthcare example:

patients = pd.read_csv("hospital_admissions.csv")
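
The file names above refer to files on your own machine. To try read_csv without one, a sketch like the following may help. It feeds the function an in-memory string instead of a path, and the CSV text is illustrative data rather than a real admissions file:

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file (illustrative data only).
csv_text = """PatientID,Age,Diagnosis,LengthOfStay
101,45,Pneumonia,5
102,67,COPD,8
103,31,Asthma,2
"""

# read_csv accepts a file path or a file-like object; StringIO lets the
# example run without a file on disk.
patients = pd.read_csv(io.StringIO(csv_text))

print(patients)
```

The first line of the text is used as the column names, just as it would be with a file on disk.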

Viewing the First Rows

When receiving a new dataset, the first thing analysts do is inspect it.

View first five rows:

df.head()

Example:

patients.head()

Output:

   PatientID  Age  Diagnosis
0        101   45  Pneumonia
1        102   67       COPD
2        103   31     Asthma
...

View first ten rows:

df.head(10)

Viewing the Last Rows

Sometimes the end of the file is important.

df.tail()

or

df.tail(10)

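This is useful for checking that a file was imported or exported completely, for example that the final rows are present and are not footer or total lines.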
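
As a small sketch of both methods together, the example below uses ten made-up rows so that the start and end of the table differ:

```python
import pandas as pd

# Ten illustrative rows so that head() and tail() return different slices.
df = pd.DataFrame({
    "PatientID": range(101, 111),
    "LengthOfStay": [5, 8, 2, 4, 3, 7, 1, 6, 9, 2],
})

first_three = df.head(3)  # rows at positions 0, 1, 2
last_three = df.tail(3)   # rows at positions 7, 8, 9

print(first_three)
print(last_three)

# With no argument, both methods return five rows.
print(len(df.head()), len(df.tail()))  # 5 5
```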


Understanding Dataset Structure

One of the most useful commands:

df.info()

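Example output (abbreviated):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   PatientID     1000 non-null   int64
 1   Age           1000 non-null   int64
 2   Diagnosis     1000 non-null   object
 3   LengthOfStay  1000 non-null   float64

This tells us:

  • Number of rows
  • Number of columns
  • Data type of each column
  • Non-null count of each column (a count below the number of rows means missing values)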

Always run this immediately after loading data.
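
To see how missing values show up, here is a minimal sketch with two gaps planted on purpose. The data is made up, and isna().sum() is added as a more explicit way to count gaps per column:

```python
import pandas as pd

# Illustrative data with two deliberately missing values (None).
df = pd.DataFrame({
    "PatientID": [101, 102, 103, 104],
    "Age": [45, 67, 31, 58],
    "Diagnosis": ["Pneumonia", None, "Asthma", "COPD"],
    "LengthOfStay": [5.0, 8.0, None, 3.0],
})

# info() prints its report directly; compare each Non-Null Count
# against the total number of entries to spot missing values.
df.info()

# isna().sum() counts the missing values in each column explicitly.
missing = df.isna().sum()
print(missing)
```

Here Diagnosis and LengthOfStay each report 3 non-null values out of 4 entries, which matches the one missing value per column counted by isna().sum().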


Understanding Data Types

Common data types include:

Type            Meaning
int64           Integers
float64         Decimal numbers
object          Text (or mixed values)
bool            True/False
datetime64[ns]  Dates and times

Example:

df.dtypes

Output:

PatientID         int64
Age               int64
Diagnosis        object
LengthOfStay    float64
dtype: object

Understanding data types prevents many analysis errors.
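
One common case is a numeric or date column that arrives as text. The sketch below shows one way to convert such columns. The AdmitDate column and its values are hypothetical and do not appear in the lesson's dataset:

```python
import pandas as pd

# Illustrative data: both columns arrive as text, as often happens with exports.
# "AdmitDate" is a hypothetical column used only for this example.
raw = pd.DataFrame({
    "AdmitDate": ["2024-01-05", "2024-01-09", "2024-02-14"],
    "LengthOfStay": ["5", "8", "2"],
})
print(raw.dtypes)  # both columns are stored as text

# Convert to proper types before doing any arithmetic.
clean = raw.copy()
clean["AdmitDate"] = pd.to_datetime(clean["AdmitDate"])
clean["LengthOfStay"] = pd.to_numeric(clean["LengthOfStay"])
print(clean.dtypes)

# Numeric operations now behave as expected.
print(clean["LengthOfStay"].mean())  # 5.0
```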


Dataset Dimensions

How many rows and columns?

df.shape

Output:

(1000, 12)

Meaning:

  • 1000 rows
  • 12 columns

A quick shape check confirms that the number of rows and columns matches what you expected from the source file.
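
Because shape is a plain tuple, it can be unpacked into two variables. A minimal sketch on the small patient table (the names n_rows and n_cols are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
})

# shape is an attribute (no parentheses) holding a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")  # 3 rows, 3 columns

# len() gives the row count alone.
print(len(df))  # 3
```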


Column Names

View all columns:

df.columns

Example output (exact formatting varies by Pandas version):

Index(['PatientID', 'Age', 'Diagnosis', 'LengthOfStay'], dtype='object')

This is often one of the first commands analysts run.


Selecting Columns

Single column:

df["Age"]

Multiple columns:

df[["Age","Diagnosis"]]

Healthcare example:

patients[["Age","LengthOfStay"]]

Supply chain example:

sales[["SKU","Sales"]]
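
The bracket style determines what comes back: single brackets return a Series, double brackets return a DataFrame. A short sketch on illustrative patient data:

```python
import pandas as pd

patients = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
    "LengthOfStay": [5, 8, 2],
})

age = patients["Age"]                       # single brackets -> Series
subset = patients[["Age", "LengthOfStay"]]  # list of names -> DataFrame
age_frame = patients[["Age"]]               # one name in a list -> still a DataFrame

print(type(age).__name__)        # Series
print(type(subset).__name__)     # DataFrame
print(type(age_frame).__name__)  # DataFrame
```

The inner brackets in the second form are an ordinary Python list of column names, so the columns come back in the order listed.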

Selecting Rows

Select by row position:

df.iloc[0]

This returns the first row as a Series.

Select multiple rows:

df.iloc[0:5]

This returns the first five rows (positions 0 through 4; the end position is not included).


Selecting Rows and Columns Together

Example:

df.iloc[0:5,0:3]

Meaning:

  • First five rows
  • First three columns

This is handy for previewing one corner of a wide dataset during exploration.
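
The position-based selections above can be tried on a small table. This is a sketch on illustrative data; the variable names are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    "PatientID": [101, 102, 103],
    "Age": [45, 67, 31],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma"],
    "LengthOfStay": [5, 8, 2],
})

first_row = df.iloc[0]      # one row, returned as a Series
first_two = df.iloc[0:2]    # rows at positions 0 and 1 (position 2 is excluded)
corner = df.iloc[0:2, 0:3]  # first two rows, first three columns
last_row = df.iloc[-1]      # negative positions count from the end

print(first_row["Diagnosis"])  # Pneumonia
print(corner.shape)            # (2, 3)
print(last_row["PatientID"])   # 103
```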


Summary Statistics

Quick summary:

df.describe()

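Example output (abbreviated):

        Age
count  1000
mean   47.2
std    15.8
min      18
25%     ...
50%     ...
75%     ...
max      89

This provides, for each numeric column:

  • Count of non-missing values
  • Mean
  • Standard deviation
  • Minimum
  • Maximum
  • Quartiles (25%, 50%, 75%)

By default, describe() summarizes only the numeric columns when the dataset has any, which makes it a powerful first look at the data.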
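
The result of describe() is itself a DataFrame, so single statistics can be looked up by name. A minimal sketch on four made-up patients:

```python
import pandas as pd

patients = pd.DataFrame({
    "Age": [45, 67, 31, 58],
    "LengthOfStay": [5, 8, 2, 3],
    "Diagnosis": ["Pneumonia", "COPD", "Asthma", "COPD"],
})

summary = patients.describe()
print(summary)  # the text column "Diagnosis" is left out by default

# Look up individual statistics by row label and column name.
print(summary.loc["mean", "Age"])          # 50.25
print(summary.loc["50%", "LengthOfStay"])  # 4.0 (the median)
```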


Healthcare Example

Suppose we have hospital admissions:

patients = pd.read_csv("hospital_admissions.csv")

Investigate:

patients.head()
patients.info()
patients.describe()

Questions:

  • Average age?
  • Average length of stay?
  • Missing diagnoses?
  • Number of admissions?

This is the beginning of exploratory analysis.
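
Each of the questions above maps to a short expression. The sketch below uses four made-up admissions in place of the real file and assumes one row per admission. The column names follow the earlier examples:

```python
import pandas as pd

# Illustrative stand-in for hospital_admissions.csv (assumption: one row per admission).
patients = pd.DataFrame({
    "PatientID": [101, 102, 103, 104],
    "Age": [45, 67, 31, 57],
    "Diagnosis": ["Pneumonia", "COPD", None, "Asthma"],
    "LengthOfStay": [5, 8, 2, 5],
})

avg_age = patients["Age"].mean()                 # average age
avg_stay = patients["LengthOfStay"].mean()       # average length of stay
missing_dx = patients["Diagnosis"].isna().sum()  # missing diagnoses
n_admissions = len(patients)                     # number of admissions

print(avg_age)       # 50.0
print(avg_stay)      # 5.0
print(missing_dx)    # 1
print(n_admissions)  # 4
```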


Supply Chain Example

Suppose we have sales history:

sales = pd.read_csv("sales_history.csv")

Investigate:

sales.head()
sales.info()
sales.describe()

Questions:

  • Average sales?
  • Largest SKU?
  • Inventory distribution?
  • Number of active products?

Again, this is the first stage of every analysis project.
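
A possible first pass at these questions is sketched below, using the three-SKU table from the start of the lesson in place of the real file. Two interpretations here are assumptions: the top SKU is taken to be the one with the highest sales, and a product counts as active when its sales are above zero.

```python
import pandas as pd

# Illustrative stand-in for the sales history file.
sales = pd.DataFrame({
    "SKU": ["A100", "B200", "C300"],
    "Sales": [250, 125, 500],
    "Inventory": [100, 150, 80],
})

avg_sales = sales["Sales"].mean()  # average sales per SKU

# idxmax() returns the index label of the largest value;
# loc then looks up the SKU on that row.
top_sku = sales.loc[sales["Sales"].idxmax(), "SKU"]

inventory_summary = sales["Inventory"].describe()  # distribution of inventory

# Assumption: "active" means sales greater than zero.
n_active = (sales["Sales"] > 0).sum()

print(round(avg_sales, 2))  # 291.67
print(top_sku)              # C300
print(inventory_summary)
print(n_active)             # 3
```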


Common Beginner Workflow

Whenever you receive a new dataset:

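import pandas as pd

df = pd.read_csv("file.csv")  # load the data
df.head()      # preview the first five rows
df.info()      # column types and non-null counts
df.shape       # (rows, columns)
df.columns     # column names
df.describe()  # summary statistics for numeric columns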

This sequence alone will quickly reveal most of the important characteristics of the dataset.
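
In a notebook, each of those lines displays its result when run in its own cell. In a plain script the results must be printed explicitly. One way to package the checks is sketched below. The function quick_overview is a hypothetical helper, not part of Pandas, and the CSV text stands in for a real file:

```python
import io

import pandas as pd


def quick_overview(df):
    """Print the standard first-look checks (hypothetical helper, not a Pandas function)."""
    print(df.head())         # first five rows
    df.info()                # info() prints its own report
    print(df.shape)          # (rows, columns)
    print(list(df.columns))  # column names as a plain list
    print(df.describe())     # summary statistics for numeric columns


# Illustrative data standing in for "file.csv".
csv_text = """PatientID,Age,Diagnosis,LengthOfStay
101,45,Pneumonia,5
102,67,COPD,8
103,31,Asthma,2
"""

df = pd.read_csv(io.StringIO(csv_text))
quick_overview(df)
```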


Lesson Summary

In this lesson we learned:

  • What Pandas is
  • What a DataFrame is
  • What a Series is
  • How to load CSV files
  • How to load Excel files
  • How to inspect data
  • How to select rows and columns
  • How to understand data types
  • How to generate summary statistics

These skills form the foundation of every Python-based data analysis project.

In the next lesson we will learn Data Cleaning, where we begin dealing with missing values, duplicate records, inconsistent data, and the messy realities of real-world datasets.
