Introduction
At this point in the course, we understand:
- Healthcare systems
- Medical terminology
- Anatomy and physiology
- Common diseases
- Healthcare data
- Coding systems
- Electronic Health Records
Now we are ready to begin working with healthcare databases directly.
For most healthcare analysts, SQL is one of the most important skills they will use on a daily basis.
Healthcare organizations generate enormous amounts of data.
A large hospital may store:
- Millions of patient records
- Millions of laboratory results
- Millions of diagnoses
- Millions of medication records
Without SQL, retrieving useful information from these databases would be nearly impossible.
In this lesson, we will learn how healthcare databases are structured and how analysts retrieve data using SQL.
The focus of this lesson is understanding healthcare database logic rather than memorizing SQL syntax.
Why SQL Is Essential in Healthcare
Healthcare analysts spend a significant portion of their time answering questions such as:
- How many diabetic patients were admitted last year?
- What is the average length of stay for heart failure patients?
- Which patients were readmitted within 30 days?
- How many COPD patients visited the emergency department?
All of these questions require extracting information from healthcare databases.
SQL serves as the bridge between raw healthcare data and healthcare analytics.
Understanding the Healthcare Database
Recall from the previous lesson that healthcare databases are usually divided into multiple related tables.
A simplified healthcare database may contain:
Patients
Encounters
Diagnoses
Laboratory Results
Medications
Procedures
Each table contains different information.
Analysts combine these tables to answer healthcare questions.
The Patient Table
The patient table contains demographic information.
Typical variables include:
- Patient ID
- Date of birth
- Sex
- Ethnicity
- Postal code
The Patient ID is extremely important because it allows us to connect information across multiple tables.
Think of Patient ID as the unique identifier for each individual.
The Encounter Table
An encounter represents an interaction between a patient and the healthcare system.
Examples include:
- Clinic visits
- Emergency department visits
- Hospital admissions
Typical variables include:
- Encounter ID
- Patient ID
- Encounter Date
- Encounter Type
- Admission Date
- Discharge Date
One patient can have many encounters.
The Diagnosis Table
The diagnosis table stores medical conditions.
Examples include:
- Diabetes
- Hypertension
- COPD
- Heart Failure
Typical variables include:
- Diagnosis ID
- Patient ID
- Encounter ID
- ICD Code
- Diagnosis Date
A patient can have many diagnoses.
The Laboratory Table
The laboratory table contains test results.
Examples include:
- Glucose
- HbA1c
- Creatinine
- Hemoglobin
Typical variables include:
- Lab ID
- Patient ID
- Test Name
- Result
- Test Date
A patient can have hundreds or even thousands of laboratory results.
The Medication Table
The medication table contains prescription information.
Examples include:
- Drug Name
- Dose
- Start Date
- End Date
Medication data is frequently used to evaluate treatment effectiveness and adherence.
Understanding Relationships
One of the most important concepts in healthcare databases is relationships.
Consider the following example.
One patient may have:
- 10 encounters
- 25 diagnoses
- 200 laboratory tests
- 15 medications
Therefore:
One patient is related to many records in other tables.
This is called a one-to-many relationship.
Healthcare databases contain many one-to-many relationships.
Understanding them is essential for writing accurate SQL queries.
Example Question 1
Suppose a hospital administrator asks:
“How many patients have diabetes?”
This sounds simple, but several questions immediately arise.
Do we count:
- Unique patients?
- Diagnoses?
- Encounters?
A single patient may receive multiple diabetes diagnoses.
Therefore, healthcare analysts often focus on unique patients rather than diagnosis records.
Understanding the business question is just as important as writing the query.
Example Question 2
Suppose a physician asks:
“What is the average HbA1c among diabetic patients?”
To answer this question, we need:
Step 1:
Identify diabetic patients.
Step 2:
Retrieve their laboratory results.
Step 3:
Calculate the average HbA1c.
Notice that information must be retrieved from multiple tables.
This is one of the most common patterns in healthcare analytics.
Building Cohorts
A cohort is a group of patients sharing certain characteristics.
Healthcare analysts build cohorts constantly.
Examples include:
- Patients with diabetes
- Patients with COPD
- Patients admitted during 2025
- Patients over age 65
- Patients receiving insulin
Many healthcare analyses begin with cohort creation.
Example Cohort: Diabetes
Suppose we want to identify diabetic patients.
We might use:
- ICD diagnosis codes
- Diabetes medication records
- HbA1c measurements
Depending on the project, different cohort definitions may be appropriate.
Defining cohorts carefully is one of the most important skills in healthcare analytics.
Understanding Inclusion Criteria
Healthcare studies typically include specific inclusion criteria.
Examples:
- Age greater than 18
- Diagnosis of diabetes
- At least one hospital visit
Only patients satisfying these conditions are included.
Understanding Exclusion Criteria
Researchers often exclude certain patients.
Examples:
- Missing key variables
- Insufficient follow-up
- Rare conditions
- Pediatric patients
Exclusion criteria help improve study quality.
Encounter-Level Analysis
Sometimes analysts focus on patients.
Other times they focus on encounters.
Consider emergency department visits.
A single patient may visit the emergency department multiple times.
If the goal is to understand utilization, encounter-level analysis may be appropriate.
If the goal is to understand disease prevalence, patient-level analysis may be better.
Understanding the unit of analysis is critical.
Patient-Level vs Encounter-Level Analysis
Patient-level analysis asks:
How many unique patients?
Encounter-level analysis asks:
How many visits?
Suppose:
Patient A visits the emergency department 10 times.
Patient B visits once.
Patient-level count:
2 patients
Encounter-level count:
11 visits
Both numbers are correct, but they answer different questions.
Time Windows in Healthcare Analytics
Healthcare analyses frequently involve time windows.
Examples include:
- Last 30 days
- Last 90 days
- Last year
- Last five years
Time windows help define cohorts and outcomes.
For example:
Patients with an HbA1c measurement within the last year.
Patients admitted during 2025.
Patients readmitted within 30 days.
Time is central to healthcare analytics.
Understanding Readmissions
Readmission analysis is one of the most common healthcare projects.
A readmission occurs when a patient returns to the hospital after discharge.
A common metric is:
30-day readmission.
Questions include:
- Which patients are at high risk?
- What factors contribute to readmissions?
- How can readmissions be reduced?
Readmission studies often combine:
- Demographics
- Diagnoses
- Laboratory results
- Medication information
Example: Identifying Readmissions
Suppose a patient is discharged on:
January 1
and returns on:
January 20
This is a readmission within 30 days.
Healthcare analysts often build logic that searches for future encounters occurring shortly after discharge.
This type of analysis is very common in healthcare.
Data Quality Checks
Before performing any analysis, healthcare analysts should perform data quality checks.
Examples include:
Checking for duplicate patients.
Checking for impossible values.
Checking for missing dates.
Checking for laboratory values outside expected ranges.
Checking for invalid diagnosis codes.
Healthcare analytics is only as good as the underlying data.
Real-World Example
Suppose a hospital wants to identify diabetic patients at high risk of hospitalization.
The analyst may retrieve:
From Patients:
- Age
- Sex
From Diagnoses:
- Diabetes
- Hypertension
From Laboratories:
- HbA1c
- Creatinine
From Encounters:
- Previous admissions
From Medications:
- Insulin use
These data are combined to build a predictive model.
This workflow is extremely common in healthcare analytics.
Common Mistakes New Analysts Make
New healthcare analysts often:
- Forget to remove duplicate patients
- Confuse patient counts with encounter counts
- Ignore missing data
- Use incorrect cohort definitions
- Fail to understand diagnosis coding
Understanding healthcare context is just as important as understanding SQL syntax.
Key Takeaways
- SQL is one of the most important tools used by healthcare analysts.
- Healthcare databases are organized into related tables.
- Patient IDs connect information across tables.
- Cohort creation is a fundamental healthcare analytics task.
- Analysts must understand the difference between patient-level and encounter-level analyses.
- Time windows play a critical role in healthcare studies.
- Readmission analysis is one of the most common healthcare projects.
- Data quality checks should always be performed before analysis.
Exercises
- Explain the difference between patient-level and encounter-level analysis.
- Why are Patient IDs important in healthcare databases?
- What is a cohort?
- Give three examples of inclusion criteria.
- Give three examples of exclusion criteria.
- Why is readmission analysis important?
- What data sources might be used to build a hospitalization prediction model?
- Why should analysts perform data quality checks before analysis?
Coming Next
In Lesson 9, we will move from database structure to healthcare study design. We will learn how researchers formulate healthcare questions, define outcomes, choose populations, and design observational studies that generate reliable evidence.

Leave a Reply