Lesson 8: Healthcare SQL and Database Querying

6–8 minutes

Introduction

At this point in the course, we understand:

  • Healthcare systems
  • Medical terminology
  • Anatomy and physiology
  • Common diseases
  • Healthcare data
  • Coding systems
  • Electronic Health Records

Now we are ready to begin working with healthcare databases directly.

For most healthcare analysts, SQL is one of the most important skills they will use on a daily basis.

Healthcare organizations generate enormous amounts of data.

A large hospital may store:

  • Millions of patient records
  • Millions of laboratory results
  • Millions of diagnoses
  • Millions of medication records

Without SQL, retrieving useful information from these databases would be nearly impossible.

In this lesson, we will learn how healthcare databases are structured and how analysts retrieve data using SQL.

The focus of this lesson is understanding healthcare database logic rather than memorizing SQL syntax.


Why SQL Is Essential in Healthcare

Healthcare analysts spend a significant portion of their time answering questions such as:

  • How many diabetic patients were admitted last year?
  • What is the average length of stay for heart failure patients?
  • Which patients were readmitted within 30 days?
  • How many COPD patients visited the emergency department?

All of these questions require extracting information from healthcare databases.

SQL serves as the bridge between raw healthcare data and healthcare analytics.


Understanding the Healthcare Database

Recall from the previous lesson that healthcare databases are usually divided into multiple related tables.

A simplified healthcare database may contain:

Patients

Encounters

Diagnoses

Laboratory Results

Medications

Procedures

Each table contains different information.

Analysts combine these tables to answer healthcare questions.


The Patient Table

The patient table contains demographic information.

Typical variables include:

  • Patient ID
  • Date of birth
  • Sex
  • Ethnicity
  • Postal code

The Patient ID is extremely important because it allows us to connect information across multiple tables.

Think of Patient ID as the unique identifier for each individual.


The Encounter Table

An encounter represents an interaction between a patient and the healthcare system.

Examples include:

  • Clinic visits
  • Emergency department visits
  • Hospital admissions

Typical variables include:

  • Encounter ID
  • Patient ID
  • Encounter Date
  • Encounter Type
  • Admission Date
  • Discharge Date

One patient can have many encounters.


The Diagnosis Table

The diagnosis table stores medical conditions.

Examples include:

  • Diabetes
  • Hypertension
  • COPD
  • Heart Failure

Typical variables include:

  • Diagnosis ID
  • Patient ID
  • Encounter ID
  • ICD Code
  • Diagnosis Date

A patient can have many diagnoses.


The Laboratory Table

The laboratory table contains test results.

Examples include:

  • Glucose
  • HbA1c
  • Creatinine
  • Hemoglobin

Typical variables include:

  • Lab ID
  • Patient ID
  • Test Name
  • Result
  • Test Date

A patient can have hundreds or even thousands of laboratory results.


The Medication Table

The medication table contains prescription information.

Examples include:

  • Drug Name
  • Dose
  • Start Date
  • End Date

Medication data is frequently used to evaluate treatment effectiveness and adherence.


Understanding Relationships

One of the most important concepts in healthcare databases is relationships.

Consider the following example.

One patient may have:

  • 10 encounters
  • 25 diagnoses
  • 200 laboratory tests
  • 15 medications

Therefore:

One patient is related to many records in other tables.

This is called a one-to-many relationship.

Healthcare databases contain many one-to-many relationships.

Understanding them is essential for writing accurate SQL queries.


Example Question 1

Suppose a hospital administrator asks:

“How many patients have diabetes?”

This sounds simple, but several questions immediately arise.

Do we count:

  • Unique patients?
  • Diagnoses?
  • Encounters?

A single patient may receive multiple diabetes diagnoses.

Therefore, healthcare analysts often focus on unique patients rather than diagnosis records.

Understanding the business question is just as important as writing the query.


Example Question 2

Suppose a physician asks:

“What is the average HbA1c among diabetic patients?”

To answer this question, we need:

Step 1:
Identify diabetic patients.

Step 2:
Retrieve their laboratory results.

Step 3:
Calculate the average HbA1c.

Notice that information must be retrieved from multiple tables.

This is one of the most common patterns in healthcare analytics.


Building Cohorts

A cohort is a group of patients sharing certain characteristics.

Healthcare analysts build cohorts constantly.

Examples include:

  • Patients with diabetes
  • Patients with COPD
  • Patients admitted during 2025
  • Patients over age 65
  • Patients receiving insulin

Many healthcare analyses begin with cohort creation.


Example Cohort: Diabetes

Suppose we want to identify diabetic patients.

We might use:

  • ICD diagnosis codes
  • Diabetes medication records
  • HbA1c measurements

Depending on the project, different cohort definitions may be appropriate.

Defining cohorts carefully is one of the most important skills in healthcare analytics.


Understanding Inclusion Criteria

Healthcare studies typically include specific inclusion criteria.

Examples:

  • Age greater than 18
  • Diagnosis of diabetes
  • At least one hospital visit

Only patients satisfying these conditions are included.


Understanding Exclusion Criteria

Researchers often exclude certain patients.

Examples:

  • Missing key variables
  • Insufficient follow-up
  • Rare conditions
  • Pediatric patients

Exclusion criteria help improve study quality.


Encounter-Level Analysis

Sometimes analysts focus on patients.

Other times they focus on encounters.

Consider emergency department visits.

A single patient may visit the emergency department multiple times.

If the goal is to understand utilization, encounter-level analysis may be appropriate.

If the goal is to understand disease prevalence, patient-level analysis may be better.

Understanding the unit of analysis is critical.


Patient-Level vs Encounter-Level Analysis

Patient-level analysis asks:

How many unique patients?

Encounter-level analysis asks:

How many visits?

Suppose:

Patient A visits the emergency department 10 times.

Patient B visits once.

Patient-level count:

2 patients

Encounter-level count:

11 visits

Both numbers are correct, but they answer different questions.


Time Windows in Healthcare Analytics

Healthcare analyses frequently involve time windows.

Examples include:

  • Last 30 days
  • Last 90 days
  • Last year
  • Last five years

Time windows help define cohorts and outcomes.

For example:

Patients with an HbA1c measurement within the last year.

Patients admitted during 2025.

Patients readmitted within 30 days.

Time is central to healthcare analytics.


Understanding Readmissions

Readmission analysis is one of the most common healthcare projects.

A readmission occurs when a patient returns to the hospital after discharge.

A common metric is:

30-day readmission.

Questions include:

  • Which patients are at high risk?
  • What factors contribute to readmissions?
  • How can readmissions be reduced?

Readmission studies often combine:

  • Demographics
  • Diagnoses
  • Laboratory results
  • Medication information

Example: Identifying Readmissions

Suppose a patient is discharged on:

January 1

and returns on:

January 20

This is a readmission within 30 days.

Healthcare analysts often build logic that searches for future encounters occurring shortly after discharge.

This type of analysis is very common in healthcare.


Data Quality Checks

Before performing any analysis, healthcare analysts should perform data quality checks.

Examples include:

Checking for duplicate patients.

Checking for impossible values.

Checking for missing dates.

Checking for laboratory values outside expected ranges.

Checking for invalid diagnosis codes.

Healthcare analytics is only as good as the underlying data.


Real-World Example

Suppose a hospital wants to identify diabetic patients at high risk of hospitalization.

The analyst may retrieve:

From Patients:

  • Age
  • Sex

From Diagnoses:

  • Diabetes
  • Hypertension

From Laboratories:

  • HbA1c
  • Creatinine

From Encounters:

  • Previous admissions

From Medications:

  • Insulin use

These data are combined to build a predictive model.

This workflow is extremely common in healthcare analytics.


Common Mistakes New Analysts Make

New healthcare analysts often:

  • Forget to remove duplicate patients
  • Confuse patient counts with encounter counts
  • Ignore missing data
  • Use incorrect cohort definitions
  • Fail to understand diagnosis coding

Understanding healthcare context is just as important as understanding SQL syntax.


Key Takeaways

  • SQL is one of the most important tools used by healthcare analysts.
  • Healthcare databases are organized into related tables.
  • Patient IDs connect information across tables.
  • Cohort creation is a fundamental healthcare analytics task.
  • Analysts must understand the difference between patient-level and encounter-level analyses.
  • Time windows play a critical role in healthcare studies.
  • Readmission analysis is one of the most common healthcare projects.
  • Data quality checks should always be performed before analysis.

Exercises

  1. Explain the difference between patient-level and encounter-level analysis.
  2. Why are Patient IDs important in healthcare databases?
  3. What is a cohort?
  4. Give three examples of inclusion criteria.
  5. Give three examples of exclusion criteria.
  6. Why is readmission analysis important?
  7. What data sources might be used to build a hospitalization prediction model?
  8. Why should analysts perform data quality checks before analysis?

Coming Next

In Lesson 9, we will move from database structure to healthcare study design. We will learn how researchers formulate healthcare questions, define outcomes, choose populations, and design observational studies that generate reliable evidence.

Leave a Reply

Discover more from Nerdish.Org

Subscribe now to keep reading and get access to the full archive.

Continue reading