Introduction
In previous lessons, we learned about healthcare systems, medical terminology, anatomy and physiology, common diseases, healthcare data, and coding systems.
Now we are ready to explore where most healthcare data actually lives: the Electronic Health Record, commonly known as the EHR.
For a healthcare data analyst, understanding the structure of an EHR is just as important as understanding a company’s ERP system, CRM system, or inventory database.
Nearly every healthcare analytics project ultimately involves extracting information from EHR systems.
Whether you are studying diabetes outcomes, predicting readmissions, analyzing hospital performance, or building machine learning models, the underlying data will often come from an Electronic Health Record.
In this lesson, we will learn how EHR systems are structured, how patient information is stored, and how healthcare databases are organized.
What Is an Electronic Health Record?
An Electronic Health Record is a digital version of a patient’s medical history.
Think of the EHR as the central repository for healthcare information.
The EHR stores information such as:
- Patient demographics
- Diagnoses
- Medications
- Laboratory results
- Procedures
- Clinical notes
- Imaging reports
- Hospital admissions
Every interaction between a patient and the healthcare system can contribute information to the EHR.
Why EHRs Exist
Historically, healthcare records were stored on paper.
Paper records created many problems:
- Difficult to access
- Difficult to share
- Difficult to search
- Easy to lose
- Difficult to analyze
Electronic Health Records were developed to improve:
- Accessibility
- Accuracy
- Continuity of care
- Reporting
- Research
- Analytics
Today, most modern healthcare organizations rely heavily on EHR systems.
The Patient-Centered Nature of EHRs
One of the most important concepts to understand is that EHRs are organized around patients.
Everything in the EHR ultimately connects back to a patient.
For example:
Patient John Smith may have:
- 25 clinic visits
- 3 hospital admissions
- 120 laboratory tests
- 8 diagnoses
- 12 medications
All of this information is linked through the patient’s unique identifier.
This allows healthcare providers to view a complete picture of the patient’s health history.
The Core Components of an EHR
Although EHR systems vary across organizations, most contain similar types of information.
Patient Demographics
Demographic data describes the patient.
Common fields include:
- Patient ID
- Date of birth
- Sex
- Address
- Ethnicity
- Marital status
Demographic variables are frequently used in predictive models and healthcare research.
For example:
Age is one of the strongest predictors in many healthcare outcomes.
Encounters
An encounter represents an interaction between a patient and the healthcare system.
Examples include:
- Office visit
- Emergency department visit
- Hospital admission
- Telehealth appointment
Each encounter generates information.
For example, a clinic visit may generate:
- Vital signs
- Diagnoses
- Medications
- Physician notes
Healthcare databases often contain a separate encounter table.
Diagnoses
Diagnoses describe medical conditions affecting the patient.
Examples include:
- Diabetes
- Hypertension
- COPD
- Heart failure
Diagnoses are usually stored using ICD codes.
A patient may accumulate many diagnoses over time.
Some diagnoses may be temporary.
Others may remain throughout the patient’s life.
Procedures
Procedures describe healthcare services provided to the patient.
Examples include:
- Colonoscopy
- Surgery
- MRI scan
- Cardiac catheterization
Procedures help analysts understand what treatments and interventions occurred.
Medications
Medication records contain information about drugs prescribed to patients.
Examples include:
- Drug name
- Dose
- Start date
- End date
Medication data is essential for studying treatment effectiveness and adherence.
Laboratory Results
Laboratory systems continuously feed information into the EHR.
Examples include:
- Glucose
- HbA1c
- Creatinine
- Cholesterol
- Hemoglobin
Laboratory data often serves as some of the most important predictors in healthcare analytics.
Vital Signs
Vital signs measure basic physiological functions.
Examples include:
- Heart rate
- Blood pressure
- Temperature
- Respiratory rate
- Oxygen saturation
These measurements are collected repeatedly and provide valuable longitudinal information.
Clinical Notes
Clinical notes contain free-text documentation written by healthcare providers.
Examples include:
- Physician notes
- Nursing notes
- Discharge summaries
- Consultation reports
Clinical notes often contain information not available elsewhere.
However, they are more difficult to analyze because they are unstructured.
Structured Versus Unstructured Information
One of the most important concepts in healthcare analytics is understanding the difference between structured and unstructured data.
Structured data fits neatly into database fields.
Examples include:
- Age
- Blood pressure
- Glucose
- Diagnosis code
Unstructured data consists of free text.
Examples include:
- Clinical notes
- Discharge summaries
- Radiology reports
Structured data is easier to analyze.
Unstructured data often requires Natural Language Processing techniques.
Understanding Relational Databases
Most EHR systems are built using relational databases.
Instead of storing everything in one massive table, information is separated into related tables.
Imagine a simple healthcare database containing:
Patients
Encounters
Diagnoses
Laboratory Results
Medications
Procedures
Each table stores a specific type of information.
The tables are connected using patient identifiers.
This design improves efficiency and reduces duplication.
Why Multiple Tables Are Necessary
Suppose a patient has:
- 50 laboratory tests
- 15 diagnoses
- 10 medications
If all information were stored in a single table, the database would become extremely inefficient.
Instead, healthcare systems use separate tables.
For example:
The patient table stores demographic information.
The diagnosis table stores diagnoses.
The laboratory table stores laboratory measurements.
The medication table stores prescriptions.
Relationships connect the tables together.
This structure is similar to many business databases that analysts already use.
Understanding One-to-Many Relationships
Healthcare databases frequently contain one-to-many relationships.
For example:
One patient can have many encounters.
One encounter can generate many laboratory tests.
One patient can receive many medications.
One patient can have many diagnoses.
Understanding these relationships is critical when writing SQL queries.
Failure to understand relationships can easily lead to duplicate records and incorrect analyses.
Longitudinal Patient Data
Healthcare data is rarely static.
Patients generate information continuously over time.
Consider a diabetic patient.
Year 1:
- HbA1c = 9.0
Year 2:
- HbA1c = 8.1
Year 3:
- HbA1c = 7.2
This sequence tells a story about improving disease control.
Healthcare analysts frequently work with longitudinal data because diseases evolve over time.
Event-Based Data
Many healthcare analyses focus on events.
Examples include:
- Hospital admission
- Heart attack
- Surgery
- Stroke
- Death
Analysts often ask questions such as:
- Did the event occur?
- When did the event occur?
- How long until the event occurred?
These questions form the basis of survival analysis and time-to-event modeling.
Data Warehouses in Healthcare
Large healthcare organizations often maintain data warehouses.
A data warehouse combines information from many systems.
Examples include:
- EHR systems
- Laboratory systems
- Pharmacy systems
- Billing systems
- Scheduling systems
The goal is to create a centralized environment for analytics.
Healthcare analysts often work directly with data warehouses rather than operational EHR systems.
Data Marts
A data mart is a specialized subset of a data warehouse.
For example:
A diabetes data mart may contain only diabetes-related information.
A cardiology data mart may contain only cardiovascular information.
Data marts simplify analytics by focusing on specific business or clinical needs.
Example: Predicting Readmissions
Suppose a hospital wants to predict which patients are likely to return within 30 days of discharge.
The analyst might use:
Demographics:
- Age
- Sex
Diagnoses:
- Diabetes
- Heart failure
Laboratory Results:
- Creatinine
- Hemoglobin
Medication Data:
- Number of medications
Utilization History:
- Previous admissions
Notice how information is pulled from multiple EHR components.
This is typical of healthcare analytics projects.
Common Challenges in EHR Data
EHR data presents unique challenges.
Data may be missing.
Different providers may document information differently.
Coding practices may vary.
Patients may receive care at multiple organizations.
Clinical notes may contain ambiguous language.
Data cleaning often consumes a significant portion of a healthcare analyst’s time.
Understanding these limitations is essential when interpreting results.
Real-World Example
Suppose a researcher wants to study diabetic patients.
The researcher may need information from:
Patient Table:
- Age
- Sex
Diagnosis Table:
- Diabetes diagnoses
Laboratory Table:
- HbA1c
- Glucose
Medication Table:
- Insulin prescriptions
Encounter Table:
- Hospital visits
Combining information from multiple tables allows the researcher to answer meaningful clinical questions.
This is exactly how many healthcare analytics projects operate.
Key Takeaways
- Electronic Health Records are the primary source of healthcare data.
- EHRs store patient-centered information including demographics, diagnoses, medications, procedures, laboratory results, and clinical notes.
- Most EHR systems use relational databases.
- Healthcare databases are organized into multiple related tables.
- Understanding one-to-many relationships is essential for healthcare SQL.
- Healthcare data is highly longitudinal and event-driven.
- Data warehouses and data marts are commonly used for healthcare analytics.
- Many healthcare projects require integrating information from multiple EHR components.
Exercises
- What is an Electronic Health Record?
- Why are healthcare databases organized into multiple tables?
- Explain the difference between structured and unstructured data.
- Give three examples of one-to-many relationships in healthcare databases.
- Why is longitudinal data important in healthcare analytics?
- What is a data warehouse, and why is it useful?
- If you were building a readmission prediction model, which EHR components would you use?
Coming Next
In Lesson 8, we will begin exploring healthcare SQL and database querying. We will learn how healthcare tables are linked together, how to build patient cohorts, how to work with encounters and diagnoses, and how analysts retrieve data from healthcare databases for research and reporting.

Leave a Reply