🎯 Goal:
By the end of this chapter, you’ll be able to:
- Understand the basic machine learning workflow
- Train your first ML model using scikit-learn
- Make predictions and evaluate model accuracy
🧠 1. What is Machine Learning?
Machine Learning (ML) is a way to teach computers to learn patterns from data instead of being explicitly programmed.
Think of it like this:
“Here’s past data. Learn from it. Now predict the future.”
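We’ll start with supervised learning, where the model learns from examples that already have the correct answer (a label) attached. Specifically, we’ll tackle a classification task: predicting which category an example belongs to.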
📦 2. Import the Tools
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
🌸 3. Load the Iris Dataset
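The Iris dataset is a classic ML dataset: 150 flower samples from 3 iris species (50 each), described by 4 numeric features (sepal length, sepal width, petal length, and petal width, all in centimeters).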
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target
print(df.head())
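Before modeling, it helps to confirm what the data actually looks like. Here is a minimal sketch that checks the table size and the number of samples per class. The `species` column is an illustrative addition, not part of the original walkthrough; it maps the numeric labels to the names in `iris.target_names`.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Illustrative extra column: map numeric labels (0, 1, 2) to species names
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

print(df.shape)                      # (rows, columns)
print(df["species"].value_counts())  # number of samples per species
```

Equal class counts mean plain accuracy is a reasonable first metric for this dataset.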
✂️ 4. Split Data into Train and Test
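# Features (X) are the four measurements; the label (y) is the species code
X = df.drop("target", axis=1)
y = df["target"]
# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)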
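To see what the split produced, you can check the shapes of the pieces. The sketch below also passes `stratify=y`, an optional `train_test_split` argument that is not used in the walkthrough above; it keeps the class proportions the same in both sets, which is often a sensible default for classification.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# as_frame=True returns a pandas DataFrame (X) and Series (y)
X, y = load_iris(return_X_y=True, as_frame=True)

# stratify=y is an optional extra: each class keeps the same share in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)          # 120 training rows, 30 test rows
print(y_test.value_counts().sort_index())   # test samples per class
```

The model never sees the test rows during training, so its score on them estimates how it would do on new flowers.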
🧠 5. Train a Decision Tree Classifier
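# random_state fixes the tree's tie-breaking so repeated runs give the same model
model = DecisionTreeClassifier(random_state=42)
# fit() learns the decision rules from the training data
model.fit(X_train, y_train)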
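A decision tree is easy to inspect, which makes it a nice first model. As a rough sketch, scikit-learn's `export_text` prints the learned if/else rules as plain text. This block rebuilds the split and the model so it runs on its own; `random_state=42` on the tree is an assumption added here for repeatable output.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Print the learned rules, one line per split or leaf
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
print("Tree depth:", model.get_depth())
```

Each indented line is a threshold test on one feature; each leaf ends in a predicted class.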
🔍 6. Make Predictions
y_pred = model.predict(X_test)
# Convert both to plain lists so the two rows line up and are easy to compare
print("Predicted:", y_pred.tolist())
print("Actual:   ", y_test.tolist())
🎯 7. Evaluate Accuracy
acc = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {acc:.2f}")
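Accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on a small set of made-up labels (toy values for illustration, not output from the Iris model) and checks the result against `accuracy_score`. It also prints a confusion matrix, which shows which classes get mixed up.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only
y_true = np.array([0, 1, 2, 2, 1, 0, 1, 2])
y_hat = np.array([0, 1, 2, 1, 1, 0, 2, 2])

# Accuracy by hand: share of positions where prediction equals truth
manual_acc = (y_true == y_hat).mean()
print(manual_acc, accuracy_score(y_true, y_hat))  # both 0.75 (6 of 8 correct)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
print(cm)
```

Off-diagonal entries in the matrix are the mistakes: here one class-1 sample was predicted as class 2, and one class-2 sample as class 1.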
🧪 8. Practice Time
Try the following:
- Replace DecisionTreeClassifier with LogisticRegression or KNeighborsClassifier
- Change the test size to 30%
- Add max_depth=2 to your tree and see how it affects accuracy
- Print a classification report (from sklearn.metrics import classification_report)
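If you get stuck, here is one possible way to approach the exercises, offered as a sketch rather than the only answer. It uses a 30% test split, compares three classifiers, and prints a classification report for the tree. The `max_iter=1000` setting and the `models` and `scores` names are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# All scikit-learn classifiers share the same fit/predict interface
models = {
    "decision tree (max_depth=2)": DecisionTreeClassifier(max_depth=2, random_state=42),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")

# Per-class precision, recall, and F1 for the shallow tree
tree_pred = models["decision tree (max_depth=2)"].predict(X_test)
print(classification_report(y_test, tree_pred, target_names=iris.target_names))
```

Because every classifier exposes the same `fit` and `predict` methods, swapping models is a one-line change.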
✅ Summary
- You used scikit-learn to train a basic ML model
- You split the data into training and test sets
- You trained a model, made predictions, and evaluated accuracy in just a few lines
- This is the foundation of supervised learning

