Module 1 — The Modeling Workflow

Start here · hands-on · about 25 minutes.

Almost every machine-learning project — a spam filter, a price predictor, a recommendation engine — follows the same loop. Learn the loop once and every model in this course is just a different choice inside it. This module gives you that map, and the one habit that keeps machine learning honest: testing on data the model has never seen.

The loop, end to end

A model is not magic: it is a recipe with six repeating stages. In order, they are Data, Features, Model, Loss, Train, and Evaluate.
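To make the six stages concrete, here is a minimal sketch of one pass through the loop in plain Python. The toy data, the `features` scaling, and the learning-rate and step values are all assumptions chosen for illustration, not part of the course material.

```python
import random

# 1. Data: toy (x, y) pairs generated from y = 2x + 1 (made up for illustration).
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]

# 2. Features: turn each raw input into the number the model sees.
def features(x):
    return x / 10.0  # hypothetical scaling that keeps inputs in [0, 1)

# 3. Model: a line with two adjustable parameters.
def predict(params, f):
    w, b = params
    return w * f + b

# 4. Loss: mean squared error, i.e. what counts as a mistake.
def loss(params, rows):
    return sum((predict(params, features(x)) - y) ** 2 for x, y in rows) / len(rows)

# 5. Train: plain gradient descent, using the training rows only.
def train(rows, steps=5000, lr=0.1):
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(steps):
        errors = [(predict((w, b), features(x)) - y, features(x)) for x, y in rows]
        grad_w = sum(2 * e * f for e, f in errors) / n
        grad_b = sum(2 * e for e, _ in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# 6. Evaluate: measure the loss on rows held back from training.
random.Random(0).shuffle(data)
train_rows, test_rows = data[:7], data[7:]
params = train(train_rows)
test_loss = loss(params, test_rows)
print(f"learned params: {params}, test loss: {test_loss:.6f}")
```

Because the feature is the input divided by 10, the learned slope comes out near 20 rather than 2. Swapping in a different model or loss changes one function and leaves the rest of the loop alone.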

Two ways to learn: supervised vs. unsupervised

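The biggest fork in the road is whether your data comes with answers. In supervised learning, every example is paired with the correct answer (a label), and the model learns to predict it. In unsupervised learning, there are no labels, and the model looks for structure, such as clusters, in the inputs alone.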
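The difference shows up in the shape of the data itself. The sketch below contrasts a labeled dataset with an unlabeled one and groups the unlabeled values with a bare-bones two-cluster k-means. The example rows and the `two_means` helper are hypothetical, written only for this illustration.

```python
# Supervised: every example comes with the answer (a label).
labeled = [
    ("win a free prize now", "spam"),
    ("meeting moved to 3pm", "not spam"),
]

# Unsupervised: only inputs, no answers. The goal is to find structure.
unlabeled = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]  # hypothetical purchase amounts

def two_means(values, steps=10):
    """Group 1-D values into two clusters (k-means with k=2, simplified)."""
    lo, hi = min(values), max(values)  # start the two centers at the extremes
    for _ in range(steps):
        # Assign each value to its nearest center, then move the centers.
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return near_lo, near_hi

group_a, group_b = two_means(unlabeled)
print(group_a, group_b)
```

Nobody told the code which amounts were "small" or "large"; the two groups come from the inputs alone.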

Two flavors of supervised: regression vs. classification

Within supervised learning, what you predict decides the tool: regression predicts a number (a price), and classification predicts a category (spam or not spam).

Same workflow, different last step. Get the task type right and you have already narrowed the model to a handful of sensible choices.
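One place the task type shows up is in how predictions are scored. As a rough sketch, regression is often scored by how far off the numbers are, and classification by how often the category is right. The two metrics below are common choices rather than the only ones, and the prices and labels are invented.

```python
# Regression: the target is a number, so errors are measured as distances.
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification: the target is a category, so each prediction is right or wrong.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical house prices (in thousands) and their predictions.
price_mse = mean_squared_error([200.0, 310.0], [210.0, 300.0])

# Hypothetical spam labels and their predictions (one of four is wrong).
spam_acc = accuracy(
    ["spam", "ham", "spam", "ham"],
    ["spam", "ham", "ham", "ham"],
)
print(price_mse, spam_acc)
```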

The one rule: never test on what you trained on

A model that has seen an example can repeat its answer — that proves nothing. So before training, we split the data into a training set (the model learns from this) and a test set (held back, used once to estimate real-world performance). A model that does well on the training set but poorly on the test set has overfit — memorized instead of learned. We call doing well on unseen data generalization, and it is the whole game. Module 8 is devoted to it.
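The rule can be demonstrated with a deliberately bad model. In this sketch, `train_test_split` and `Memorizer` are hypothetical helpers written for the example (libraries such as scikit-learn provide a real splitter). The memorizer scores perfectly on the rows it has seen and fails on every held-back row, which is overfitting in its purest form.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold back the last part as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

class Memorizer:
    """A 'model' that only repeats answers it has already seen."""

    def fit(self, rows):
        self.answers = dict(rows)
        return self

    def predict(self, x):
        return self.answers.get(x, "unknown")

def accuracy(model, rows):
    return sum(model.predict(x) == y for x, y in rows) / len(rows)

# Hypothetical data: numbers labeled even/odd, each appearing exactly once.
rows = [(n, "even" if n % 2 == 0 else "odd") for n in range(40)]
train_rows, test_rows = train_test_split(rows)

model = Memorizer().fit(train_rows)
train_acc = accuracy(model, train_rows)  # every training row was memorized
test_acc = accuracy(model, test_rows)    # no test row was ever seen
print(f"train accuracy: {train_acc}, test accuracy: {test_acc}")
```

A model that had learned the even/odd pattern would score well on both sets. Only the held-back rows reveal the difference.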

AI anchor: this loop is everywhere

The chatbot you used, the fraud check on your card, the photos app that finds your dog: all of them are this exact loop. They differ only in the features (pixels, words, transactions), the model (a tree, a network), and the loss (what counts as a mistake). When an ML system fails in the news (biased, brittle, confidently wrong), the cause is almost always one stage of this loop: bad data, a leaky train/test split, or the wrong loss. The workflow is also your debugging checklist.

Sort the tasks

Below are real problems. For each, decide whether it is regression, classification, or clustering. You will get a score.


Why this matters next

Every remaining module drops into one box of this loop. Modules 2–5 are different models for the supervised case; Modules 6–7 are the unsupervised branch; Module 8 is the evaluate box done right. Keep this loop in mind and the course is one idea, eight times.

One-sentence summary: machine learning is a six-stage loop (data → features → model → loss → train → evaluate) where you choose supervised vs. unsupervised and regression vs. classification, and always judge the model on data it never trained on.
