Module 1 — The Modeling Workflow
Almost every machine-learning project — a spam filter, a price predictor, a recommendation engine — follows the same loop. Learn the loop once and every model in this course is just a different choice inside it. This module gives you that map, and the one habit that keeps machine learning honest: testing on data the model has never seen.
The loop, end to end
A model is not magic — it is a recipe with six repeating stages. Click each stage below to see what it means, then send one example all the way through.
The six stages are: Data, Features, Model, Loss, Train, Evaluate.
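To make the six stages concrete, here is a minimal sketch that sends a tiny dataset through the whole loop. Everything in it is an assumption chosen for illustration: the hours-and-scores data is invented, the model is a straight line, the loss is mean squared error, and training uses plain gradient descent. The course's own examples may make different choices at each stage.

```python
# Stage 1: Data. Toy (hours studied, exam score) pairs, invented for illustration.
raw = [(1.0, 52.0), (2.0, 55.0), (3.0, 61.0), (4.0, 64.0), (5.0, 70.0), (6.0, 73.0)]
train_rows, test_rows = raw[:4], raw[4:]

# Stage 2: Features. Turn each raw row into a model input x and a target y.
def featurize(rows):
    xs = [hours for hours, _ in rows]
    ys = [score for _, score in rows]
    return xs, ys

# Stage 3: Model. A straight line with two adjustable parameters, w and b.
def predict(x, w, b):
    return w * x + b

# Stage 4: Loss. Mean squared error between predictions and targets.
def mse(xs, ys, w, b):
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Stage 5: Train. Gradient descent nudges w and b in the direction that lowers the loss.
def train(xs, ys, lr=0.01, steps=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

x_train, y_train = featurize(train_rows)
x_test, y_test = featurize(test_rows)
w, b = train(x_train, y_train)

# Stage 6: Evaluate. Measure the loss on rows the model never saw during training.
train_loss = mse(x_train, y_train, w, b)
test_loss = mse(x_test, y_test, w, b)
print(f"w={w:.2f} b={b:.2f} train MSE={train_loss:.2f} test MSE={test_loss:.2f}")
```

Swapping the line for a different model, or squared error for a different loss, changes one stage and leaves the rest of the loop intact.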
Two ways to learn: supervised vs. unsupervised
The biggest fork in the road is whether your data comes with answers.
- Supervised learning — every training example has a known label \( y \). The model learns a mapping from features \( x \) to label \( y \). Spam/not-spam, price, diagnosis. Most of this course.
- Unsupervised learning — no labels at all. The model finds structure in the features alone: groups (clustering) or compact summaries (dimensionality reduction). Modules 6 and 7.
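The fork can be sketched on a single toy feature. In the sketch below, the data (number of links per email) and the two methods are assumptions picked for brevity: a midpoint threshold stands in for a supervised learner, and a two-center version of k-means stands in for clustering. Neither is necessarily the method used in later modules.

```python
# Toy 1-D feature: number of links in an email (values invented for illustration).
features = [0.0, 1.0, 2.0, 1.0, 8.0, 9.0, 10.0, 9.0]
labels = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]

def mean(values):
    return sum(values) / len(values)

# Supervised: labels are available, so learn a decision threshold from them.
def fit_threshold(xs, ys):
    ham_mean = mean([x for x, y in zip(xs, ys) if y == "ham"])
    spam_mean = mean([x for x, y in zip(xs, ys) if y == "spam"])
    return (ham_mean + spam_mean) / 2

threshold = fit_threshold(features, labels)

def classify(x):
    return "spam" if x > threshold else "ham"

# Unsupervised: no labels. Two-means clustering finds groups from features alone.
def two_means(xs, iterations=10):
    centers = [min(xs), max(xs)]  # start the two centers at the extremes
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

centers = two_means(features)
print("learned threshold:", threshold)
print("cluster centers:", centers)
```

Note what the unsupervised half cannot tell you: it finds two groups, but nothing in the features says which group is spam. Only labels supply that meaning.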
Two flavors of supervised: regression vs. classification
Within supervised learning, what you predict decides the tool:
- Regression predicts a number on a continuous scale — temperature, price, hours. (Module 2.)
- Classification predicts a category from a fixed set — spam/ham, cat/dog/bird, pass/fail. (Modules 3–5.)
Same workflow, different last step. Get the task type right and you have already narrowed the model to a handful of sensible choices.
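One way to see "same workflow, different last step" is to run the same neighbor lookup for both tasks and change only how the neighbors' answers are combined. The sketch below uses a small k-nearest-neighbors predictor as an illustrative assumption; the data is invented, and this is not necessarily a model the course covers.

```python
from collections import Counter

# Hours studied, with two kinds of target for the same students (invented values).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]                 # a number
outcomes = ["fail", "fail", "pass", "pass", "pass", "pass"]   # a category

# Shared step: find the targets of the k training points closest to x.
def nearest_targets(x, xs, targets, k=3):
    ranked = sorted(zip(xs, targets), key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]

# Regression: the last step averages the neighbors' numbers.
def predict_number(x):
    neighbors = nearest_targets(x, hours, scores)
    return sum(neighbors) / len(neighbors)

# Classification: the last step takes a majority vote over the neighbors' categories.
def predict_category(x):
    neighbors = nearest_targets(x, hours, outcomes)
    return Counter(neighbors).most_common(1)[0][0]

print(predict_number(4.4), predict_category(4.4))
print(predict_number(1.6), predict_category(1.6))
```

The regression output can be any value on the score scale, while the classification output is always one of the fixed categories seen in training.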
The one rule: never test on what you trained on
A model that has seen an example can repeat its answer — that proves nothing. So before training, we split the data into a training set (the model learns from this) and a test set (held back, used once to estimate real-world performance). A model that does well on the training set but poorly on the test set has overfit — memorized instead of learned. We call doing well on unseen data generalization, and it is the whole game. Module 8 is devoted to it.
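The rule is easy to demonstrate with a deliberately bad model that only memorizes. In this sketch the data, the 75/25 split ratio, the fixed seed, and the `memorizer` function are all illustrative assumptions, not part of the course material.

```python
import random
from collections import Counter

# Toy labeled data: the label is "high" when x >= 50 (invented for illustration).
data = [(x, "high" if x >= 50 else "low") for x in range(0, 100, 5)]

# Split BEFORE training: shuffle, then hold back 25% as the test set.
rng = random.Random(0)  # fixed seed so the split is repeatable
shuffled = data[:]
rng.shuffle(shuffled)
cut = int(len(shuffled) * 0.75)
train_set, test_set = shuffled[:cut], shuffled[cut:]

# A deliberately bad "model": store every training example verbatim.
memory = dict(train_set)
fallback = Counter(label for _, label in train_set).most_common(1)[0][0]

def memorizer(x):
    # Seen before? Repeat the stored answer. Otherwise guess the most common label.
    return memory.get(x, fallback)

def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)

train_acc = accuracy(memorizer, train_set)
test_acc = accuracy(memorizer, test_set)
print(f"train accuracy={train_acc:.2f} test accuracy={test_acc:.2f}")
```

Training accuracy is perfect by construction, because every training example is looked up directly. Only the held-back test set reveals that nothing about the underlying rule was learned.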
Sort the tasks
Below are real problems. For each, decide whether it is regression, classification, or clustering. You will get a score.