Math Foundations for AI
This course sits between Introduction to AI (the ideas) and Machine Learning Foundations (the code). By the end you will be able to read ML notation, reason with conditional probability and Bayes, represent data as vectors and matrices, explain what a gradient is and how gradient descent minimizes error, summarize data with statistics, and see exactly where each tool shows up inside a machine-learning model.
Every module is hands-on. Rather than only reading about the math, you will drag vectors, roll a ball downhill on a loss curve, watch beliefs update as evidence arrives, and trace the math through a complete model. Each module ends with a short mastery check; pass it to mark the module complete.
Warm-up
Module 1 · The Language and Notation of ML
Functions and variables, summation notation, subscripts, and why logs and exponentials are everywhere. Activity: a notation decoder that translates plain English ⇄ math.
Pillar 1 · Probability
Module 2 · Conditional Probability
Sample spaces, events, the axioms, joint/marginal/conditional probability, and independence. Activity: a conditional-probability explorer over a population. AI anchor: spam filtering.
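As a small illustration of joint, marginal, and conditional probability in the spam-filtering setting, here is a sketch in Python. The email counts and the word "free" are made-up numbers chosen for easy arithmetic, not data from the course.

```python
import math

# Hypothetical counts over a population of emails (illustrative only).
total_emails = 1000
spam = 200
spam_with_free = 120   # spam emails containing the word "free"
ham_with_free = 40     # non-spam emails containing the word "free"

# Marginal probabilities: one event at a time.
p_spam = spam / total_emails
p_free = (spam_with_free + ham_with_free) / total_emails

# Joint probability: both events together.
p_spam_and_free = spam_with_free / total_emails

# Conditional probability: P(spam | free) = P(spam and free) / P(free).
p_spam_given_free = p_spam_and_free / p_free

# Independence would require P(spam and free) == P(spam) * P(free).
independent = math.isclose(p_spam_and_free, p_spam * p_free)

print(f"P(spam) = {p_spam:.2f}")
print(f"P(spam | free) = {p_spam_given_free:.2f}")
print(f"independent? {independent}")
```

With these counts, seeing the word raises the probability of spam well above its marginal value, so the two events are not independent.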
Module 3 · Bayes & Random Variables
Prior, likelihood, posterior; base rates; expectation; the Bernoulli and normal distributions. Activity: a belief-updating demo. AI anchor: naive Bayes and adaptive testing.
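The prior-to-posterior update can be sketched in a few lines of Python. The function name `posterior` and the rates below (a 1% base rate, a 90% hit rate, a 5% false-alarm rate) are assumptions for illustration. The loop also assumes each new piece of evidence is conditionally independent of the earlier ones.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(H | E) for a yes/no hypothesis H after observing evidence E.

    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # P(E) by the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' rule: posterior = likelihood * prior / evidence.
    return likelihood * prior / evidence


# Start from a low base rate and update once per positive observation.
belief = 0.01
for step in range(3):
    belief = posterior(belief, likelihood=0.9, false_positive_rate=0.05)
    print(f"belief after observation {step + 1}: {belief:.3f}")
```

The first update moves the belief from 1% to only about 15%, which is the base-rate effect: a rare hypothesis stays fairly unlikely after one positive signal, and it takes repeated evidence to become convincing.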
Pillar 2 · Linear algebra
Module 4 · Vectors & Data
Vectors, the dot product, norms, and the geometry of similarity. Activity: a 2-D vector playground. AI anchor: cosine similarity between embeddings, the vectors LLMs use to represent meaning.
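To make the link between the dot product, norms, and similarity concrete, here is a minimal plain-Python sketch of cosine similarity. The helper names and the 2-D vectors are illustrative; real embeddings have hundreds or thousands of dimensions.

```python
import math


def dot(u, v):
    """Dot product: multiply matching components and add them up."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (both must be non-zero)."""
    return dot(u, v) / (norm(u) * norm(v))


print(cosine_similarity([1, 2], [2, 4]))   # same direction, close to 1
print(cosine_similarity([1, 0], [0, 1]))   # perpendicular, 0
print(cosine_similarity([3, 4], [4, 3]))   # similar but not identical
```

Because the result is divided by both lengths, cosine similarity compares direction only: doubling a vector does not change its similarity to anything.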
Module 5 · Matrices & Transformations
Matrices as datasets and as transformations; matrix multiplication; transpose and identity. Activity: a matrix-multiplication visualizer. AI anchor: a neural-network layer as a matrix multiply.
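The idea of a layer as a matrix multiply can be sketched without any libraries. The weights `W`, bias `b`, and the ReLU activation are assumptions added for illustration; the module description itself only names the matrix multiply.

```python
def matvec(W, x):
    """Matrix-vector product: each output is the dot product of one row with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(z):
    """Elementwise max(0, value), a common activation (assumed here)."""
    return [max(0.0, v) for v in z]


# A hypothetical layer mapping 2 inputs to 3 outputs.
W = [[1.0, -1.0],
     [0.5, 0.5],
     [-2.0, 1.0]]
b = [0.0, 1.0, 0.0]
x = [2.0, 1.0]

z = [zi + bi for zi, bi in zip(matvec(W, x), b)]  # linear part: W x + b
a = relu(z)                                       # layer output

print(z)  # [1.0, 2.5, -3.0]
print(a)  # [1.0, 2.5, 0.0]
```

The shape of `W` (3 rows, 2 columns) is what turns a 2-component input into a 3-component output.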
Pillar 3 · Optimization
Module 6 · Derivatives & Gradient Descent
Slope, rate of change, minima, the gradient, and following the slope downhill to reduce error. Activity: a gradient-descent demo with an adjustable learning rate. AI anchor: training a model means repeating this downhill step on its parameters.
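A minimal sketch of following the slope downhill, using a one-parameter loss chosen for illustration, (w - 3)^2, whose minimum is at w = 3. The function names and learning rates are assumptions, not the course's demo code.

```python
def loss(w):
    """A simple bowl-shaped loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the loss: the slope at w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate, steps):
    """Repeatedly step against the slope, scaled by the learning rate."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


small_steps = gradient_descent(0.0, learning_rate=0.1, steps=50)
too_large = gradient_descent(0.0, learning_rate=1.1, steps=50)

print(f"learning rate 0.1 ends near w = {small_steps:.4f}")
print(f"learning rate 1.1 ends at   w = {too_large:.1f}")
```

With the smaller learning rate the parameter settles close to 3. With the larger one every step overshoots the minimum by more than the last, so the value moves further away instead of converging.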
Pillar 4 · Statistics
Module 7 · Statistics for Data & Evaluation
Mean, variance, standard deviation; the normal curve; sampling; correlation vs. causation; why averages mislead. Activity: a data-summary and distribution explorer. AI anchor: reading evaluation metrics.
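One way to see why averages mislead is to summarize a small sample with a single extreme value. The numbers below are invented (think of salaries in thousands); the sketch uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical sample: six similar values and one outlier.
salaries = [30, 32, 35, 36, 38, 40, 400]

mean = statistics.mean(salaries)        # pulled upward by the outlier
median = statistics.median(salaries)    # middle value, barely affected
spread = statistics.pstdev(salaries)    # population standard deviation

print(f"mean   = {mean:.1f}")
print(f"median = {median}")
print(f"stdev  = {spread:.1f}")
```

Here the mean lands above six of the seven values, while the median stays with the bulk of the data. A standard deviation larger than the mean is a hint that one number alone is a poor summary.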
Synthesis
Module 8 · Capstone · The Math of a Tiny Model
One small model — logistic regression — worked end to end on a tiny dataset, showing exactly where probability, vectors, the loss function, the gradient step, and evaluation each appear. Trace the math through a complete model and explain every step.
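As a rough preview of the end-to-end trace, here is a plain-Python sketch of logistic regression trained by gradient descent. The six-point dataset, the learning rate, the step count, and all function names are assumptions made for illustration; the capstone's own dataset and settings may differ.

```python
import math


def logit(w, b, x):
    """Vectors: the score is a dot product plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Probability: the sigmoid squashes the score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit(w, b, x)))


def cross_entropy(w, b, X, y):
    """Loss: mean negative log-probability of the correct label."""
    total = 0.0
    for x, target in zip(X, y):
        z = logit(w, b, x)
        signed = z if target == 1 else -z
        total += math.log1p(math.exp(-signed))  # equals -log P(correct label)
    return total / len(X)


def train(X, y, learning_rate=0.3, steps=5000):
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(X, y):
            error = predict(w, b, x) - target   # gradient of the loss w.r.t. the score
            for j, xj in enumerate(x):
                grad_w[j] += error * xj / n
            grad_b += error / n
        # Gradient step: move each parameter against its slope.
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


# Hypothetical tiny dataset: two features per example, labels 0 or 1.
X = [[0.5, 1.0], [1.0, 1.5], [1.5, 0.5], [3.0, 2.5], [3.5, 3.0], [4.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train(X, y)

# Evaluation: fraction of examples classified correctly at a 0.5 threshold.
accuracy = sum((predict(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(y)
print(f"loss = {cross_entropy(w, b, X, y):.4f}, accuracy = {accuracy:.2f}")
```

Each pillar appears once: the dot product in `logit`, probability in `predict`, the loss in `cross_entropy`, the gradient step inside `train`, and evaluation in the accuracy line.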