Module 7 — Guardrails & Reliability
Everything so far assumed the agent behaves. Real agents do not — not always. The model picks the wrong tool, or calls the same one forever, or, worst of all, takes a real-world action you did not want. An agent loops in the same way a person stuck on a bad assumption does: confidently, and without noticing. The difference between a demo and a system you can trust is the guardrails wrapped around the loop — the checks that catch failure before it does damage.
Four guardrails every serious agent needs
- A step limit (max-steps) — a hard cap on how many loops the agent may run, so it can never spin forever. Hit the cap and it stops, no matter what.
- Loop detection — notice when the agent keeps repeating the same action with the same input and is getting nowhere, and break out instead of repeating it endlessly.
- Tool validation — refuse to run a tool the agent "called" that does not actually exist or is not allowed, rather than crashing or improvising.
- Human-in-the-loop — for risky or irreversible actions (sending an email, spending money, deleting files), pause and require a human to approve before acting.
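The last two guardrails both decide whether a single tool call may run at all, so they can be sketched as one small gate in front of the tools. This is a minimal sketch, not a definitive implementation: the `TOOLS` registry, the `gate` function, the per-tool risk flag, and the `deny_all` approver are all hypothetical names introduced for illustration.

```python
from typing import Callable

# Hypothetical registry for illustration: tool name -> (function, is_risky flag).
TOOLS = {
    "search": (lambda q: f"results for {q}", False),
    "send_email": (lambda body: "email sent", True),
}


def gate(name: str, arg: str, approve: Callable[[str, str], bool]) -> str:
    """Run a tool call only if it passes validation and, if risky, approval."""
    if name not in TOOLS:
        # Tool validation: the model named a tool that is not registered.
        return f"BLOCKED: unknown tool {name!r}"
    func, is_risky = TOOLS[name]
    if is_risky and not approve(name, arg):
        # Human-in-the-loop: the approver declined, so nothing is executed.
        return f"BLOCKED: {name!r} was not approved"
    return func(arg)


def deny_all(name: str, arg: str) -> bool:
    """Stand-in approver for the demo; a real one would ask a person."""
    return False


print(gate("search", "weather", deny_all))
print(gate("delete_everything", "/", deny_all))
print(gate("send_email", "hi", deny_all))
```

Returning a message instead of raising keeps the refusal visible to the agent as an ordinary observation. Whether to do that or to stop the run outright is a design choice this sketch leaves open.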
Break an agent, then add guardrails
Pick a failure scenario and run it with guardrails off — watch the agent misbehave. Then switch the relevant guardrail on and run again: the trace shows it getting caught and stopped safely. The red BLOCKED lines are the guardrails doing their job.
for step in range(MAX_STEPS):                         # 1. hard step cap
    action = decide(state)
    if action.name not in tools:                      # 3. tool validation
        break
    if action in already_tried:                       # 2. loop detection
        break
    if action.is_risky and not human_ok(action):      # 4. human-in-the-loop
        break
    observation = tools[action.name](action.arg)
    already_tried.add(action)
None of these make the agent smarter — they make it safe. A capable agent without guardrails is a liability; the guardrails are what let you actually deploy one.
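To watch the two anti-spinning guardrails catch a failure without the interactive activity, the loop can be replayed against a scripted agent that is stuck on one action. The sketch below makes several assumptions: `run`, the `script` list standing in for `decide(state)`, the `loop_detection` switch, and the small `MAX_STEPS` value are all hypothetical. The tool validation and approval checks are left out to keep the demo short.

```python
from dataclasses import dataclass

MAX_STEPS = 5  # assumed cap, kept small so the demo trace stays short


@dataclass(frozen=True)
class Action:
    """One proposed tool call. Frozen, so instances are hashable and can go in a set."""
    name: str
    arg: str


def run(script, tools, loop_detection=True):
    """Replay scripted actions through the loop and return the trace as strings."""
    trace = []
    already_tried = set()
    for step in range(MAX_STEPS):
        # Scripted stand-in for decide(state): replay the list, repeating its last entry.
        action = script[min(step, len(script) - 1)]
        if loop_detection and action in already_tried:
            trace.append(f"BLOCKED repeated action: {action.name}({action.arg!r})")
            return trace
        trace.append(f"{action.name} -> {tools[action.name](action.arg)}")
        already_tried.add(action)
    # The for loop ran out of steps without the agent finishing.
    trace.append(f"BLOCKED step limit: {MAX_STEPS}")
    return trace


tools = {"search": lambda q: f"no results for {q}"}
stuck = [Action("search", "weather")]

with_detection = run(stuck, tools)
without_detection = run(stuck, tools, loop_detection=False)

print("\n".join(with_detection))
print("\n".join(without_detection))
```

With loop detection on, the run stops at the second step, as soon as the repeat appears. With it off, the same agent runs the tool five times before the step cap ends the run, which is why the cap works as a backstop rather than a replacement for loop detection.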
Check your understanding
A few questions about guardrails. You will get a score.