Module 3 — The ReAct Pattern

Reasoning across steps · hands-on · about 30 minutes.

The agent in Module 2 performed a single tool invocation. Most real questions, however, take more than one step. "How many more people live in Tokyo than in Paris?" cannot be answered by a single lookup: the agent must retrieve Tokyo's population, then Paris's, then compute the difference. It must reason, act, observe the result, and reason again. This interleaving of reasoning and tool use is formalized in the ReAct pattern (Reasoning + Acting), which has become one of the most widely used designs for tool-using agents.

The Thought–Action–Observation cycle

ReAct organizes the agent loop into an explicit trace consisting of three recurring operations:

Thought — the agent reasons about what it still needs ("I need Tokyo's population first").
Action — it calls a tool to get it (population("Tokyo")).
Observation — it reads the tool's result, which informs the next Thought.

These three operations repeat — Thought, Action, Observation, Thought, Action, Observation — until the agent has gathered sufficient information to emit a final Answer. Each step is selected after observing the result of the previous one. The agent is not executing a fixed plan; it is reasoning incrementally, exactly as in the Module 1 loop, but with reasoning made explicit and tool calls available at each step.
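As a rough illustration, the cycle for the Tokyo/Paris question could be traced as in the sketch below. Nothing here is a real agent: the `population` and `subtract` tools, the `scripted_policy` function standing in for the model, and the round population figures are all assumptions made for this example.

```python
# Hypothetical tools; the figures are round placeholders, not official statistics.
POPULATION = {"Tokyo": 14_000_000, "Paris": 2_000_000}

def population(city):
    return POPULATION[city]

def subtract(a, b):
    return a - b

TOOLS = {"population": population, "subtract": subtract}

def scripted_policy(observations):
    """Stand-in for the model: choose the next step from the observations so far."""
    if len(observations) == 0:
        return "I need Tokyo's population first.", "population", ("Tokyo",)
    if len(observations) == 1:
        return "Now I need Paris's population.", "population", ("Paris",)
    if len(observations) == 2:
        return "I have both figures, so I can subtract.", "subtract", tuple(observations)
    return "The last observation is the difference.", "answer", (observations[-1],)

def run_trace():
    """Repeat Thought, Action, Observation until the policy emits an answer."""
    observations, lines = [], []
    while True:
        thought, name, args = scripted_policy(observations)
        lines.append(f"Thought: {thought}")
        if name == "answer":
            lines.append(f"Answer: {args[0]}")
            return lines
        call = f"{name}({', '.join(repr(a) for a in args)})"
        lines.append(f"Action: {call}")
        observation = TOOLS[name](*args)      # act, then read the result
        lines.append(f"Observation: {observation}")
        observations.append(observation)      # the result informs the next Thought
```

Calling `print("\n".join(run_trace()))` prints three Thought, Action, Observation rounds followed by a closing Thought and the Answer. The subtraction is chosen only after both lookups have been observed, because the policy decides each step from the observations gathered so far.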

Trace a multi-step question

Select a question that requires more than one tool invocation. Click Step to advance one Thought–Action–Observation iteration at a time, or Run to display the complete trace. The final computation is performed only once the prerequisite lookups have completed and their results have been observed.


The ReAct loop expressed in code

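# model, tools, and question are supplied by the surrounding agent code
state = {"goal": question, "history": []}
while True:
    thought, action = model(state)      # reason, then choose a tool or a final answer
    if action.is_final:
        answer = action.text            # enough information gathered: stop here
        break
    observation = tools[action.name](action.arg)             # act: run the chosen tool
    state["history"].append((thought, action, observation))  # observe: feed the result back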

The complete history (every prior Thought, Action, and Observation) is supplied to the model at each iteration. This accumulated trace constitutes the agent's working memory for the task. As Module 4 explains, this memory has a finite capacity.
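One way to picture this working memory is to render the history as text before every model call and watch that text grow. The sketch below is illustrative only: `render_prompt`, the `Action` dataclass, the stub `model`, the `population` tool, the `max_steps` cap, and the placeholder population figures are assumptions for this example, not part of any particular framework.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str = ""          # tool to call (unused when is_final is True)
    arg: str = ""           # single argument passed to the tool
    is_final: bool = False
    text: str = ""          # final answer text when is_final is True

# Hypothetical tool with round placeholder figures, not official statistics.
FIGURES = {"Tokyo": 14_000_000, "Paris": 2_000_000}
tools = {"population": lambda city: FIGURES[city]}

def render_prompt(state):
    """Flatten the goal and every prior step into the text a model would see."""
    lines = [f"Question: {state['goal']}"]
    for thought, action, observation in state["history"]:
        lines += [f"Thought: {thought}",
                  f"Action: {action.name}({action.arg!r})",
                  f"Observation: {observation}"]
    return "\n".join(lines)

def model(prompt_text, state):
    """Stub standing in for a language model; a real one would read only prompt_text."""
    seen = [obs for _, _, obs in state["history"]]
    if len(seen) == 0:
        return "I need Tokyo's population first.", Action("population", "Tokyo")
    if len(seen) == 1:
        return "Now I need Paris's population.", Action("population", "Paris")
    return "I can subtract now.", Action(is_final=True, text=str(seen[0] - seen[1]))

def run(question, max_steps=5):
    state = {"goal": question, "history": []}
    prompt_sizes = []                        # characters sent on each iteration
    for _ in range(max_steps):               # cap guards against a loop that never finishes
        prompt = render_prompt(state)
        prompt_sizes.append(len(prompt))
        thought, action = model(prompt, state)
        if action.is_final:
            return action.text, prompt_sizes
        observation = tools[action.name](action.arg)
        state["history"].append((thought, action, observation))
    return None, prompt_sizes                # gave up without an answer
```

Each iteration sends a longer prompt than the one before, since every completed step adds three more lines to the rendered history. The `max_steps` cap is a design choice added here so that a model that never produces a final answer cannot loop indefinitely.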

AI anchor: ReAct is the standard agent paradigm

The 2022 ReAct paper reported that having a model write its reasoning and act within the same loop outperformed reasoning-only and acting-only baselines on the tasks it evaluated: explicit reasoning improves tool-use decisions, and observations ground the reasoning in factual results rather than confabulation. Many contemporary agent frameworks and tool-calling APIs, including LangChain agents, the OpenAI Assistants API, and Claude's tool use, follow a variant of the Thought–Action–Observation loop, although not all of them expose the Thought step as explicit text. When a modern AI assistant reasons, retrieves information, and then reasons again, it is running this loop.
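When the reasoning and the action are written as plain text, the surrounding code has to extract the tool call from the model's output. The sketch below assumes a two-line `Thought: ...` / `Action: tool[argument]` layout, loosely modeled on the bracketed action style used in ReAct-style prompts. The exact format, the `Step` type, and `parse_step` are assumptions for illustration, and real frameworks often use structured tool-call messages instead.

```python
import re
from typing import NamedTuple, Optional

class Step(NamedTuple):
    thought: str
    tool: str
    arg: str

# Assumed format: "Thought: <text>" followed by "Action: <tool>[<argument>]".
STEP_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.+?)\s*\n\s*Action:\s*(?P<tool>\w+)\[(?P<arg>.*?)\]\s*$",
    re.DOTALL,
)

def parse_step(completion: str) -> Optional[Step]:
    """Return the parsed step, or None if the text does not match the assumed format."""
    match = STEP_PATTERN.search(completion.strip())
    if match is None:
        return None
    return Step(match["thought"], match["tool"], match["arg"])
```

For example, `parse_step("Thought: I need Tokyo's population first.\nAction: population[Tokyo]")` yields a `Step` whose `tool` is `"population"` and whose `arg` is `"Tokyo"`. Returning `None` on a mismatch lets the caller decide whether to retry or stop.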

Check your understanding

Answer a short set of questions on the ReAct loop.


Why this matters next

The agent relied on its complete trace to finish the task, but this working memory is bounded. Module 4 examines the context window: which information is retained, which is evicted, and how long-term memory mechanisms let the agent persist information across context boundaries.

Summary: ReAct interleaves Reasoning and Acting — the agent emits a Thought, performs an Action (a tool invocation), reads the resulting Observation, and iterates, selecting each subsequent step only after observing the prior result, until sufficient information has been gathered to emit a final Answer.
