Module 4 — Memory: Context & Recall

Reasoning across steps · hands-on · about 30 minutes.

In Module 3 the agent kept its full Thought–Action–Observation trace in view while working on the task. That trace lives in the model's context window: the finite span of recent text the model can see at any single inference call. The context window is bounded, so once it is full the oldest entries are evicted. An agent that runs many iterations, or holds an extended conversation, will lose access to earlier information unless a second memory mechanism is introduced.
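Real context windows are measured in tokens rather than turns. The eviction rule can be sketched as follows, assuming word count as a rough stand-in for tokens (the helper names `words` and `fit_window` and the budget value are hypothetical). Walk backwards from the newest turn and stop once the budget is spent. Everything older is what the model no longer sees.

```python
# A minimal sketch: keep only the most recent turns that fit a fixed budget.
# Assumption: word count stands in for tokens; real models count tokens.

def words(text: str) -> int:
    """Rough size of a turn (hypothetical stand-in for a token count)."""
    return len(text.split())


def fit_window(turns: list[str], budget: int) -> list[str]:
    """Return the newest turns whose total size fits within `budget`."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):          # walk from newest to oldest
        cost = words(turn)
        if used + cost > budget:          # the next-older turn no longer fits
            break
        kept.append(turn)
        used += cost
    kept.reverse()                        # restore chronological order
    return kept


history = [
    "My name is Mienie.",
    "Find flights to Lisbon.",
    "Only direct flights, please.",
    "Which one is cheapest?",
]
print(fit_window(history, budget=9))
# → ['Only direct flights, please.', 'Which one is cheapest?']
```

The oldest turn, which holds the user's name, is the first one to go.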

Two kinds of memory

Short-term (the context window) — the recent turns the agent can see right now. Fast and automatic, but capped: when it is full, the oldest turn is dropped to make room. This is the working memory from Module 3.
Long-term (a memory store) — a place the agent can deliberately save a fact to and recall it later by key, even after that fact has scrolled out of the context window. In real systems this is often a database or a vector store.

The key design decision is which information to commit to long-term memory. Anything the agent must not lose (a user's identity, a binding constraint, an intermediate result needed at a later iteration) should be written to the store rather than left in the context window to be evicted.
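One way to express that decision is a keep-list of fact kinds. The sketch below assumes each fact arrives already tagged with a hypothetical `kind`. The kinds mirror the examples in the text: identity, constraint, and intermediate result. `Fact`, `commit`, and `KEEP` are illustrative names, not a standard API.

```python
# A minimal sketch of a write policy: which facts go to long-term memory.
# Assumption: each fact carries a hypothetical `kind` tag.

from dataclasses import dataclass

KEEP = {"identity", "constraint", "result"}   # kinds the agent must not lose


@dataclass
class Fact:
    key: str
    value: str
    kind: str          # e.g. "identity", "constraint", "result", "chatter"


def commit(fact: Fact, store: dict[str, str]) -> bool:
    """Save the fact if its kind is on the keep-list; report whether it was saved."""
    if fact.kind in KEEP:
        store[fact.key] = fact.value
        return True
    return False


store: dict[str, str] = {}
facts = [
    Fact("user_name", "Mienie", "identity"),
    Fact("budget", "under 300 EUR", "constraint"),
    Fact("small_talk", "nice weather today", "chatter"),
    Fact("cheapest_flight", "option B", "result"),
]
for f in facts:
    commit(f, store)

print(store)   # small talk is left to the context window
```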

Demonstration: context-window eviction

The agent below has a small context window — it can retain only the most recent turns. Add turns and observe the oldest entries being evicted. Then repeat with the long-term store enabled: the agent commits key facts to the store, and these remain accessible after they have been evicted from the context window.
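The same experiment can be run as a short script. It assumes a window of three turns and that each turn arrives already paired with the fact it states (or `None`). `WINDOW` and `run` are hypothetical names chosen for illustration.

```python
# A text-only version of the demo: a 3-turn window, run with and without
# a long-term store. Assumptions (hypothetical): WINDOW = 3, and each turn
# arrives already paired with the fact it states.

WINDOW = 3


def run(turns, use_store):
    context = []   # short-term: the context window
    store = {}     # long-term: facts saved by key
    for text, fact in turns:
        context.append(text)
        if use_store and fact is not None:
            key, value = fact
            store[key] = value            # committed before it can be evicted
        while len(context) > WINDOW:      # full? drop the oldest turn
            context.pop(0)
    return context, store


turns = [
    ("Hi, I'm Mienie.", ("user_name", "Mienie")),
    ("Plan a trip to Lisbon.", None),
    ("Keep it under 300 EUR.", ("budget", "under 300 EUR")),
    ("Any museums nearby?", None),
    ("What about food?", None),
]

context, store = run(turns, use_store=False)
print(context)   # the greeting has been evicted
print(store)     # {}: nothing was saved

context, store = run(turns, use_store=True)
print(store)     # the name survives even though its turn left the window
```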


Querying the agent's recall

After adding several turns above, ask the agent something that depends on an earlier fact. With only the context window, that fact may already have been evicted; with the store enabled, the agent retrieves it from long-term memory. Pairing a context window with an external memory store is a common architecture in production assistants.
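The lookup order behind that query can be sketched like this: check the context window first (newest turns first), then fall back to the long-term store. The function name `recall` and the `"key: value"` turn format are hypothetical simplifications, not how a language model actually reads its context.

```python
# A minimal sketch of recall: look in the context window first, then fall
# back to the long-term store. `recall` and the "key: value" turn format
# are hypothetical, chosen to keep the lookup simple.

def recall(key, context, store):
    """Return (value, source) for `key`, or (None, None) if it is gone."""
    for turn in reversed(context):                 # newest turns first
        k, _, v = turn.partition(": ")
        if k == key:
            return v, "context window"
    if key in store:
        return store[key], "long-term store"
    return None, None


context = ["destination: Lisbon", "budget: under 300 EUR"]   # name already evicted
store = {"user_name": "Mienie"}

print(recall("destination", context, store))   # ('Lisbon', 'context window')
print(recall("user_name", context, store))     # ('Mienie', 'long-term store')
print(recall("user_name", context, {}))        # (None, None): window only, fact lost
```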


Two memory types expressed in code

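WINDOW = 3       # how many turns fit in the window (small value, for illustration)
context = []     # short-term memory: the most recent turns
store = {}       # long-term memory: facts saved by key

# short-term: a capped list, the context window
def remember(turn):
    context.append(turn)
    while len(context) > WINDOW:   # full? drop the oldest
        context.pop(0)

# long-term: an explicit save / recall the agent controls
store["user_name"] = "Mienie"      # save a fact by key
name = store.get("user_name")      # recall it later, after it left the window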
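The list-based `remember` is easy to read, but `list.pop(0)` shifts every remaining element. Python's standard `collections.deque` with a `maxlen` gives the same drop-the-oldest behavior automatically: when a bounded deque is full, `append` discards the item at the opposite end in constant time. The window size below is again an illustrative assumption.

```python
# An alternative short-term memory: collections.deque with maxlen.
# When the deque is full, append() silently discards the item at the
# opposite end, which is the same "drop the oldest" rule as remember().

from collections import deque

WINDOW = 3                              # hypothetical window size
context = deque(maxlen=WINDOW)

for turn in ["t1", "t2", "t3", "t4", "t5"]:
    context.append(turn)                # t1 and t2 fall off the left end

print(list(context))                    # ['t3', 't4', 't5']
```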

Production memory systems are more elaborate: vector embeddings (the vector representations introduced in Course 2 and Course 5) let the agent retrieve by semantic similarity rather than by exact key. The underlying principle is the same, though: a persistent store that outlives the context window.
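"Persistent" can mean more than outliving the window; a store can also outlive the process. A minimal sketch, assuming a JSON file stands in for the database or vector store a production system would use. The file location and the names `save_fact` and `load_store` are hypothetical.

```python
# A minimal sketch of a store that outlives the process, not just the window.
# Assumption: a JSON file stands in for a database or vector store;
# PATH, save_fact, and load_store are hypothetical names.

import json
import os
import tempfile

PATH = os.path.join(tempfile.gettempdir(), "agent_memory.json")


def load_store(path=PATH):
    """Read every saved fact, or start empty if nothing has been saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_fact(key, value, path=PATH):
    """Add one fact and write the whole store back to disk."""
    store = load_store(path)
    store[key] = value
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f)


save_fact("user_name", "Mienie")
# ...later, in a new session that starts with an empty context window...
print(load_store().get("user_name"))   # Mienie
```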

Recall by meaning, not by exact key

The store above needs the exact key. Production memory retrieves by meaning: every saved memory and the incoming query are turned into vectors, and the closest ones are pulled back — even when they share no words. Pick a question and watch cosine similarity rank the memories and retrieve the top matches. This is the retrieval step of retrieval-augmented generation.
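The ranking step can be sketched in a few lines. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction rather than size. The three-number vectors below are hand-made stand-ins for the output of a real embedding model, and `memories` and `retrieve` are hypothetical names.

```python
# A minimal sketch of recall by meaning using cosine similarity.
# Assumption: the 3-number vectors are hand-made stand-ins for the
# embeddings a real embedding model would produce.

import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


memories = {
    "User is vegetarian":            [0.9, 0.1, 0.0],
    "User's flight lands at 9 a.m.": [0.0, 0.2, 0.9],
    "User dislikes spicy dishes":    [0.8, 0.3, 0.1],
}


def retrieve(query_vec, k=2):
    """Rank every memory by similarity to the query and return the top k."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, memories[m]), reverse=True)
    return ranked[:k]


# "What should I order for dinner?" shares no words with any memory,
# but its (hand-made) vector points toward the food-related ones.
print(retrieve([0.85, 0.2, 0.05]))
# → ['User is vegetarian', 'User dislikes spicy dishes']
```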


AI anchor: the mechanisms of memory in modern assistants
Many limitations you notice in long interactions with AI chat assistants trace back to the context window: part of a very long pasted document may fall outside what the model sees, and early statements in a long conversation can drop out of view. A common mitigation in production systems is exactly the mechanism introduced in this module: a long-term memory store (often a vector database) to which the assistant writes salient facts and from which it retrieves them on demand. Features such as persistent memory, project workspaces, and retrieval-augmented generation each rely on a store that persists beyond the context window.
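Retrieval only helps once the retrieved facts are placed back in front of the model. In retrieval-augmented generation they are inserted into the prompt for the next inference call, alongside the recent turns. The sketch below shows that assembly step. The function name `build_prompt` and the prompt layout are hypothetical, and the memories stand for whatever a retrieval step returned.

```python
# A minimal sketch of how retrieved memories re-enter the context window.
# Assumptions: `build_prompt` and the prompt layout are hypothetical.

def build_prompt(retrieved, recent_turns, question):
    """Place long-term memories ahead of the recent turns and the new question."""
    lines = ["Relevant memories:"]
    lines += [f"- {m}" for m in retrieved]
    lines.append("Recent conversation:")
    lines += recent_turns
    lines.append(f"User: {question}")
    return "\n".join(lines)


prompt = build_prompt(
    retrieved=["User's name is Mienie", "User is vegetarian"],
    recent_turns=["User: We land in Lisbon tonight.", "Assistant: Noted."],
    question="Where should we eat?",
)
print(prompt)
```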

Check your understanding

Answer a short set of questions on context and recall.


Why this matters next
An agent capable of action, memory, and recall still needs a way to handle goals too large for a single iteration. Module 5 introduces explicit planning: decomposing a goal into an ordered sequence of subtasks before execution.

Summary: an agent keeps two kinds of memory. The context window has a fixed capacity, holds only recent turns, and silently evicts the oldest. The long-term store holds facts the agent explicitly commits and later retrieves, by exact key or, in production systems, by meaning. Together they let essential information persist after it has left the context window.

Next: Planning Before Acting →