Agentic AI · 11 min read
The Infinite Board Problem: Pruning State in Long-Running Reasoning Loops
How to manage the shared state size in complex reasoning loops to prevent context window overflow without losing critical history.

TL;DR: The Blackboard architecture is the most robust way to handle complex, non-linear reasoning in multi-agent systems, but it suffers from the “Infinite Board Problem” where shared state grows to exceed context windows. To build long-running enterprise agents, developers must implement semantic eviction strategies rather than naive LRU approaches. By stratifying the board, ranking data by importance, and using summarization loops, systems can maintain focus and efficiency without losing critical history.
Let us talk about what happens when an autonomous agent gets lost in thought. If you have built systems using the Blackboard architecture, where specialized agents read and write to a shared state object (as we outlined in our primer on the blackboard architecture), you know it is the most robust way to handle complex, non-linear reasoning. It avoids the “Phone Game” degradation of linear chains and allows for emergent workflows.
But if you deploy these systems into production for tasks that require hours of continuous reasoning (like auditing a massive codebase or synthesizing a week’s worth of market data), you will run into a new physical limit.
We call it the Infinite Board Problem.
The strength of the Blackboard is that every agent can see everything that has happened. The weakness is that, eventually, everything becomes too much. The shared state grows until it exceeds the context window of the models you are using.
When the board fills up, the system fails. Either the model starts ignoring the oldest instructions (the “lost in the middle” problem), or the API call fails with a context overflow error.
To build long-running enterprise agents, you cannot just rely on larger context windows. You need to build an eraser. You need State Pruning.
The Context Wall
To understand why this is hard, we need to look at what is actually on the board.
In a well-designed Blackboard system, the state is not just a log of text. It is a structured object. It contains the primary objective, the current plan, active hypotheses, partial solutions, and the logs of tools that have been executed.
As the agents work, they append data.
- The Researcher dumps ten pages of documentation onto the board.
- The Coder writes three versions of a function and puts them all on the board for review.
- The Tester runs a suite and dumps the stack traces of the failures onto the board.
Within twenty iterations, a structured JSON object that started at five kilobytes can grow to several megabytes.
If you are using a model with a million-token context window, you might think you have plenty of room. But inference costs scale with context size. Even if the model can read the whole board, doing so on every turn will destroy the economics of your application. You are paying the model to re-read the same documentation over and over. This is not just a technical limitation; it is an economic one.
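To see the economics concretely, here is a back-of-the-envelope sketch of what re-reading the whole board on every turn costs; the $1-per-million-input-tokens price is a placeholder for illustration, not any vendor's real rate:

```python
def run_cost(board_tokens: int, turns: int, price_per_mtok: float) -> float:
    """Cost of feeding the entire board as input on every turn."""
    return board_tokens * turns * price_per_mtok / 1_000_000

# Hypothetical pricing of $1 per million input tokens.
small = run_cost(board_tokens=5_000, turns=100, price_per_mtok=1.0)
big = run_cost(board_tokens=500_000, turns=100, price_per_mtok=1.0)
print(f"${small:.2f} vs ${big:.2f}")  # $0.50 vs $50.00
```

A hundred-fold growth in board size means a hundred-fold growth in per-run cost, because every turn pays for the full history again.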
You must keep the board small enough to fit in the “fast and cheap” zone of your model’s context window, typically under a few thousand tokens for rapid iterations.
Eviction Strategies: Beyond LRU
In traditional software engineering, when a cache fills up, we use an algorithm like Least Recently Used (LRU) to kick out the oldest data.
If you apply a naive LRU strategy to a Blackboard, you will break the agent’s brain.
Imagine the agent is working on Step 4 of a plan. The original goal definition and the security constraints were written to the board at Step 1. They are the oldest pieces of data. If you use an LRU strategy, the system will evict the goal and the constraints to make room for the latest compiler error logs. The agent will literally forget what it was supposed to do or what it was forbidden from doing.
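A minimal sketch makes the failure concrete (the class and keys here are hypothetical): with naive LRU, the goal and constraints written at Step 1 are precisely what gets evicted first.

```python
from collections import OrderedDict

class NaiveLRUBoard:
    """A Blackboard with naive LRU eviction -- do not do this."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def write(self, key: str, value: str) -> None:
        self.items[key] = value
        self.items.move_to_end(key)  # newest entries are most recently used
        while len(self.items) > self.capacity:
            evicted, _ = self.items.popitem(last=False)  # evict oldest first
            print(f"Evicted: {evicted}")

board = NaiveLRUBoard(capacity=3)
board.write("goal", "Audit the payments service")
board.write("constraint", "Never log API keys")
board.write("step_1_log", "...")
board.write("step_2_log", "...")  # evicts "goal"
board.write("step_3_log", "...")  # evicts "constraint"
print("goal" in board.items)      # False: the agent has forgotten its objective
```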
We need semantic eviction strategies. Here are the three patterns that work.
1. Architectural Stratification
The first step is to divide the board into zones that have different lifespans.
- The Core (Immutable): This contains the objective, the active constraints, and the global state flags. This section is never pruned. It must fit in the system prompt of every agent.
- The Working Memory (Volatile): This contains the current hypotheses and the immediate data the agents are arguing about. This is the target for aggressive pruning.
- The Archive (Cold Storage): This is where pruned data goes. It does not live in the active context window, but it is stored in a database (like Redis or Spanner). We provide the agents with a query_archive tool. If an agent realizes it needs to look at a past decision or a specific log that was pruned, it can search the archive. This is essentially a retrieval-augmented generation (RAG) pattern built into the agent’s memory management, ensuring that no data is truly lost, only moved out of expensive active memory.
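As a rough illustration of what such a tool might look like, here is an in-memory stand-in for the cold store with a naive keyword search. A production system would back this with Redis or Spanner and likely use vector retrieval instead; all names here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ArchivedItem:
    item_id: str
    author: str
    content: str

class ColdArchive:
    """In-memory stand-in for the cold-storage database."""

    def __init__(self):
        self._store: dict[str, ArchivedItem] = {}

    def archive(self, item: ArchivedItem) -> None:
        self._store[item.item_id] = item

    def query_archive(self, keywords: list[str], limit: int = 5) -> list[ArchivedItem]:
        """Naive keyword match; swap in vector retrieval (RAG) in production."""
        scored = []
        for item in self._store.values():
            score = sum(kw.lower() in item.content.lower() for kw in keywords)
            if score > 0:
                scored.append((score, item))
        scored.sort(key=lambda pair: -pair[0])  # best match first
        return [item for _, item in scored[:limit]]

archive = ColdArchive()
archive.archive(ArchivedItem("log-17", "Tester", "Build failed: missing index on orders table"))
archive.archive(ArchivedItem("fact-02", "Researcher", "The database port is 5432"))
hits = archive.query_archive(["index", "orders"])
print([h.item_id for h in hits])  # ['log-17']
```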
2. Semantic Importance Ranking
Instead of timing data out, you must score it by importance.
When an agent writes to the board, it should attach a metadata tag indicating the type of contribution.
- A Fact (e.g., “The database port is 5432”) has high importance.
- A Hypothesis (e.g., “Maybe the query is slow because of a missing index”) has medium importance.
- A Log Trace (e.g., the 500 lines of output from a failing build) has low importance once it has been summarized.
When the board exceeds a threshold (say, 80% of the target context size), the Control Shell runs a pruning pass. It drops the low-importance items first. If a log trace has been analyzed and a fix proposed, the raw trace can be moved to the Archive, leaving only the summary on the board.
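A pruning pass along these lines might look like the following sketch. The 4-characters-per-token estimate and the plain-dict item shape are simplifying assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def pruning_pass(board: list[dict], target_tokens: int) -> tuple[list[dict], list[dict]]:
    """Drop low-importance items once the board crosses 80% of the target size.

    Each item is a dict: {"id": str, "importance": int (1-5), "content": str}.
    Returns (kept_items, archived_items).
    """
    total = sum(estimate_tokens(item["content"]) for item in board)
    if total <= 0.8 * target_tokens:
        return board, []  # under threshold: no pruning needed

    archived = []
    # Evict lowest importance first; among equals, largest items first.
    candidates = sorted(board, key=lambda i: (i["importance"], -len(i["content"])))
    kept = list(board)
    for item in candidates:
        if total <= 0.8 * target_tokens:
            break
        if item["importance"] <= 2:  # never auto-drop facts and decisions
            kept.remove(item)
            archived.append(item)
            total -= estimate_tokens(item["content"])
    return kept, archived

board = [
    {"id": "fact-1", "importance": 5, "content": "The database port is 5432"},
    {"id": "hyp-1", "importance": 3, "content": "Maybe the query lacks an index"},
    {"id": "log-1", "importance": 1, "content": "stack trace line ... " * 200},
]
kept, archived = pruning_pass(board, target_tokens=500)
print([i["id"] for i in archived])  # ['log-1']
```

Note that high-importance items are never auto-dropped here even under pressure; in that situation the system should fall back to summarization rather than deletion.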
3. The Summarization Loop
This is the most powerful pattern. When a specific thread of discussion on the board becomes too long, you do not delete it. You condense it.
If the Architect and the Security Agent have spent ten turns arguing about the correct way to implement a feature, the Control Shell can invoke a lightweight model to summarize the thread: “The team debated Approach A and Approach B. Approach B was chosen because it avoids exposing the API key in the logs. The decision is final.”
The system replaces the ten turns of argument with that three-sentence summary. The context is preserved, but the token count drops by 90%. This ensures that the agent retains the core knowledge without the baggage of the full conversation history.
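One way to sketch the mechanics (the summarizer callable stands in for the lightweight model, and all names are illustrative):

```python
from typing import Callable

def summarize_thread(thread: list[dict], summarize: Callable[[str], str]) -> dict:
    """Collapse a long discussion thread into a single DECISION item.

    `summarize` is any callable mapping the full transcript to a short
    summary -- in production, a call to a lightweight LLM.
    """
    transcript = "\n".join(f'{t["author"]}: {t["content"]}' for t in thread)
    return {
        "id": f'summary-of-{thread[0]["id"]}',
        "author": "control_shell",
        "type": "DECISION",
        "content": summarize(transcript),
    }

# Stand-in summarizer; swap in a real model client in production.
def stub(text: str) -> str:
    return "Approach B chosen: it avoids exposing the API key in logs."

thread = [
    {"id": "arg-0", "author": "Architect", "content": "Approach A is simpler to ship."},
    {"id": "arg-1", "author": "Security", "content": "Approach A writes the API key to the request log."},
    {"id": "arg-2", "author": "Architect", "content": "Then Approach B, with the key injected at deploy time."},
]
summary = summarize_thread(thread, stub)
print(summary["content"])
```

The Control Shell then replaces the original thread items with the single summary item, keeping the decision while discarding the debate.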
Preserving Consensus Loop Integrity
There is one critical rule when pruning a Blackboard: You cannot prune an active argument.
In the Blackboard architecture, agents operate in a Write-Critique-Refine loop to reach consensus. If the Security agent has posted a critique of a piece of code, and the Coder agent has not yet responded, that critique cannot be pruned, even if it is the oldest item in the working memory zone.
If you prune the critique, the Coder agent will assume the code is fine and proceed to deployment, violating the safety protocol.
The Control Shell must check the dependency graph of the board before pruning. An item can only be pruned or summarized if all agents registered to watch that topic have acknowledged it or if a terminal state (consensus reached) has been set.
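That dependency check reduces to a small predicate. This is a simplified sketch, assuming each board item carries a status and a depends_on list of item IDs:

```python
def can_prune(item_id: str, board: dict) -> bool:
    """Consensus guard: an item may be pruned only if it is resolved and
    nothing unresolved still depends on it.

    `board` maps item_id -> {"status": str, "depends_on": [str, ...]}.
    """
    item = board[item_id]
    if item["status"] == "DEBATED":
        return False  # the item is itself an open argument
    for other in board.values():
        if item_id in other["depends_on"] and other["status"] != "RESOLVED":
            return False  # an unresolved critique or proposal still points at it
    return item["status"] == "RESOLVED"

board = {
    "code-1":     {"status": "ACTIVE",   "depends_on": []},
    "critique-1": {"status": "DEBATED",  "depends_on": ["code-1"]},
    "log-9":      {"status": "RESOLVED", "depends_on": []},
}
print(can_prune("code-1", board))      # False: an open critique depends on it
print(can_prune("critique-1", board))  # False: it is itself being debated
print(can_prune("log-9", board))       # True
```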
Implementing a Structured Board
To translate these strategies into working code, we need to move away from free-form JSON and define a strict schema for our board. In Python, the standard for this is Pydantic.
By using Pydantic, we can enforce the zones, the importance scores, and the dependency graph that the Control Shell needs to make intelligent pruning decisions.
Let us define the data structures for our board.
from enum import Enum
from typing import List, Dict
from pydantic import BaseModel, Field
from datetime import datetime, timezone


class ContributionType(str, Enum):
    FACT = "FACT"
    HYPOTHESIS = "HYPOTHESIS"
    LOG = "LOG"
    PLAN = "PLAN"
    DECISION = "DECISION"


class ItemStatus(str, Enum):
    ACTIVE = "ACTIVE"
    RESOLVED = "RESOLVED"
    DEBATED = "DEBATED"


class BoardItem(BaseModel):
    id: str
    author: str
    type: ContributionType
    importance: int = Field(..., ge=1, le=5)  # 1 to 5 scale
    content: str
    status: ItemStatus = ItemStatus.ACTIVE
    depends_on: List[str] = Field(default_factory=list)  # IDs of items this depends on
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))


class CoreZone(BaseModel):
    objective: str
    constraints: List[str]
    global_flags: Dict[str, bool] = Field(default_factory=dict)


class BlackboardState(BaseModel):
    core: CoreZone
    working_memory: Dict[str, BoardItem] = Field(default_factory=dict)
    archive: Dict[str, BoardItem] = Field(default_factory=dict)

With this schema, every piece of data on the board has an ID, a type, an importance score, and a status. Crucially, the depends_on list allows us to build a dependency graph. If an agent posts a critique of a code block, that critique item will list the code block’s ID in its depends_on list.
The Advanced State Evictor
Now let us build the evictor that enforces the consensus guard and applies the pruning strategies.
This class monitors the size of the working memory. When it exceeds the threshold, it first tries to archive resolved discussions. If that is not enough, it moves to importance-based pruning, always respecting the consensus guard.
class AdvancedBlackboardEvictor:
    def __init__(self, max_items=50, summary_model=None):
        self.max_items = max_items
        self.summary_model = summary_model  # A lightweight LLM client for summarization

    def check_and_prune(self, state: BlackboardState) -> BlackboardState:
        """Monitors the board size and applies pruning strategies."""
        # If the working memory is small enough, do nothing
        if len(state.working_memory) <= self.max_items:
            return state
        print(f"Working memory size ({len(state.working_memory)}) exceeds limit. Initiating pruning.")

        # Step 1: Apply the Consensus Guard.
        # Find which items are actively being debated.
        active_arguments = self._get_active_arguments(state)

        # Step 2: Archive resolved logs and facts.
        state = self._archive_resolved_items(state, active_arguments)

        # Step 3: If still too large, summarize long threads.
        if len(state.working_memory) > self.max_items:
            state = self._summarize_threads(state, active_arguments)

        # Step 4: If STILL too large, drop low-importance items that are not active.
        if len(state.working_memory) > self.max_items:
            state = self._evict_low_importance(state, active_arguments)
        return state

    def _get_active_arguments(self, state: BlackboardState) -> set:
        """Returns a set of item IDs that are part of an unresolved discussion."""
        active_args = set()
        for item_id, item in state.working_memory.items():
            # Items under open debate are protected, along with what they depend on.
            if item.status == ItemStatus.DEBATED:
                active_args.add(item_id)
                active_args.update(item.depends_on)
            # Any unresolved item with dependencies protects itself and its dependencies.
            if item.depends_on and item.status != ItemStatus.RESOLVED:
                active_args.add(item_id)
                active_args.update(item.depends_on)
        return active_args

    def _archive_resolved_items(self, state: BlackboardState, active_arguments: set) -> BlackboardState:
        """Moves resolved items that are not part of active arguments to the archive."""
        to_move = [
            item_id
            for item_id, item in state.working_memory.items()
            if item.status == ItemStatus.RESOLVED and item_id not in active_arguments
        ]
        for item_id in to_move:
            item = state.working_memory.pop(item_id)
            # In production, persist the item to the cold-storage database here.
            print(f"Archiving item {item_id} to cold storage.")
            state.archive[item_id] = item
        return state

    def _summarize_threads(self, state: BlackboardState, active_arguments: set) -> BlackboardState:
        """Mock method to show where summarization would occur."""
        # In production, you would group items by topic or dependencies,
        # send the text to a model, and replace the items with the summary.
        print("Invoking summarization loop for long threads...")
        return state

    def _evict_low_importance(self, state: BlackboardState, active_arguments: set) -> BlackboardState:
        """Drops low-importance items that are not active arguments."""
        # Sort by importance (ascending), then by timestamp (oldest first).
        items = sorted(state.working_memory.values(), key=lambda x: (x.importance, x.created_at))
        for item in items:
            if len(state.working_memory) <= self.max_items:
                break
            if item.id not in active_arguments and item.importance < 3:
                print(f"Evicting low importance item {item.id} from board.")
                state.working_memory.pop(item.id)
        return state

This implementation moves from a simple token counter to a structured governance system. The _get_active_arguments method is the key: it builds a simple dependency check to ensure we never delete a critique or a proposal that is still being processed by the team.
By using Pydantic, we have turned the Blackboard from a wild frontier of text into a database with clear rules.
Conclusion
We used to think that the solution to agent memory was just bigger context windows. We thought that if we could fit ten million tokens into a prompt, we wouldn’t need to worry about state management.
But we were wrong.
Bigger context windows make agents slower and more expensive. They make them less focused.
The future of Agentic AI is not infinite memory. The future is active forgetting. The systems that succeed in production will be the ones that know how to summarize their experience, archive the details, and keep the active workspace clean.
If you are building a Blackboard system, do not wait for the context wall. Build your evictor now. It is the fundamental difference between a fragile prototype that works only in a lab and a robust production system that handles real-world complexity.