· Agentic AI  · 11 min read

The Infinite Board Problem: Pruning State in Long-Running Reasoning Loops

How to manage the shared state size in complex reasoning loops to prevent context window overflow without losing critical history.

TL;DR: The Blackboard architecture is the most robust way to handle complex, non-linear reasoning in multi-agent systems, but it suffers from the “Infinite Board Problem” where shared state grows to exceed context windows. To build long-running enterprise agents, developers must implement semantic eviction strategies rather than naive LRU approaches. By stratifying the board, ranking data by importance, and using summarization loops, systems can maintain focus and efficiency without losing critical history.

Let us talk about what happens when an autonomous agent gets lost in thought. If you have built systems using the Blackboard architecture, where specialized agents read and write to a shared state object (as we outlined in our primer on the blackboard architecture), you know it is the most robust way to handle complex, non-linear reasoning. It avoids the “Phone Game” degradation of linear chains and allows for emergent workflows.

But if you deploy these systems into production for tasks that require hours of continuous reasoning (like auditing a massive codebase or synthesizing a week’s worth of market data), you will run into a new physical limit.

We call it the Infinite Board Problem.

The strength of the Blackboard is that every agent can see everything that has happened. The weakness is that, eventually, everything becomes too much. The shared state grows until it exceeds the context window of the models you are using.

When the board fills up, the system fails. Either the model starts ignoring the oldest instructions (the “lost in the middle” problem), or the API call fails with a context overflow error.

To build long-running enterprise agents, you cannot just rely on larger context windows. You need to build an eraser. You need State Pruning.

The Context Wall

To understand why this is hard, we need to look at what is actually on the board.

In a well-designed Blackboard system, the state is not just a log of text. It is a structured object. It contains the primary objective, the current plan, active hypotheses, partial solutions, and the logs of tools that have been executed.

As the agents work, they append data.

  • The Researcher dumps ten pages of documentation onto the board.
  • The Coder writes three versions of a function and puts them all on the board for review.
  • The Tester runs a suite and dumps the stack traces of the failures onto the board.

Within twenty iterations, a structured JSON object that started at five kilobytes can grow to several megabytes.

If you are using a model with a million-token context window, you might think you have plenty of room. But inference costs scale with context size. Even if the model can read the whole board, doing so on every turn will destroy the economics of your application. You are paying the model to re-read the same documentation over and over. This is not just a technical limitation; it is an economic one.
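The cost dynamic is easy to underestimate, so here is a rough back-of-the-envelope model. The growth rate and starting size are illustrative assumptions, not measurements; the point is that re-reading a growing board makes input cost quadratic in the number of turns.

```python
# A rough cost model for re-reading a growing board on every turn.
# The starting size and growth rate below are illustrative assumptions.

def cumulative_read_tokens(initial_tokens: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens consumed if every turn re-reads the whole board."""
    total = 0
    board = initial_tokens
    for _ in range(turns):
        total += board            # this turn re-reads the entire board
        board += growth_per_turn  # agents append more data before the next turn
    return total

# A board starting at 5k tokens, growing 10k tokens per turn, over 20 turns:
flat = cumulative_read_tokens(5_000, 0, 20)        # board never grows: 100k tokens
growing = cumulative_read_tokens(5_000, 10_000, 20)  # board grows: 2M tokens
print(growing / flat)  # the growing board costs 20x more to read
```

Pruning keeps the per-turn read cost roughly constant instead of letting it climb with every iteration.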

You must keep the board small enough to fit in the “fast and cheap” zone of your model’s context window, typically under a few thousand tokens for rapid iterations.

Eviction Strategies: Beyond LRU

In traditional software engineering, when a cache fills up, we use an algorithm like Least Recently Used (LRU) to kick out the oldest data.

If you apply a naive LRU strategy to a Blackboard, you will break the agent’s brain.

Imagine the agent is working on Step 4 of a plan. The original goal definition and the security constraints were written to the board at Step 1. They are the oldest pieces of data. If you use an LRU strategy, the system will evict the goal and the constraints to make room for the latest compiler error logs. The agent will literally forget what it was supposed to do or what it was forbidden from doing.
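A toy implementation makes the failure mode concrete. This is a deliberately naive LRU board (the capacity and keys are illustrative); watch what gets evicted first.

```python
from collections import OrderedDict

# A deliberately naive LRU board to illustrate the failure mode.
# Capacity and keys are illustrative only.

class LRUBoard:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def write(self, key: str, value: str) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            evicted, _ = self.items.popitem(last=False)  # evict the oldest entry
            print(f"Evicted: {evicted}")

board = LRUBoard(capacity=3)
board.write("goal", "Migrate the billing service without downtime")
board.write("constraints", "Never expose API keys in logs")
board.write("error_log_1", "...")
board.write("error_log_2", "...")  # evicts "goal": the agent forgets its objective
board.write("error_log_3", "...")  # evicts "constraints": the safety rules are gone
print("goal" in board.items)  # False
```

The freshest compiler errors survive; the mission and the guardrails do not. That is exactly backwards.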

We need semantic eviction strategies. Here are the three patterns that work.

1. Architectural Stratification

The first step is to divide the board into zones that have different lifespans.

  • The Core (Immutable): This contains the objective, the active constraints, and the global state flags. This section is never pruned. It must fit in the system prompt of every agent.
  • The Working Memory (Volatile): This contains the current hypotheses and the immediate data the agents are arguing about. This is the target for aggressive pruning.
  • The Archive (Cold Storage): This is where pruned data goes. It does not live in the active context window, but it is stored in a database (like Redis or Spanner). We provide the agents with a query_archive tool. If an agent realizes it needs to look at a past decision or a specific log that was pruned, it can search the archive. This is essentially a retrieval-augmented generation (RAG) pattern built into the agent’s memory management, ensuring that no data is truly lost, only moved out of expensive active memory.
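A minimal sketch of the query_archive tool, using an in-memory dict as a stand-in for Redis or Spanner. The function names and the keyword-matching logic are assumptions for illustration, not a fixed API; a production version would use a vector or full-text index.

```python
# In-memory stand-in for cold storage (Redis/Spanner in production).
archive: dict = {}

def archive_item(item_id: str, content: str) -> None:
    """Move pruned content into cold storage, keyed by its board ID."""
    archive[item_id] = content

def query_archive(keyword: str) -> list:
    """Return archived items whose content mentions the keyword."""
    needle = keyword.lower()
    return [(iid, text) for iid, text in archive.items() if needle in text.lower()]

archive_item("log-17", "Build failed: missing index on orders.customer_id")
archive_item("dec-03", "Chose Approach B to avoid leaking API keys")
print(query_archive("index"))  # finds log-17 only
```

The agent never loses access to history; it just has to ask for it explicitly instead of carrying it in every prompt.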

2. Semantic Importance Ranking

Instead of timing data out, you must score it by importance.

When an agent writes to the board, it should attach a metadata tag indicating the type of contribution.

  • A Fact (e.g., “The database port is 5432”) has high importance.
  • A Hypothesis (e.g., “Maybe the query is slow because of a missing index”) has medium importance.
  • A Log Trace (e.g., the 500 lines of output from a failing build) has low importance once it has been summarized.

When the board exceeds a threshold (say, 80% of the target context size), the Control Shell runs a pruning pass. It drops the low-importance items first. If a log trace has been analyzed and a fix proposed, the raw trace can be moved to the Archive, leaving only the summary on the board.
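The trigger condition can be sketched in a few lines. The 4-characters-per-token estimate and the 80% threshold mirror the heuristic above; both are assumptions you should tune for your tokenizer and model.

```python
# A rough token-budget check for triggering a pruning pass.
# The 4-chars-per-token estimate is a crude average for English text.

def estimated_tokens(text: str) -> int:
    return len(text) // 4

def needs_pruning(board_text: str, target_context: int, threshold: float = 0.8) -> bool:
    """True when the board exceeds the threshold fraction of the target size."""
    return estimated_tokens(board_text) > target_context * threshold

board_text = "x" * 40_000  # ~10,000 estimated tokens
print(needs_pruning(board_text, target_context=12_000))  # True: 10,000 > 9,600
print(needs_pruning(board_text, target_context=16_000))  # False: 10,000 <= 12,800
```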

3. The Summarization Loop

This is the most powerful pattern. When a specific thread of discussion on the board becomes too long, you do not delete it. You condense it.

If the Architect and the Security Agent have spent ten turns arguing about the correct way to implement a feature, the Control Shell can invoke a lightweight model to summarize the thread: “The team debated Approach A and Approach B. Approach B was chosen because it avoids exposing the API key in the logs. The decision is final.”

The system replaces the ten turns of argument with that three-sentence summary. The context is preserved, but the token count drops by 90%. This ensures that the agent retains the core knowledge without the baggage of the full conversation history.
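The loop itself is simple once the thread boundaries are known. In this sketch the summarizer is a pluggable callable; a trivial stub stands in for the lightweight LLM client so the example runs, and the 5-turn cutoff is an assumed tuning knob.

```python
from typing import Callable

def condense_thread(thread: list, summarize: Callable) -> list:
    """Replace a long thread with a single summary entry."""
    if len(thread) < 5:  # short threads are left untouched (assumed cutoff)
        return thread
    summary = summarize("\n".join(thread))
    return [f"[SUMMARY of {len(thread)} turns] {summary}"]

# Stub summarizer: in production this would call a cheap, fast model.
stub = lambda text: "Approach B chosen; it avoids exposing the API key in logs."

thread = [f"turn {i}: arguing about approach A vs B" for i in range(10)]
condensed = condense_thread(thread, stub)
print(len(condensed))  # 1: ten turns collapsed into one summary entry
```

The summary entry should itself be written back to the board as a high-importance DECISION item, so the next pruning pass treats it with the respect the original ten turns earned.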

The Consensus Loop Integrity

There is one critical rule when pruning a Blackboard: You cannot prune an active argument.

In the Blackboard architecture, agents operate in a Write-Critique-Refine loop to reach consensus. If the Security agent has posted a critique of a piece of code, and the Coder agent has not yet responded, that critique cannot be pruned, even if it is the oldest item in the working memory zone.

If you prune the critique, the Coder agent will assume the code is fine and proceed to deployment, violating the safety protocol.

The Control Shell must check the dependency graph of the board before pruning. An item can only be pruned or summarized if all agents registered to watch that topic have acknowledged it or if a terminal state (consensus reached) has been set.

Implementing a Structured Board

To translate these strategies into working code, we need to move away from free-form JSON and define a strict schema for our board. In Python, the standard for this is Pydantic.

By using Pydantic, we can enforce the zones, the importance scores, and the dependency graph that the Control Shell needs to make intelligent pruning decisions.

Let us define the data structures for our board.

from enum import Enum
from typing import List, Dict
from pydantic import BaseModel, Field
from datetime import datetime, timezone

class ContributionType(str, Enum):
    FACT = "FACT"
    HYPOTHESIS = "HYPOTHESIS"
    LOG = "LOG"
    PLAN = "PLAN"
    DECISION = "DECISION"

class ItemStatus(str, Enum):
    ACTIVE = "ACTIVE"
    RESOLVED = "RESOLVED"
    DEBATED = "DEBATED"

class BoardItem(BaseModel):
    id: str
    author: str
    type: ContributionType
    importance: int = Field(..., ge=1, le=5) # 1 (droppable) to 5 (critical)
    content: str
    status: ItemStatus = ItemStatus.ACTIVE
    depends_on: List[str] = Field(default_factory=list) # IDs of items this depends on
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

class CoreZone(BaseModel):
    objective: str
    constraints: List[str]
    global_flags: Dict[str, bool] = Field(default_factory=dict)

class BlackboardState(BaseModel):
    core: CoreZone
    working_memory: Dict[str, BoardItem] = Field(default_factory=dict)
    archive: Dict[str, BoardItem] = Field(default_factory=dict)

With this schema, every piece of data on the board has an ID, a type, an importance score, and a status. Crucially, the depends_on list allows us to build a dependency graph. If an agent posts a critique of a code block, that critique item will list the code block’s ID in its depends_on list.
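Here is how that dependency wiring protects an active argument. Plain dicts are used so the sketch stands alone; the real board would use the Pydantic models above, and the item IDs are invented for illustration.

```python
# A critique lists the code block it targets in depends_on, so neither item
# can be pruned while the critique is still being debated.

code_block = {"id": "code-7", "status": "ACTIVE", "depends_on": []}
critique = {"id": "crit-2", "status": "DEBATED", "depends_on": ["code-7"]}

working_memory = {item["id"]: item for item in (code_block, critique)}

def protected_ids(memory: dict) -> set:
    """IDs that must not be pruned: debated items plus everything they depend on."""
    protected = set()
    for item in memory.values():
        if item["status"] == "DEBATED":
            protected.add(item["id"])
            protected.update(item["depends_on"])
    return protected

print(sorted(protected_ids(working_memory)))  # ['code-7', 'crit-2']
```

Once the Coder responds and the critique is marked RESOLVED, both items drop out of the protected set and become eligible for archiving.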

The Advanced State Evictor

Now let us build the evictor that enforces the consensus guard and applies the pruning strategies.

This class monitors the size of the working memory. When it exceeds the threshold, it first tries to archive resolved discussions. If that is not enough, it moves to importance-based pruning, always respecting the consensus guard.

class AdvancedBlackboardEvictor:
    def __init__(self, max_items: int = 50, summary_model=None):
        self.max_items = max_items
        self.summary_model = summary_model # A lightweight LLM client for summarization
        
    def check_and_prune(self, state: BlackboardState) -> BlackboardState:
        """Monitors the board size and applies pruning strategies."""
        
        # If the working memory is small enough, do nothing
        if len(state.working_memory) <= self.max_items:
            return state
            
        print(f"Working memory size ({len(state.working_memory)}) exceeds limit. Initiating pruning.")
        
        # Step 1: Apply the Consensus Guard
        # We need to find which items are actively being debated
        active_arguments = self._get_active_arguments(state)
        
        # Step 2: Archive resolved logs and facts
        state = self._archive_resolved_items(state, active_arguments)
        
        # Step 3: If still too large, summarize long threads
        if len(state.working_memory) > self.max_items:
            state = self._summarize_threads(state, active_arguments)
            
        # Step 4: If STILL too large, drop low importance items that are not active
        if len(state.working_memory) > self.max_items:
            state = self._evict_low_importance(state, active_arguments)
            
        return state
        
    def _get_active_arguments(self, state: BlackboardState) -> set:
        """Returns a set of item IDs that are part of an unresolved discussion."""
        active_args = set()
        
        # Find items marked as DEBATED or items that have unresolved dependencies
        for item_id, item in state.working_memory.items():
            if item.status == ItemStatus.DEBATED:
                active_args.add(item_id)
                # Also protect the items it depends on
                active_args.update(item.depends_on)
                
            # If an item depends on another, and is not resolved, protect both
            if item.depends_on and item.status != ItemStatus.RESOLVED:
                active_args.add(item_id)
                active_args.update(item.depends_on)
                
        return active_args
        
    def _archive_resolved_items(self, state: BlackboardState, active_arguments: set) -> BlackboardState:
        """Moves resolved items that are not part of active arguments to the archive."""
        to_move = []
        
        for item_id, item in state.working_memory.items():
            if item.status == ItemStatus.RESOLVED and item_id not in active_arguments:
                to_move.append(item_id)
                
        for item_id in to_move:
            item = state.working_memory.pop(item_id)
            # Simple DB save (mock)
            print(f"Archiving item {item_id} to cold storage.")
            state.archive[item_id] = item
            
        return state
        
    def _summarize_threads(self, state: BlackboardState, active_arguments: set) -> BlackboardState:
        """Mock method to show where summarization would occur."""
        # In production, you would group items by topic or dependencies,
        # send the text to a model, and replace the items with the summary.
        print("Invoking summarization loop for long threads...")
        return state
        
    def _evict_low_importance(self, state: BlackboardState, active_arguments: set) -> BlackboardState:
        """Drops low importance items that are not active arguments."""
        # Sort by importance (ascending) and timestamp (ascending - oldest first)
        items = list(state.working_memory.values())
        items.sort(key=lambda x: (x.importance, x.created_at))
        
        for item in items:
            if len(state.working_memory) <= self.max_items:
                break
                
            if item.id not in active_arguments and item.importance < 3:
                print(f"Evicting low importance item {item.id} from board.")
                state.working_memory.pop(item.id)
                
        return state

This implementation moves from a simple token counter to a structured governance system. The _get_active_arguments method is the key: it builds a simple dependency check to ensure we never delete a critique or a proposal that is still being processed by the team.

By using Pydantic, we have turned the Blackboard from a wild frontier of text into a database with clear rules.

Conclusion

We used to think that the solution to agent memory was just bigger context windows. We thought that if we could fit ten million tokens into a prompt, we wouldn’t need to worry about state management.

But we were wrong.

Bigger context windows make agents slower and more expensive. They make them less focused.

The future of Agentic AI is not infinite memory. The future is active forgetting. The systems that succeed in production will be the ones that know how to summarize their experience, archive the details, and keep the active workspace clean.

If you are building a Blackboard system, do not wait for the context wall. Build your evictor now. It is the fundamental difference between a fragile prototype that works only in a lab and a robust production system that handles real-world complexity.
