
Agentic AI · 7 min read

Context Bloat: Implementing Progressive Discovery in Agent Memory

Using progressive discovery and smart tool-search to keep agents lean. Learn how to prevent context window overflow and infinite reasoning loops in multi-agent systems.


Key Takeaways

  • Dumping all available information into an agent’s context window at the start of a task is a recipe for hallucinations, latency spikes, and massive API bills.
  • Context bloat degrades the reasoning capabilities of even the most advanced models because critical instructions get lost in the noise.
  • Progressive discovery solves this by providing the agent with an index or a search tool, forcing it to actively retrieve only the specific information it needs at that exact moment.
  • Implementing this in frameworks like LangGraph requires a shift from passive state injection to active tool-calling loops.
  • Treating context as a highly constrained resource is the fundamental difference between a toy prototype and a production-grade autonomous system.

When we build our first autonomous agent, we usually follow a very predictable pattern. We give the agent a persona, we define a goal, and then we inject every single piece of data we think it might possibly need directly into the system prompt. We load up the database schema, the entire history of the user’s interactions, the API documentation for five different services, and a massive list of formatting rules.

We do this because the context windows have become enormous. With models supporting millions of tokens, the temptation is to treat the prompt like a limitless hard drive. We dump the data in and tell the agent to “figure it out.”

This is an architectural disaster. It leads directly to a phenomenon known as Context Bloat.

In this walkthrough, we are going to look at why Context Bloat destroys agent performance, the mechanics of how it degrades reasoning, and how to implement a pattern called “Progressive Discovery” to keep your agents lean, fast, and accurate. If you are struggling with broader architectural issues in multi-agent systems, I suggest reading up on The Agent Supervisor Pattern.

The Physics of Context Bloat

There are three distinct ways that Context Bloat ruins an agentic workflow in production.

First, Latency. Every token you push into the context window has to be processed during the prefill phase before the model can generate a single word of output. If you inject 500,000 tokens of irrelevant database schemas into the prompt, the user is going to sit staring at a loading spinner for five seconds while the GPUs churn through text the agent does not even need.

Second, Cost. You pay for every token in the prompt. If your agent is running in a loop, making multiple calls to the model to refine its plan, and you are passing that massive bloated context back and forth on every single call, the tokens billed accumulate with each iteration — the total cost of the loop grows roughly quadratically with its length.
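A quick back-of-envelope sketch makes the point. The numbers below are hypothetical (500k tokens of injected schemas, a 10-step loop, 2k new tokens of conversation per step), but the shape of the math is not: when the full context is re-sent on every call, the bloated baseline dominates the bill.

```python
def total_prompt_tokens(base_context: int, per_step_growth: int, iterations: int) -> int:
    """Sum the prompt tokens billed across a loop where the full
    context is re-sent on every model call."""
    return sum(base_context + i * per_step_growth for i in range(iterations))

# Bloated: 500k tokens of schemas and logs injected up front.
bloated = total_prompt_tokens(base_context=500_000, per_step_growth=2_000, iterations=10)
# Lean: a small system prompt plus only the tool results retrieved so far.
lean = total_prompt_tokens(base_context=2_000, per_step_growth=2_000, iterations=10)

print(f"{bloated:,}")  # 5,090,000 tokens billed
print(f"{lean:,}")     # 110,000 tokens billed
```

Same task, same loop length — a 40x difference in billed tokens, paid again on every run.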

Third, and most importantly, Reasoning Degradation. LLMs are not databases. They suffer from the “Lost in the Middle” phenomenon. When you bury a critical instruction or a specific data point in a mountain of irrelevant text, the model’s attention mechanism struggles to weigh it correctly. The agent starts hallucinating. It ignores constraints. It gets confused by conflicting information that was not relevant to the specific task at hand.

You cannot fix bad reasoning by adding more context. You fix it by removing the noise.

The Shift to Progressive Discovery

The solution is a design pattern called Progressive Discovery. Instead of giving the agent the answers upfront, you give the agent the tools to find the answers when it realizes it needs them.

Think of it like hiring a senior engineer. You do not hand them a binder containing every line of code in the company’s history on their first day. You give them a laptop, access to the codebase, and a search tool. When they need to understand how the billing API works, they search for it.

We must build our agents the same way. We must transition from passive state injection to active retrieval.

Let us look at a practical example. Imagine an agent tasked with diagnosing a failure in a Google Kubernetes Engine (GKE) cluster.

The Bloated Approach: You write a script that pulls the last 10,000 lines of logs from Cloud Logging, the YAML configurations for every deployment in the namespace, and the current CPU metrics for all nodes. You jam all of this into a massive prompt and say, “Why did the payment service crash?” The model is overwhelmed. It misses the single line indicating an Out of Memory (OOM) error because it is distracted by thousands of lines of normal traffic logs.

The Progressive Discovery Approach: You give the agent a lean prompt: “You are a DevOps agent investigating a crash in the payment service. You have tools to query logs, inspect deployments, and check metrics. Determine the root cause.”

The agent’s context window is tiny. It is fast and focused. The loop looks like this:

1. It reasons: “First, I need to see the logs for the payment service right before the crash,” and calls query_cloud_logging(service="payment", timeframe="last_5_minutes").
2. The tool returns 50 relevant lines. The agent reads them and sees an OOM error.
3. It reasons: “It ran out of memory. Let me check the memory limits on the deployment,” and calls inspect_deployment_config(service="payment").
4. The tool returns the specific YAML snippet. The agent compares the limits and concludes the investigation.
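The two tools in this walkthrough might look like the stubs below. The names query_cloud_logging and inspect_deployment_config come from the narrative above; the bodies are placeholders standing in for real Cloud Logging and Kubernetes API calls. The key design point is in the signatures: each tool is scoped to one service and one slice of data, so only that slice ever enters the context window.

```python
def query_cloud_logging(service: str, timeframe: str) -> str:
    """Return only the log lines for one service in one time window,
    not the full 10,000-line dump. (Stub; a real implementation would
    call the Cloud Logging API with a filter.)"""
    return 'OOMKilled: container "payment" exceeded its memory limit'

def inspect_deployment_config(service: str) -> str:
    """Return just the resource limits section of the deployment YAML,
    not every manifest in the namespace. (Stub.)"""
    return "resources:\n  limits:\n    memory: 256Mi"

print(query_cloud_logging(service="payment", timeframe="last_5_minutes"))
```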

The agent actively navigated the problem space, pulling only the exact data required into its context window at each step. This is infinitely more robust, significantly faster, and drastically cheaper. For deeper insights on managing state across these loops, review The Blackboard Architecture.

Implementing in LangGraph

To implement Progressive Discovery in a framework like LangGraph, you have to build robust tool-calling loops. The agent must be explicitly instructed, through its system prompt and tool descriptions, to use its tools to explore the environment.

Here is a conceptual look at how you define the tools to facilitate this pattern. Notice that we do not give the agent the data; we give it an index.

# Conceptual implementation in a LangGraph environment
from langchain_core.tools import tool

@tool
def list_available_database_tables(schema_name: str) -> str:
    """Returns a list of table names in the specified schema.
    Use this first to find the table you need."""
    # Logic to query the DB and return just the names
    return "Users, Orders, Inventory, BillingTransactions"

@tool
def get_table_schema(table_name: str) -> str:
    """Returns the exact column definitions and types for a specific table."""
    # Logic to return the schema for ONLY the requested table
    return "id (UUID), user_id (UUID), amount (Decimal), status (String)"

@tool
def search_knowledge_base(query: str) -> str:
    """Searches the internal documentation for specific terms.
    Returns summaries of the top 3 relevant articles."""
    # Vector search logic
    return "Article 1 Summary: ... Article 2 Summary: ..."

In your LangGraph execution loop, the state object that is passed between nodes does not contain the database schema or the knowledge base articles. It only contains the user’s original request, the agent’s scratchpad (its reasoning), and the results of the specific tool calls it has made so far.

The prompt guiding the agent must heavily emphasize this workflow. You must explicitly state: “Do not guess. If you do not know the schema, use the list_available_database_tables tool. If you need documentation, use the search_knowledge_base tool.”
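Here is a minimal sketch of that lean state and prompt. The field names (user_request, scratchpad, tool_results) are illustrative, not LangGraph’s actual state schema — the point is what is absent: no pre-loaded schemas, no knowledge base articles, only what the agent has actively retrieved.

```python
from dataclasses import dataclass, field

# The prompt explicitly forbids guessing and names the retrieval tools.
SYSTEM_PROMPT = (
    "Do not guess. If you do not know the schema, use the "
    "list_available_database_tables tool. If you need documentation, "
    "use the search_knowledge_base tool."
)

@dataclass
class AgentState:
    """Lean state passed between nodes: no bulk data, only pointers
    and whatever the agent has pulled in so far."""
    user_request: str
    scratchpad: list[str] = field(default_factory=list)    # the agent's reasoning
    tool_results: list[str] = field(default_factory=list)  # only retrieved data

state = AgentState(user_request="Why are refunds failing?")
# A tool result enters the context only after the agent asks for it.
state.tool_results.append(
    "BillingTransactions: id (UUID), user_id (UUID), amount (Decimal), status (String)"
)
```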

The Multi-Agent Implications

Progressive Discovery becomes absolutely critical when you move to multi-agent architectures. If you have a Researcher agent handing off a task to a Writer agent, the Researcher should not pass a 100,000-token document as the payload.

The handoff should be an executive summary and a set of pointers. The Researcher tells the Writer: “Here is the summary of the findings. The detailed transcripts are located at these specific URIs. Use your document retrieval tool to pull them if you need exact quotes.”
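That handoff contract can be made explicit in code. The sketch below is one possible shape for it — the class name, fields, and URIs are all hypothetical — but it captures the rule: the payload carries a summary and pointers, never the documents themselves.

```python
from dataclasses import dataclass

@dataclass
class ResearchHandoff:
    """Narrow handoff from Researcher to Writer: an executive summary
    plus pointers. The Writer pulls the full documents on demand with
    its own retrieval tool, only if it needs exact quotes."""
    summary: str
    source_uris: list[str]

handoff = ResearchHandoff(
    summary="Churn is driven by billing failures, concentrated in the EU region.",
    source_uris=[
        "gs://research-artifacts/transcripts/interview-01.txt",
        "gs://research-artifacts/transcripts/interview-02.txt",
    ],
)
# The Writer's starting context is a few dozen tokens, not 100,000.
```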

This keeps the communication channels between agents narrow and fast. It prevents the context bloat from compounding as the task moves through the system. For a look at how this impacts system architecture, check out my notes on Hierarchical KV Caching.

Treating Context as a Constraint

The most important mindset shift for an AI engineer is to stop treating the context window as a feature to be maximized, and start treating it as a constraint to be aggressively managed.

Just because the latest frontier models can accept up to two million tokens does not mean you should send them two million tokens.

By forcing your agents to use Progressive Discovery, you are forcing them to exhibit a higher level of autonomy. You are testing their ability to plan, search, and synthesize, rather than just acting as a brute-force pattern matcher over a massive dump of text. It requires more engineering upfront to build the tools and design the loops, but it is the only viable path to building autonomous systems that are fast, cheap, and reliable enough for production deployments.
