17. Context Management

This blog is part of the ADK Masterclass - Hands-On Series. As conversations grow longer, managing context efficiently becomes critical. ADK provides Context Caching and Context Compression (Compaction) to optimize performance and reduce costs.

Every LLM call sends the entire conversation history as context. For a 50-turn conversation, that means sending all 50 messages every single time—even though most of that content hasn't changed. This is expensive and slow. Context management solves this by caching static content and compressing older messages.



1. What is Context?

When your agent runs, it needs background information to respond intelligently. This includes the conversation so far, any data stored in state, and access to services like artifact storage. ADK bundles all of this into "context" objects that your code can use.

```mermaid
flowchart LR
    A[User Query] --> B[Context Layer]
    B --> C[Agent Logic]
    C --> D[Response]
    B --- E[Past Messages]
    B --- F[State Data]
    B --- G[Services]
    style B fill:#e8f5e9,stroke:#2e7d32
```

The challenge? As conversations get longer, this context grows. A 50-turn chat might have 10,000+ tokens of history. That means slower responses and higher costs.

This module covers two techniques to keep context manageable:

  • Caching: Store and reuse static content like system prompts
  • Compaction: Summarize old messages to shrink the history

2. Why Context Management Matters

```mermaid
graph LR
    subgraph Problem["Without Optimization"]
        Long[Long Conversation] --> Full[Full Context Sent]
        Full --> Slow[Slow Response]
        Full --> Cost[High Cost]
    end
    subgraph Solution["With Context Management"]
        Long2[Long Conversation] --> Cache[Context Caching]
        Long2 --> Compress[Context Compression]
        Cache --> Fast[Fast Response]
        Compress --> Low[Lower Cost]
    end
    style Problem fill:#ffebee,stroke:#c62828
    style Solution fill:#e8f5e9,stroke:#2e7d32
```

Every time our agent processes a message, it sends context to the LLM. As conversations grow:

  • Latency increases: More tokens to process means slower responses
  • Costs rise: We pay per token sent to the model
  • Context limits: Models have maximum context windows (e.g., 1M tokens for Gemini 1.5 Pro)
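To see why this compounds, here is a back-of-the-envelope sketch (plain Python, not an ADK API) of how many tokens get processed when the full history is re-sent on every turn. The 200-tokens-per-turn figure is an assumption for illustration; real values vary by content.

```python
TOKENS_PER_TURN = 200  # assumed average turn size; real values vary

def total_tokens_sent(turns: int) -> int:
    """Sum of history sizes sent across all turns (full resend, no caching).

    On turn t the client re-sends all t turns of history, so the total
    grows quadratically with conversation length.
    """
    return sum(t * TOKENS_PER_TURN for t in range(1, turns + 1))

print(total_tokens_sent(10))  # 11000
print(total_tokens_sent(50))  # 255000
```

Going from a 10-turn to a 50-turn conversation multiplies the total tokens processed by more than 20x, which is exactly the growth caching and compaction are designed to cut.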

3. When to Use Each Technique

| Technique | Best For | Trade-offs |
| --- | --- | --- |
| Context Caching | Large static content (instructions, documents) | Requires a minimum token threshold; cache expires |
| Context Compression | Long conversations with older messages | Loses some detail in summarization |

4. Tutorial

Prerequisites

Setup Environment

```shell
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install google-adk python-dotenv

# Set your API key
export GOOGLE_API_KEY=your_api_key_here
```

4.1. Context Caching

Context caching allows us to reuse extended instructions or large data sets across multiple agent requests, reducing token usage and speeding up responses.

We use the ContextCacheConfig class at the App level:

```python
from google.adk.agents import Agent
from google.adk.apps.app import App
from google.adk.agents.context_cache_config import ContextCacheConfig

# Context caching requires a Gemini 2.0 or newer model
root_agent = Agent(
    name='caching_agent',
    model='gemini-2.0-flash',
    instruction='...large static system instruction...',
)

# Create the app with context caching configuration
app = App(
    name='my-caching-agent-app',
    root_agent=root_agent,
    context_cache_config=ContextCacheConfig(
        min_tokens=2048,       # Minimum tokens to trigger caching
        ttl_seconds=600,       # Store for up to 10 minutes
        cache_intervals=5,     # Refresh after 5 uses
    ),
)
```

Configuration Settings

| Setting | Description | Default |
| --- | --- | --- |
| `min_tokens` | Minimum tokens required to enable caching | 0 |
| `ttl_seconds` | Time-to-live for cached content (seconds) | 1800 (30 min) |
| `cache_intervals` | Max uses before cache refresh | 10 |

Best for: Agents with large system instructions, RAG contexts, or static knowledge bases.

4.2. Context Compression (Compaction)

Context compression (also called compaction) summarizes older parts of the conversation to reduce token count while preserving important information.

We use the EventsCompactionConfig class:

```python
from google.adk.apps.app import App
from google.adk.apps.app import EventsCompactionConfig

app = App(
    name='my-agent',
    root_agent=root_agent,
    events_compaction_config=EventsCompactionConfig(
        compaction_interval=3,  # Trigger compaction every 3 new invocations
        overlap_size=1,         # Include last invocation from previous window
    ),
)
```

How Compaction Works

With compaction_interval=3 and overlap_size=1:

  • Invocation 3 completes: invocations 1-3 are compacted into a summary
  • Invocation 6 completes: invocations 3-6 are compacted (invocation 3 is the 1-invocation overlap carried over from the previous window)
  • Invocation 9 completes: invocations 6-9 are compacted, and so on
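The windowing above can be sketched in plain Python. This is only an illustration of the interval/overlap arithmetic, not ADK internals:

```python
def compaction_windows(total_invocations: int,
                       interval: int = 3,
                       overlap: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) invocation ranges compacted after each interval.

    The first window has no prior summary, so it starts at invocation 1;
    every later window reaches back `overlap` invocations into the
    previous window to preserve continuity across summaries.
    """
    windows: list[tuple[int, int]] = []
    end = interval
    while end <= total_invocations:
        start = 1 if not windows else end - interval - overlap + 1
        windows.append((start, end))
        end += interval
    return windows

print(compaction_windows(9))  # [(1, 3), (3, 6), (6, 9)]
```

With `interval=3` and `overlap=1` this reproduces the schedule listed above: each summary shares one invocation with its predecessor, so no boundary context is dropped outright.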

Custom Summarizer

We can define a custom summarizer to control how events are compressed:

```python
from google.adk.apps.app import App, EventsCompactionConfig
from google.adk.apps.llm_event_summarizer import LlmEventSummarizer
from google.adk.models import Gemini

# Define the AI model to be used for summarization
summarization_llm = Gemini(model='gemini-2.5-flash')

# Create the summarizer with the custom model
my_summarizer = LlmEventSummarizer(llm=summarization_llm)

# Configure the App with the custom summarizer and compaction settings
app = App(
    name='my-agent',
    root_agent=root_agent,
    events_compaction_config=EventsCompactionConfig(
        summarizer=my_summarizer,
        compaction_interval=3,
        overlap_size=1,
    ),
)
```

5. Best Practices

| Technique | When to Use | Benefits |
| --- | --- | --- |
| Context Caching | Large static instructions, RAG | Faster responses, lower latency |
| Context Compaction | Long conversations, chat apps | Reduced costs, stays within limits |
| Both Combined | Production applications | Optimal performance & cost |

Additional tips:

  • Monitor token usage to identify optimization opportunities
  • Test compaction intervals to find the right balance
  • Use structured state for important data instead of relying on conversation history
  • Consider session timeouts to prevent unbounded context growth
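The "monitor token usage" tip can be wired up as a simple budget check. The helper below is hypothetical (the class and method names are illustrative, not an ADK API): it accumulates per-turn token counts and flags when the running history has grown enough that compaction is worth triggering.

```python
class TokenBudget:
    """Track cumulative history tokens and flag when compaction is advisable.

    Illustrative pattern only; plug real usage numbers in from whatever
    token-usage reporting your model client provides.
    """

    def __init__(self, compact_threshold: int = 8_000):
        self.compact_threshold = compact_threshold
        self.history_tokens = 0

    def record_turn(self, prompt_tokens: int, response_tokens: int) -> bool:
        """Add one turn's usage; return True once the budget is exceeded."""
        self.history_tokens += prompt_tokens + response_tokens
        return self.history_tokens >= self.compact_threshold

budget = TokenBudget(compact_threshold=500)
print(budget.record_turn(200, 100))  # False (300 tokens so far)
print(budget.record_turn(250, 120))  # True  (670 tokens, over budget)
```

The threshold is a tuning knob, much like `compaction_interval`: too low and you summarize constantly, too high and you pay for bloated context on every call.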

Next Steps

With context management covered, the next module explores Callbacks—how to intercept and customize agent behavior at various points in the execution lifecycle.
