17. Context Management

This blog is part of the ADK Masterclass - Hands-On Series. As conversations grow longer, managing context efficiently becomes critical. ADK provides Context Caching and Context Compression (Compaction) to optimize performance and reduce costs.

Every LLM call sends the entire conversation history as context. For a 50-turn conversation, that means sending all 50 messages every single time—even though most of that content hasn't changed. This is expensive and slow. Context management solves this by caching static content and compressing older messages.



1. What is Context?

When your agent runs, it needs background information to respond intelligently. This includes the conversation so far, any data stored in state, and access to services like artifact storage. ADK bundles all of this into "context" objects that your code can use.

```mermaid
flowchart LR
    A[User Query] --> B[Context Layer]
    B --> C[Agent Logic]
    C --> D[Response]
    B --- E[Past Messages]
    B --- F[State Data]
    B --- G[Services]
    style B fill:#e8f5e9,stroke:#2e7d32
```

The challenge? As conversations get longer, this context grows. A 50-turn chat might have 10,000+ tokens of history. That means slower responses and higher costs.

This module covers two techniques to keep context manageable:

  • Caching: Store and reuse static content like system prompts
  • Compaction: Summarize old messages to shrink the history

2. Why Context Management Matters

```mermaid
graph LR
    subgraph Problem["Without Optimization"]
        Long[Long Conversation] --> Full[Full Context Sent]
        Full --> Slow[Slow Response]
        Full --> Cost[High Cost]
    end
    subgraph Solution["With Context Management"]
        Long2[Long Conversation] --> Cache[Context Caching]
        Long2 --> Compress[Context Compression]
        Cache --> Fast[Fast Response]
        Compress --> Low[Lower Cost]
    end
    style Problem fill:#ffebee,stroke:#c62828
    style Solution fill:#e8f5e9,stroke:#2e7d32
```

Every time our agent processes a message, it sends context to the LLM. As conversations grow:

  • Latency increases: More tokens to process means slower responses
  • Costs rise: We pay per token sent to the model
  • Context limits: Models have maximum context windows (e.g., 1M tokens for Gemini 1.5 Pro)
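To see why this compounds, here is a back-of-the-envelope sketch (plain Python, not an ADK API) of how many tokens get processed when the full history is re-sent on every turn. The 200-tokens-per-turn figure is an assumption for illustration; real values vary by content.

```python
TOKENS_PER_TURN = 200  # assumed average turn size; real values vary

def total_tokens_sent(turns: int) -> int:
    """Sum of history sizes sent across all turns (full resend, no caching).

    On turn t the client re-sends all t turns of history, so the total
    grows quadratically with conversation length.
    """
    return sum(t * TOKENS_PER_TURN for t in range(1, turns + 1))

print(total_tokens_sent(10))  # 11000
print(total_tokens_sent(50))  # 255000
```

Going from a 10-turn to a 50-turn conversation multiplies the total tokens processed by more than 20x, which is exactly the growth caching and compaction are designed to cut.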

3. When to Use Each Technique

| Technique | Best For | Trade-offs |
| --- | --- | --- |
| Context Caching | Large static content (instructions, documents) | Requires a minimum token threshold; cache expires |
| Context Compression | Long conversations with older messages | Loses some detail in summarization |

4. Tutorial

Prerequisites

Setup Environment

```shell
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install google-adk python-dotenv

# Set your API key
export GOOGLE_API_KEY=your_api_key_here
```

4.1. Context Caching

Context caching allows us to reuse extended instructions or large data sets across multiple agent requests, reducing token usage and speeding up responses.

We use the ContextCacheConfig class at the App level:

```python
from google.adk.agents import Agent
from google.adk.apps.app import App
from google.adk.agents.context_cache_config import ContextCacheConfig

# Context caching requires a Gemini 2.0 or newer model
root_agent = Agent(
    name='caching_agent',
    model='gemini-2.0-flash',
    instruction='...large static system instruction...',
)

# Create the app with context caching configuration
app = App(
    name='my-caching-agent-app',
    root_agent=root_agent,
    context_cache_config=ContextCacheConfig(
        min_tokens=2048,       # Minimum tokens to trigger caching
        ttl_seconds=600,       # Store for up to 10 minutes
        cache_intervals=5,     # Refresh after 5 uses
    ),
)
```

Configuration Settings

| Setting | Description | Default |
| --- | --- | --- |
| `min_tokens` | Minimum tokens required to enable caching | 0 |
| `ttl_seconds` | Time-to-live for cached content (seconds) | 1800 (30 min) |
| `cache_intervals` | Max uses before cache refresh | 10 |

Best for: Agents with large system instructions, RAG contexts, or static knowledge bases.

4.2. Context Compression (Compaction)

Context compression (also called compaction) summarizes older parts of the conversation to reduce token count while preserving important information.

We use the EventsCompactionConfig class:

```python
from google.adk.apps.app import App
from google.adk.apps.app import EventsCompactionConfig

app = App(
    name='my-agent',
    root_agent=root_agent,
    events_compaction_config=EventsCompactionConfig(
        compaction_interval=3,  # Trigger compaction every 3 new invocations
        overlap_size=1,         # Include last invocation from previous window
    ),
)
```

How Compaction Works

With compaction_interval=3 and overlap_size=1:

  • Invocation 3 completes: invocations 1-3 are compacted into a summary
  • Invocation 6 completes: invocations 3-6 are compacted (invocation 3 is the 1-invocation overlap carried over from the previous window)
  • Invocation 9 completes: invocations 6-9 are compacted, and so on
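The windowing above can be sketched in plain Python. This is only an illustration of the interval/overlap arithmetic, not ADK internals:

```python
def compaction_windows(total_invocations: int,
                       interval: int = 3,
                       overlap: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) invocation ranges compacted after each interval.

    The first window has no prior summary, so it starts at invocation 1;
    every later window reaches back `overlap` invocations into the
    previous window to preserve continuity across summaries.
    """
    windows: list[tuple[int, int]] = []
    end = interval
    while end <= total_invocations:
        start = 1 if not windows else end - interval - overlap + 1
        windows.append((start, end))
        end += interval
    return windows

print(compaction_windows(9))  # [(1, 3), (3, 6), (6, 9)]
```

With `interval=3` and `overlap=1` this reproduces the schedule listed above: each summary shares one invocation with its predecessor, so no boundary context is dropped outright.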

Custom Summarizer

We can define a custom summarizer to control how events are compressed:

```python
from google.adk.apps.app import App, EventsCompactionConfig
from google.adk.apps.llm_event_summarizer import LlmEventSummarizer
from google.adk.models import Gemini

# Define the AI model to be used for summarization
summarization_llm = Gemini(model='gemini-2.5-flash')

# Create the summarizer with the custom model
my_summarizer = LlmEventSummarizer(llm=summarization_llm)

# Configure the App with the custom summarizer and compaction settings
app = App(
    name='my-agent',
    root_agent=root_agent,
    events_compaction_config=EventsCompactionConfig(
        summarizer=my_summarizer,
        compaction_interval=3,
        overlap_size=1,
    ),
)
```

5. Best Practices

| Technique | When to Use | Benefits |
| --- | --- | --- |
| Context Caching | Large static instructions, RAG | Faster responses, lower latency |
| Context Compaction | Long conversations, chat apps | Reduced costs, stays within limits |
| Both Combined | Production applications | Optimal performance & cost |

Additional tips:

  • Monitor token usage to identify optimization opportunities
  • Test compaction intervals to find the right balance
  • Use structured state for important data instead of relying on conversation history
  • Consider session timeouts to prevent unbounded context growth
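The "monitor token usage" tip can be wired up as a simple budget check. The helper below is hypothetical (the class and method names are illustrative, not an ADK API): it accumulates per-turn token counts and flags when the running history has grown enough that compaction is worth triggering.

```python
class TokenBudget:
    """Track cumulative history tokens and flag when compaction is advisable.

    Illustrative pattern only; plug real usage numbers in from whatever
    token-usage reporting your model client provides.
    """

    def __init__(self, compact_threshold: int = 8_000):
        self.compact_threshold = compact_threshold
        self.history_tokens = 0

    def record_turn(self, prompt_tokens: int, response_tokens: int) -> bool:
        """Add one turn's usage; return True once the budget is exceeded."""
        self.history_tokens += prompt_tokens + response_tokens
        return self.history_tokens >= self.compact_threshold

budget = TokenBudget(compact_threshold=500)
print(budget.record_turn(200, 100))  # False (300 tokens so far)
print(budget.record_turn(250, 120))  # True  (670 tokens, over budget)
```

The threshold is a tuning knob, much like `compaction_interval`: too low and you summarize constantly, too high and you pay for bloated context on every call.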

Next Steps

With context management covered, the next module explores Callbacks—how to intercept and customize agent behavior at various points in the execution lifecycle.
