At the beginning of November 2025, Kaggle ran a 5-Day AI Agents Intensive Course covering core concepts of AI agents and how you can implement them with Google’s Agent Development Kit (ADK). To accompany the course, they released five whitepapers, which we will reference in this blog. This article reflects my study notes from the course.
For this, we will install the google-adk (v1.18.0) library for Python.
%pip install -q -U google-adk
The agents we are building in this tutorial will be powered by gemini-2.5-flash-lite, which is free for experimental use. To use it, you will need to create a Gemini API key in Google AI Studio. Then, make sure to add your API key to your environment variables (or Google Colab Secrets).
import os
from google.colab import userdata
GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')
os.environ["GOOGLE_API_KEY"] = GEMINI_API_KEY
MODEL_NAME = "gemini-2.5-flash-lite"
Agent Fundamentals
Let’s start by creating and running a first simple Agent (also available under the alias LlmAgent) in ADK. First, you’ll define an Agent with the following core components:
- name: A name to identify the agent.
- model: We will be using gemini-2.5-flash-lite with retry_options for automatically handling failures by retrying the request.
- description: A description to identify the agent’s purpose.
- instruction: Instructions that describe the agent’s goal and how it should behave.
- tools: A list of tools that the agent can use (e.g., the built-in Google Search tool).
from google.genai import types
from google.adk.agents import Agent
from google.adk.models.google_llm import Gemini
from google.adk.tools import google_search
retry_config = types.HttpRetryOptions(
    attempts=5,       # Maximum retry attempts
    exp_base=7,       # Base for the exponential backoff delay
    initial_delay=1,  # Initial delay before first retry (in seconds)
    http_status_codes=[
        429,  # Too Many Requests
        500,  # Internal Server Error
        503,  # Service Unavailable
        504,  # Gateway Timeout
    ],  # Retry on these HTTP errors
)
root_agent = Agent(
    name="assistant",
    model=Gemini(
        model=MODEL_NAME,
        retry_options=retry_config,
    ),
    description="A simple agent that can answer general questions.",
    instruction="""You are a helpful assistant.
    Use Google Search for current info or if unsure.""",
    tools=[google_search],
)
Next, you will define an orchestrator that will run the agent. For experimentation purposes, you can use the InMemoryRunner. For production, you’d use the base Runner class when you need persistent state between runs (see Memory Management).
from google.adk.runners import InMemoryRunner
runner = InMemoryRunner(agent=root_agent)
And finally, you can call the run_debug method to prompt the agent with a query.
response = await runner.run_debug(
    "When was the Kaggle 5-Day AI Agents Intensive Course happening?",
    verbose=True,
)
### Created new session: debug_session_id
User > When was the Kaggle 5-Day AI Agents Intensive Course happening?
assistant > The Kaggle 5-Day AI Agents Intensive Course was happening from November 10th to November 14th, 2025. This course is also available as a self-paced learning guide. It was previously held live from March 31 to April 4, 2025.
And that’s all to get your first single-agent system up and running!
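For intuition on what the retry_config defined earlier implies: with exponential backoff, the wait before retry n typically grows as initial_delay * exp_base**n. The exact jitter and capping behavior are up to the client library, so this is only an illustrative sketch of the schedule, not the google-genai internals:

```python
def backoff_delays(attempts: int, initial_delay: float, exp_base: float) -> list:
    """Illustrative schedule: the wait before each retry grows exponentially."""
    # One delay per retry; the first of `attempts` tries happens immediately
    return [initial_delay * exp_base**n for n in range(attempts - 1)]

# With the blog's config (attempts=5, exp_base=7, initial_delay=1):
print(backoff_delays(5, 1, 7))  # → [1, 7, 49, 343]
```

With exp_base=7 the waits escalate quickly, so a small number of attempts is usually enough to outlast a transient 429 or 503.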
Tools
An AI agent’s ability to call tools that connect it to the outside world is what sets agents apart from regular LLM calls and what makes them so powerful. This section discusses four main types of tools an ADK agent can use:
Built-in tools
In the above example, we provided the agent access to a tool called google_search, which is a built-in tool. Some foundation models have built-in tools, where the tool definition is given to the model implicitly. For example, Google’s Gemini API has several built-in tools, such as Google search, code execution, or computer use.
from google.adk.tools import google_search
root_agent = Agent(
    name="assistant_with_builtin_tool",
    model=Gemini(
        model=MODEL_NAME,
        retry_options=retry_config,
    ),
    description="A simple agent that can answer general questions.",
    instruction="""You are a helpful assistant.
    Use Google Search for current info or if unsure.""",
    tools=[google_search],
)
runner = InMemoryRunner(agent=root_agent)
response = await runner.run_debug(
    "When was the Kaggle 5-Day AI Agents Intensive Course happening?",
    verbose=True,
)
### Created new session: debug_session_id
User > When was the Kaggle 5-Day AI Agents Intensive Course happening?
assistant_with_builtin_tool > The Kaggle 5-Day AI Agents Intensive Course was happening from November 10th to November 14th, 2025. This course was designed to teach participants how to build and deploy intelligent AI agents, covering topics such as agent architectures, tools, memory, and evaluation, and moving from prototype to production. Registration for the course has since closed. However, the course content is expected to be available as a self-paced learning guide by the end of November 2025.
Function Tools
The most common type of tool is the function tool. Developers can define custom functions for all foundation models that support “function calling”.
How you define your custom function will impact how well the agent is able to select and use the right tool for a given task. Therefore, it is important that the tool follows a few best practices:
- Docstrings: Enable the agent to understand when and how to use tools
- Type Hints: Enable the agent to generate the correct schema
- Dictionary Returns: Tools return a dictionary with the tool result for successful calls or an error message for failed calls
def get_kaggle_progressions(tier: str) -> dict:
    """Looks up the medals needed to progress in the Kaggle competitions tier
    based on the tier provided by the user.

    Args:
        tier: The name of the Kaggle competitions tier. It should be descriptive,
            e.g., "expert", "master", or "grandmaster".

    Returns:
        Dictionary with status and medal information.
        Success: {"status": "success", "medals": "2 bronze"}
        Error: {"status": "error", "error_message": "Kaggle tier not found"}
    """
    # This simulates looking up Kaggle's competition progression
    medals_database = {
        "expert": "2 bronze",
        "master": "1 gold and 2 silver",
        "grandmaster": "5 gold",
    }
    medals = medals_database.get(tier.lower())
    if medals is not None:
        return {
            "status": "success",
            "medals": medals,
        }
    return {
        "status": "error",
        "error_message": f"Kaggle tier '{tier}' not found",
    }
root_agent = Agent(
    name="assistant_with_function_tool",
    model=Gemini(
        model=MODEL_NAME,
        retry_options=retry_config,
    ),
    instruction="""You are a Google Developer Expert for Kaggle.
    For Kaggle tier progression requests use `get_kaggle_progressions()` to find competition medal requirements for each tier.
    If the tool returns status "error", explain the issue to the user clearly.
    """,
    tools=[get_kaggle_progressions],
)
runner = InMemoryRunner(agent=root_agent)
response = await runner.run_debug(
    "How many medals do I need to become a Kaggle Competitions Expert?",
    verbose=True,
)
### Created new session: debug_session_id
User > How many medals do I need to become a Kaggle Competitions Expert?
assistant_with_function_tool > [Calling tool: get_kaggle_progressions({'tier': 'expert'})]
assistant_with_function_tool > [Tool result: {'status': 'success', 'medals': '2 bronze'}]
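Before wiring a function tool into an agent, it helps to call it directly and check both the success and the error path — a quick sanity check of the tool contract described above (the function is repeated here so the snippet is self-contained):

```python
def get_kaggle_progressions(tier: str) -> dict:
    """Looks up the medals needed for a Kaggle competitions tier."""
    medals_database = {
        "expert": "2 bronze",
        "master": "1 gold and 2 silver",
        "grandmaster": "5 gold",
    }
    medals = medals_database.get(tier.lower())
    if medals is not None:
        return {"status": "success", "medals": medals}
    return {"status": "error", "error_message": f"Kaggle tier '{tier}' not found"}

print(get_kaggle_progressions("Expert"))  # case-insensitive success path
print(get_kaggle_progressions("novice"))  # error path
```

Because the function returns structured status dictionaries instead of raising exceptions, the agent can read the error message and explain the problem to the user, as its instruction tells it to.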
Agent Tools
Another type of tool is the AgentTool, when an agent is invoked as a tool. This allows the primary agent to delegate specific tasks (e.g., calculations) to sub-agents while keeping control over the user interaction.
from google.adk.tools import AgentTool
# Define agent tool
tool_agent = Agent(
    model=MODEL_NAME,
    name="tool_agent",
    description="Returns the capital city for any country or state",
    instruction="""When the user gives you the name of a country (e.g. Germany),
    answer with the name of the capital city of that country.
    Otherwise, tell the user you are not able to help them.""",
)
# Define primary agent
root_agent = Agent(
    name="assistant_with_agent_tool",
    model=Gemini(
        model=MODEL_NAME,
        retry_options=retry_config,
    ),
    description="Answers user questions and gives advice",
    instruction="""Use the tools you have available to answer the user's questions""",
    tools=[AgentTool(agent=tool_agent)],
)
runner = InMemoryRunner(agent=root_agent)
response = await runner.run_debug(
    "I want to visit Germany. Which city do you recommend I visit first?",
    verbose=True,
)
### Created new session: debug_session_id
User > I want to visit Germany. Which city do you recommend I visit first?
assistant_with_agent_tool > [Calling tool: tool_agent({'request': 'What is the capital of Germany?'})]
assistant_with_agent_tool > [Tool result: {'result': 'Berlin'}]
assistant_with_agent_tool > I recommend you visit Berlin first. It's the capital city and a great place to start exploring Germany!
Model Context Protocol (MCP) Tools
Writing custom function tools to connect to external systems, like databases, GitHub, or Google services, means writing and maintaining your own API clients. Instead of building these integrations yourself, you can leverage the Model Context Protocol (MCP), an open standard introduced by Anthropic.
The MCP lets you connect your agent (MCP client) to an external MCP server that provides tools, such as image generation or database access.
To use MCP tools with your agent, you first need to choose an MCP server and a tool. You can use the MCP registry to find one. In this tutorial, we will use the Everything MCP Server, which is a demo server providing a tool called getTinyImage to return a test image.
Next, you will need to create an MCPToolset to integrate an ADK agent with an MCP server. This launches the MCP server, establishes a communication channel, and integrates the tool in the agent’s tool list automatically without the need for any additional integration code.
from google.adk.tools.mcp_tool.mcp_toolset import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters
# MCP integration with Everything Server
mcp_server = McpToolset(
    connection_params=StdioConnectionParams(
        server_params=StdioServerParameters(
            command="npx",  # Run MCP server via npx
            args=[
                "-y",  # Argument for npx to auto-confirm install
                "@modelcontextprotocol/server-everything",
            ],
        ),
        timeout=30,
    ),
    tool_filter=["getTinyImage"],  # Only expose this tool to the agent
)
Now, you only have to add the mcp_server to the agent’s tool list and update the agent’s instructions to use it.
# Create image agent with MCP integration
root_agent = Agent(
    name="assistant_with_mcp_tool",
    model=Gemini(
        model=MODEL_NAME,
        retry_options=retry_config,
    ),
    instruction="Use the MCP Tool to generate images for user queries",
    tools=[mcp_server],
)
This section discussed the main types of tools. ADK also supports long-running function tools and OpenAPI tools.
Memory Management
LLMs are stateless. Without access to memory management, every interaction with them is a completely new interaction. In ADK, you use:
- Sessions for short-term memory management
- Memory for long-term memory management
Note that since we now want to have conversation history and persistent state between runs, we will no longer use the InMemoryRunner, but instead we will use the base Runner class, which takes session_service for short-term memory and memory_service for long-term memory as input parameters.
Additionally, we cannot use the run_debug() method anymore because it creates a debug session with a debug_session_id. Since we want to distinguish between different sessions, we will need to use the run_async() method, which takes a session ID as input.
Short-Term Memory
Short-term memory most often refers to the conversation history of a session. The conversation history not only records the user queries and the agent’s responses, but also all tool interactions. Therefore, short-term memory can also become a summarized version of the current session for long-running conversations. Short-term memory in ADK is managed by a session_service, which you can pass to the Runner class.
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
# Set up Session Management
session_service = InMemorySessionService()
# Set up the agent
root_agent = Agent(
    name="assistant",
    model=Gemini(
        model=MODEL_NAME,
        retry_options=retry_config,
    ),
    description="A simple agent that can answer general questions.",
    instruction="""You are a helpful assistant.""",
)
# Create the Runner
runner = Runner(
    agent=root_agent,
    app_name="default",
    session_service=session_service,
)
To run the session, we create a new session manually and pass it into the run_async method.
session_name = "session1"
app_name = runner.app_name
USER_ID = "default"
# Create a new session
session = await session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
    session_id=session_name,
)
user_queries = [
    "Hi, I am Sam! What is the capital of United States?",
    "Hello! What is my name?",
]
for query in user_queries:
    print(f"\nUser > {query}")
    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=session.id,
        new_message=types.Content(role="user", parts=[types.Part(text=query)]),
    ):
        print("Assistant >", event.content.parts[0].text)
User > Hi, I am Sam! What is the capital of United States?
Assistant > Hi Sam! The capital of the United States is Washington, D.C.
User > Hello! What is my name?
Assistant > Your name is Sam.
You can see that the agent was able to remember the user’s name.
Below you can see that we recorded four events in the current session session1.
session = await session_service.get_session(
    app_name=app_name,
    user_id=USER_ID,
    session_id=session_name,
)
for event in session.events:
    print(f"{event.content.role}: {event.content.parts[0].text}")
user: Hi, I am Sam! What is the capital of United States?
model: Hi Sam! The capital of the United States is Washington, D.C.
user: Hello! What is my name?
model: Your name is Sam.
Note that here we’re using InMemorySessionService, which stores conversations temporarily in RAM for experimentation. In production, you’d use DatabaseSessionService, which stores conversations permanently in a database, as shown below:
from google.adk.sessions import DatabaseSessionService
db_url = "sqlite:///my_agent_data.db"  # Local SQLite file
session_service = DatabaseSessionService(db_url=db_url)
As you can imagine, recording every event can produce a long conversation history, which leads to higher cost and slower performance, and will eventually hit the context window limit. To mitigate this, you can use context compaction, which automatically reduces the context stored in the session. The compaction process summarizes previous events and stores them in a single new event. In ADK, you can use the EventsCompactionConfig class for this.
Another approach to reducing the token count of static instructions is to cache the request data via context caching. In ADK, you can use the ContextCacheConfig class for this.
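Conceptually, compaction replaces a sliding window of old events with one summary event. Here is a framework-free sketch of the idea — not ADK's actual implementation, and the summarizer string stands in for what would really be an LLM call:

```python
def compact_events(events: list, keep_last: int = 2) -> list:
    """Replace all but the most recent events with a single summary event."""
    if len(events) <= keep_last:
        return events
    old, recent = events[:-keep_last], events[-keep_last:]
    # Stand-in for an LLM summarization call over the older events
    summary = f"[summary of {len(old)} earlier events]"
    return [summary] + recent

history = [
    "user: Hi, I am Sam!",
    "model: Hi Sam!",
    "user: What is the capital of United States?",
    "model: Washington, D.C.",
    "user: What is my name?",
]
print(compact_events(history, keep_last=2))
```

The compacted history keeps recent turns verbatim while bounding total context size, at the cost of losing detail from the summarized turns.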
Long-term Memory
In contrast to short-term memory, long-term memory persists across multiple conversations in a searchable storage.
In ADK, long-term memory is managed by a memory_service, which has to be first created and then provided to the agent via the Runner class.
from google.adk.memory import InMemoryMemoryService
# Create Session Service
session_service = InMemorySessionService()
# Create Memory Service
memory_service = InMemoryMemoryService()
# Create runner with BOTH services
runner = Runner(
    agent=root_agent,
    app_name="default",
    session_service=session_service,
    memory_service=memory_service,  # Memory service is now available!
)
Let’s run the session again.
session_name = "session2"
app_name = runner.app_name
USER_ID = "default"
# Create a new session
session = await session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
    session_id=session_name,
)
user_queries = [
    "Hi, I am Sam! What is the capital of United States?",
    "Hello! What is my name?",
]
# Process each query in the list sequentially
for query in user_queries:
    print(f"\nUser > {query}")
    # Stream the agent's response asynchronously
    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=session.id,
        new_message=types.Content(role="user", parts=[types.Part(text=query)]),
    ):
        print("Assistant >", event.content.parts[0].text)
User > Hi, I am Sam! What is the capital of United States?
Assistant > Hi Sam! The capital of the United States is Washington, D.C.
User > Hello! What is my name?
Assistant > You told me your name is Sam!
As you can see, the general behavior of the agent looks the same to the user and even records similar events in the new session session2.
session = await session_service.get_session(
    app_name=app_name,
    user_id=USER_ID,
    session_id=session_name,
)
for event in session.events:
    print(f"{event.content.role}: {event.content.parts[0].text}")
user: Hi, I am Sam! What is the capital of United States?
model: Hi Sam! The capital of the United States is Washington, D.C.
user: Hello! What is my name?
model: You told me your name is Sam!
Notice how this agent has both session_service for short-term memory and memory_service for long-term memory? This is because long-term memory is created by transferring session data to memory using the add_session_to_memory() function. While the InMemoryMemoryService stores the entire conversation history, a managed memory service like the Vertex AI Memory Bank extracts key facts from the conversation history and only stores those in the long-term memory.
You can save session data to long-term memory at the end of a session, in periodic intervals, or after every turn, depending on your use case.
await memory_service.add_session_to_memory(session)
You can also manually search memory with search_memory. Note that the InMemoryMemoryService searches with keyword matching, while the VertexAiMemoryBankService uses semantic search.
app_name = runner.app_name
search_response = await memory_service.search_memory(
    app_name=app_name,
    user_id=USER_ID,
    query="What is the user's name?",
)
print(search_response)
memories=[MemoryEntry(content=Content(
parts=[
Part(
text='Hi, I am Sam! What is the capital of United States?'
),
],
role='user'
), custom_metadata={}, id=None, author='user', timestamp='2025-11-25T20:40:25.254618'), MemoryEntry(content=Content(
parts=[
Part(
text='Hi Sam! The capital of the United States is Washington, D.C.'
),
],
role='model'
), custom_metadata={}, id=None, author='assistant', timestamp='2025-11-25T20:40:25.255173'), MemoryEntry(content=Content(
parts=[
Part(
text='Hello! What is my name?'
),
],
role='user'
), custom_metadata={}, id=None, author='user', timestamp='2025-11-25T20:40:25.742681'), MemoryEntry(content=Content(
parts=[
Part(
text='You told me your name is Sam!'
),
],
role='model'
), custom_metadata={}, id=None, author='assistant', timestamp='2025-11-25T20:40:25.743219')]
But what use is storing and searching memory when your agent can’t access it? To give your agents access to the memory, you can provide them with the built-in memory tools:
- load_memory (reactive): The agent decides when to search memory. This saves tokens and latency, but you run the risk of the agent forgetting to look up the memory.
- preload_memory (proactive): Memory is automatically searched before every turn. This is less efficient, but you’re guaranteed that memory is always available to the agent.
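To build intuition for the keyword matching that InMemoryMemoryService performs (in contrast to the Memory Bank's semantic search), here is a minimal, purely illustrative sketch — not ADK's actual code, and the stopword list is an arbitrary choice for the example:

```python
# Hypothetical illustration of keyword-based memory search (not ADK's code)
STOPWORDS = {"what", "is", "the", "a", "an", "my", "i", "of"}

def keyword_search(memories: list, query: str) -> list:
    """Return memory entries sharing at least one non-stopword with the query."""
    def words(text: str) -> set:
        return {w.strip("?.,!") for w in text.lower().split()} - STOPWORDS
    return [entry for entry in memories if words(query) & words(entry)]

memories = [
    "Hi, I am Sam! What is the capital of United States?",
    "You told me your name is Sam!",
    "The weather is sunny today.",
]
print(keyword_search(memories, "What is the user's name?"))  # matches on "name"
```

Keyword matching is cheap but literal: a query phrased with synonyms ("What am I called?") would miss the stored fact, which is exactly the gap semantic search closes.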
Agent Quality
AI agents are inherently non-deterministic, which makes them unpredictable and difficult to evaluate in traditional ways. Traditional quality assurance practices, such as unit tests, were built for deterministic systems. But an agent can pass all of your unit tests and still fail in production due to wrong decision-making. Therefore, quality assurance in agent systems cannot be treated like a final testing stage but has to be treated as an architectural pillar.
Agent Observability
Without agent observability, you are not able to judge the agent’s decision-making process. Agent observability is reactive: it becomes useful after an error has occurred, because it provides the information you need to debug what went wrong.
The foundational pillars of agent observability are:
- Logs tell us what happened: These are atomic events, such as “I was asked a question”, “I decided to use the vector search tool”, and “Vector search failed”
- Traces tell us why something happened: They reveal a causal relationship between isolated logs, such as “User Query -> Vector search (failed) -> LLM Error (confused by bad tool output) -> Wrong final answer”
- Metrics tell us how well the overall system performed: These can be system metrics, such as performance (latency, error rate), cost (tokens per task, API cost per run), and effectiveness (task completion rate, tool usage frequency), or quality metrics, such as correctness, accuracy, trajectory adherence, safety and responsibility, helpfulness and relevance.
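As a toy example of the metrics pillar, system metrics such as error rate, average tokens per task, or worst-case latency can be aggregated directly from structured log records. The record shape below is hypothetical, purely for illustration:

```python
# Hypothetical structured log records, one per agent run
runs = [
    {"latency_s": 1.2, "tokens": 118, "error": False},
    {"latency_s": 0.9, "tokens": 95,  "error": False},
    {"latency_s": 3.4, "tokens": 210, "error": True},
]

error_rate = sum(r["error"] for r in runs) / len(runs)
avg_tokens = sum(r["tokens"] for r in runs) / len(runs)
max_latency = max(r["latency_s"] for r in runs)

print(f"error rate: {error_rate:.0%}, avg tokens: {avg_tokens:.0f}, max latency: {max_latency}s")
```

The point is that metrics are derived views over logs and traces: if the atomic events are captured well, the aggregate numbers come almost for free.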
For development debugging, you can use the ADK Web UI. However, for production observability, you can use the built-in LoggingPlugin(), which automatically captures all agent activity:
from google.adk.plugins.logging_plugin import LoggingPlugin
root_agent = Agent(
    name="assistant",
    model=Gemini(
        model=MODEL_NAME,
        retry_options=retry_config,
    ),
    description="A simple agent that can answer general questions.",
    instruction="""You are a helpful assistant.
    Use Google Search for current info or if unsure.""",
    tools=[google_search],
)
runner = InMemoryRunner(
    agent=root_agent,
    plugins=[
        LoggingPlugin()  # Handles standard Observability logging across ALL agents
    ],
)
response = await runner.run_debug("When was the Kaggle 5-Day AI Agents Intensive Course happening?")
### Created new session: debug_session_id
User > When was the Kaggle 5-Day AI Agents Intensive Course happening?
[logging_plugin] 🚀 USER MESSAGE RECEIVED
[logging_plugin] Invocation ID: e-7bf69fd2-6e96-4ada-a35d-3f323a36f606
[logging_plugin] Session ID: debug_session_id
[logging_plugin] User ID: debug_user_id
[logging_plugin] App Name: InMemoryRunner
[logging_plugin] Root Agent: assistant
[logging_plugin] User Content: text: 'When was the Kaggle 5-Day AI Agents Intensive Course happening?'
[logging_plugin] 🏃 INVOCATION STARTING
[logging_plugin] Invocation ID: e-7bf69fd2-6e96-4ada-a35d-3f323a36f606
[logging_plugin] Starting Agent: assistant
[logging_plugin] 🤖 AGENT STARTING
[logging_plugin] Agent Name: assistant
[logging_plugin] Invocation ID: e-7bf69fd2-6e96-4ada-a35d-3f323a36f606
[logging_plugin] 🧠 LLM REQUEST
[logging_plugin] Model: gemini-2.5-flash-lite
[logging_plugin] Agent: assistant
[logging_plugin] System Instruction: 'You are a helpful assistant. Use Google Search for current info or if unsure. You are an agent. Your internal name is "assistant". The description about you is "A simple agent that can answer gen...'
[logging_plugin] 🧠 LLM RESPONSE
[logging_plugin] Agent: assistant
[logging_plugin] Content: text: 'The Kaggle 5-Day AI Agents Intensive Course was happening from November 10th to November 14th, 2025.'
[logging_plugin] Token Usage - Input: 63, Output: 55
[logging_plugin] 📢 EVENT YIELDED
[logging_plugin] Event ID: d46541c5-8037-442f-b8c4-9132e78ca3e8
[logging_plugin] Author: assistant
[logging_plugin] Content: text: 'The Kaggle 5-Day AI Agents Intensive Course was happening from November 10th to November 14th, 2025.'
[logging_plugin] Final Response: True
assistant > The Kaggle 5-Day AI Agents Intensive Course was happening from November 10th to November 14th, 2025.
[logging_plugin] 🤖 AGENT COMPLETED
[logging_plugin] Agent Name: assistant
[logging_plugin] Invocation ID: e-7bf69fd2-6e96-4ada-a35d-3f323a36f606
[logging_plugin] ✅ INVOCATION COMPLETED
[logging_plugin] Invocation ID: e-7bf69fd2-6e96-4ada-a35d-3f323a36f606
[logging_plugin] Final Agent: assistant
Agent Evaluation
Agent evaluation is the process of evaluating how well an AI agent performs on a task, including its decision-making process. That means, you want to evaluate the agent on two aspects:
- Output (end-to-end) evaluation takes an “outside-in” view: For example, how similar is the agent’s response to the expected response? (Example metrics: task success rate, user satisfaction, overall quality.)
- Process evaluation takes an “inside-out” view: For example, did the agent approach the task correctly (planning, tool usage with correct parameters, tool response interpretation, etc.)?
Agent evaluation is proactive because by evaluating your agent’s performance regularly, you are able to detect any performance degradation early on.
This is quite a complex topic, and I recommend a deep dive into the original whitepaper.
Multi-agent systems
So far, we’ve only looked at a single agent. However, instead of a single “monolithic” agent, you can also build a multi-agent system of specialized agents.
Multi-agent patterns
You can combine multiple agents in different patterns depending on your use case.
- LLM-based (Sub-agents): Use when the agent can dynamically orchestrate sub-agents on its own.
- Sequential: Use when deterministic order is important in a linear workflow.
- Parallel: Use when you have independent tasks and speed is important.
- Loop: Use when you need iterative improvement through repeated cycles.
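The difference between the sequential and parallel patterns can be sketched without any framework, using plain asyncio. The "agents" here are hypothetical stand-in coroutines, not ADK's SequentialAgent or ParallelAgent classes:

```python
import asyncio

# Hypothetical stand-in for an agent invocation
async def run_agent(name: str, task: str) -> str:
    await asyncio.sleep(0)  # pretend to call a model here
    return f"{name} handled: {task}"

async def sequential(task: str) -> str:
    # Each agent consumes the previous agent's output (deterministic order)
    draft = await run_agent("writer", task)
    return await run_agent("reviewer", draft)

async def parallel(task: str) -> list:
    # Independent agents run concurrently; results are gathered together
    return list(await asyncio.gather(
        run_agent("researcher", task),
        run_agent("fact_checker", task),
    ))

print(asyncio.run(sequential("summarize the course")))
print(asyncio.run(parallel("summarize the course")))
```

Sequential chaining is the right shape when each step depends on the last; parallel fan-out pays off when the subtasks are independent and latency matters.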
Agent2Agent Protocol
When building multi-agent systems, you might want to integrate an agent that’s not part of your project. For example, different agents can be created using different frameworks, such as CrewAI or LangGraph (cross-framework). Or agents can be implemented in different programming languages, such as Python or Java (cross-language). And finally, you might want to integrate an agent from an external vendor (cross-organization). For these purposes, it is helpful to have a standardized communication protocol, such as the Agent2Agent (A2A) protocol.
If you want to expose your agent and make it accessible to other agents, you can use ADK’s to_a2a() function, as follows:
from google.adk.a2a.utils.agent_to_a2a import to_a2a
my_agent = Agent(
    ...
)
public_agent_app = to_a2a(
    my_agent,
    port=8001,  # Port where the agent will be served
)
If you want to consume a remote agent, you can do so with the RemoteA2aAgent class:
from google.adk.agents.remote_a2a_agent import (
    RemoteA2aAgent,
    AGENT_CARD_WELL_KNOWN_PATH,
)
remote_product_catalog_agent = RemoteA2aAgent(
    name="product_catalog_agent",
    description="Remote product catalog agent from external vendor that provides product information.",
    # Point to the agent card URL - this is where the A2A protocol metadata lives
    agent_card=f"http://localhost:8001{AGENT_CARD_WELL_KNOWN_PATH}",
)
Summary
This article is a summary of my learnings from Kaggle’s 5-Day AI Agents Intensive Course. During the course, I learned the fundamentals of building a first simple agent with Google’s ADK Python library, its core concepts of tools (including MCP tools), and memory management via sessions (short-term) and memory (long-term). I also learned that, because AI agents are non-deterministic, agent quality is a core pillar of agent architectures rather than just a final testing stage as in traditional quality assurance. Finally, the course touched on different patterns for multi-agent systems and how the A2A protocol enables collaboration between agents across different languages, frameworks, and organizations.
If you are interested in the details of any section of this blog, I recommend having a look at the free course materials.
References
- Introduction to Agents Whitepaper
- Kaggle Notebook: From Prompt to Action
- Kaggle Notebook: Agent Architectures
- Agent Tools & Interoperability with MCP Whitepaper
- Kaggle Notebook: Agent Tools
- Kaggle Notebook: Agent Tools Best Practices
- Context Engineering: Sessions & Memory Whitepaper
- Kaggle Notebook: Agent Session
- Kaggle Notebook: Agent Memory
- Agent Quality Whitepaper
- Kaggle Notebook: Agent Observability
- Kaggle Notebook: Agent Evaluation
- Prototype to Production Whitepaper
- Kaggle Notebook: Agent2Agent Communication