Agents
This notebook introduces LLM-based agents, tool use, function calling, and the Model Context Protocol (MCP).
Suggested duration: 1 hour
Learning goals
By the end of this notebook, you should be able to:
explain what an LLM-based agent is and how it differs from a plain chat call
describe the basic agent loop of goal, action, observation, and repetition
explain what function calling is and why tools matter
recognize the role of LangChain in tool orchestration
identify practical risks and good practices when building simple agents
Table of Contents
What is an LLM-based agent?
Function-calling demo
What is an LLM-based agent?
A plain LLM call takes an input and returns an output.
An agent adds a control loop around the model so it can:
inspect context
choose an action
call tools
observe the result
continue until the task is complete
This is useful when solving a task requires interaction with external systems such as files, APIs, databases, or search tools.
Agent loop
A simple agent loop looks like this:
Receive a user goal.
Decide the next best action.
Call a tool or ask for missing information.
Observe the result.
Repeat until a stopping condition is met.
This makes agents more powerful than plain chat completion, but also introduces new failure modes such as loops, wrong tool use, and unsafe actions.
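The five steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `decide_next_action` is a hard-coded stand-in for the model, the two tools are hypothetical stubs, and the pressure value is a made-up placeholder rather than real data. The point is the shape of the loop and the explicit step limit that acts as a stopping condition.

```python
from typing import Callable

# Hypothetical stub tools; a real agent would read files or call APIs here.
def inspect_dataset(_: str) -> str:
    return "variables: pressure, lat, lon, time"

def extract_pressure(region: str) -> str:
    # Placeholder value for illustration only, not a real observation.
    return f"mean pressure over {region}: 1003 hPa"

TOOLS: dict[str, Callable[[str], str]] = {
    "inspect_dataset": inspect_dataset,
    "extract_pressure": extract_pressure,
}

def decide_next_action(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the model: return (tool_name, argument) or ("finish", answer)."""
    if not observations:
        return "inspect_dataset", goal
    if len(observations) == 1:
        return "extract_pressure", "Gibraltar"
    return "finish", observations[-1]

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):                      # stopping condition: step limit
        action, argument = decide_next_action(goal, observations)
        if action == "finish":                      # stopping condition: task done
            return argument
        observations.append(TOOLS[action](argument))  # act, then observe
    return "stopped: step limit reached"

answer = run_agent("Was there a relevant weather event near Gibraltar?")
print(answer)
```

Lowering `max_steps` to 1 or 2 makes the loop stop before the stand-in policy finishes, which is the behaviour you want when a real model keeps repeating actions.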
Agent Loop: Prototype
| Iteration | Step | Description |
|---|---|---|
| 1 | 1 | User Goal: "I want to know whether there was a relevant weather event in the Strait of Gibraltar in January 2011." |
| 1 | 2 | Agent: "I have no information" → I need to see what is in the dataset |
| 1 | 3 | Action: Calls a tool → inspect NetCDF |
| 1 | 4 | Observation: Finds pressure variable, lat, lon, and time |
| 1 | 5 | Condition not met → Repeat |
| 2 | 2 | Agent: "Events require extracting information over a region and time period" |
| 2 | 3 | Action: Extract pressure over Gibraltar across time windows |
| 2 | 4 | Observation: Obtains data [time periods + mean pressure] |
| 2 | 5 | Repeat |
| 3 | 2 | Agent: Detects a significant pressure difference |
| 3 | 3 | Action: Generates response → "A significant drop in sea-level pressure was observed in the Strait of Gibraltar during…" |
An agent like this relies on two model behaviours:
Model reasoning
Function calling
Model reasoning
What reasoning in LLMs is (and is not)
It is NOT:
having deep understanding
having intention or consciousness
It IS:
the ability to break down a problem into coherent steps and arrive at a useful answer
Why some models “reason better”
Some models have been trained to:
Generate intermediate steps
Partially verify what they produce
Explore multiple solution paths
Limitations
Even when a model appears to “reason”, it can:
make mistakes in intermediate steps
invent relationships
sound convincing while being wrong
[7]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Reasoning-focused vs general instruct model
# Note: pick smaller variants if your machine has limited RAM/VRAM, or larger ones if it has capacity to spare.
# Note: this cell takes a while to run.
reasoning_model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
general_model_id = "Qwen/Qwen2.5-1.5B-Instruct"

prompt = (
    "You are helping a climate analyst. "
    "Solve this task step by step and clearly separate reasoning steps from the final answer:\n"
    "A buoy reports sea-surface temperatures: [18.4, 18.9, 19.7, 20.1, 20.6].\n"
    "1) Compute the mean temperature.\n"
    "2) Compute the warming from first to last value.\n"
    "3) Explain in 2 lines what this suggests about local marine conditions."
)


def load_model_and_tokenizer(model_id: str):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Half precision on GPU to save memory; full precision on CPU.
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Avoid device_map="auto" so this runs without requiring accelerate.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=dtype,
    )
    model = model.to(device)
    return model, tokenizer


def generate_step_by_step_reply(model_id: str, user_prompt: str, max_new_tokens: int = 280):
    model, tokenizer = load_model_and_tokenizer(model_id)
    messages = [
        {"role": "system", "content": "Be precise, concise, and structured."},
        {"role": "user", "content": user_prompt},
    ]
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding (do_sample=False) keeps the comparison repeatable.
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    reply = tokenizer.decode(
        new_tokens,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True,
    )
    reply = reply.replace("Ġ", " ").replace("Ċ", "\n").replace("°C", "°C")
    return reply.strip()


print("Generating reasoning-model response...\n")
reasoning_reply = generate_step_by_step_reply(reasoning_model_id, prompt)
print("Generating general-model response...\n")
general_reply = generate_step_by_step_reply(general_model_id, prompt)
Generating reasoning-model response...
Loading weights: 100%|██████████| 339/339 [00:02<00:00, 168.54it/s]
Generating general-model response...
Loading weights: 100%|██████████| 338/338 [00:00<00:00, 402.40it/s]
[ ]:
print("=" * 90)
print(f"Reasoning model: {reasoning_model_id}")
print("-" * 90)
print(reasoning_reply)
print("\n" + "=" * 90)
print(f"General model: {general_model_id}")
print("-" * 90)
print(general_reply)
==========================================================================================
Reasoning model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
------------------------------------------------------------------------------------------
Okay, so I need to figure out how to approach this problem. Let me read it again carefully.
The user provided a query where an analyst is trying to understand the sea surface temperatures reported by Abuoy. The temperatures are [18.4, 18.9, 19.7, 20.1, 20.6]. The tasks are:
1. Compute the mean temperature.
2. Compute the warming from the first to the last value.
3. Explain in two lines what this suggests about local marine conditions.
Alright, let's break this down step by step.
First, computing the mean temperature. That's straightforward. I need to add up all the temperatures and then divide by the number of data points. Let me add them up: 18.4 + 18.9 is 37.3, plus 19.7 makes 57, plus 20.1 is 77.1, and finally adding 20.6 gives 97.7. There are five data points, so dividing by 5 gives 97.7 / 5 = 19.54°C. So the mean temperature is 19.54°C.
Next, computing the warming from the first to the last value. That means subtracting the first temperature from
==========================================================================================
General model: Qwen/Qwen2.5-1.5B-Instruct
------------------------------------------------------------------------------------------
Sure! Let's break down the task into clear steps:
### Step 1: Compute the Mean Temperature
To compute the mean temperature of the given data set, we need to sum all the values and then divide by the number of observations.
Given data: [18.4, 18.9, 19.7, 20.1, 20.6]
First, calculate the sum:
\[ \text{Sum} = 18.4 + 18.9 + 19.7 + 20.1 + 20.6 \]
\[ \text{Sum} = 107.5 \]
Next, count the number of observations (there are 5 values):
\[ n = 5 \]
Now, compute the mean:
\[ \text{Mean} = \frac{\text{Sum}}{n} = \frac{107.5}{5} = 21.5 \]
So, the mean temperature is **21.5°C**.
### Step 2: Compute the Warming from First to Last Value
The warming from the first to the last value can be calculated by subtracting the first value from the last value.
First value: 18.4°C
Last value: 20.6°C
Warming =
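The two replies above disagree on the arithmetic: the reasoning model reports a sum of 97.7 and a mean of 19.54 °C, while the general model reports a sum of 107.5 and a mean of 21.5 °C. A direct computation with the standard library settles which one is right, and illustrates why precise computation is better delegated to code than to free-form generation. Rounding to two decimals is a presentation choice made here, not part of the original prompt.

```python
from statistics import fmean

temperatures = [18.4, 18.9, 19.7, 20.1, 20.6]

# Mean of the five buoy readings, rounded to two decimals for display.
mean_temperature = round(fmean(temperatures), 2)

# Warming from the first to the last value.
warming = round(temperatures[-1] - temperatures[0], 2)

print(f"mean = {mean_temperature} °C, warming = {warming} °C")
```

The general model's fluent, well-formatted answer is the wrong one, which matches the limitation listed earlier: a reply can sound convincing while being wrong.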
Function calling
Function calling lets the model choose from a set of available tools with defined names, descriptions, and input arguments.
It is useful when the model needs fresh information or precise computation that should not be left to free-form text generation. Typical examples include:
search a document collection
query weather or ocean observations
run a calculator or statistics helper
read a local file or metadata record
send a structured API request
A function-calling workflow usually has four parts:
The application exposes a list of tools, including what each tool does and which arguments it expects.
The model reads the user request and decides whether one of those tools is needed.
The application executes the selected tool outside the model and captures the result.
The result is sent back to the model so it can continue reasoning and produce a final answer.
This means the model does not execute Python functions by itself. It only proposes a structured tool call, and the surrounding program decides whether to run it.
In practice, tool descriptions matter a lot. If two tools overlap or have vague descriptions, the model may choose the wrong one or pass arguments in the wrong format. That is why good tool names, explicit docstrings, and simple argument schemas are part of prompt design.
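The four-part workflow can be sketched without any framework. In this minimal example the model's reply is hard-coded as a JSON string, and the layout of `TOOL_SPECS` and of the returned dictionary is an assumption made for illustration; real providers define their own schema formats. What it shows is that the model only proposes a call, and the application looks it up, decides to run it, and packages the result.

```python
import json

def sum_numbers(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

# Part 1: what the application would expose to the model (illustrative layout).
TOOL_SPECS = [
    {
        "name": "sum_numbers",
        "description": "Add two numbers.",
        "parameters": {"a": "number", "b": "number"},
    }
]
REGISTRY = {"sum_numbers": sum_numbers}

# Part 2: pretend the model answered with a structured proposal, not free text.
model_output = '{"name": "sum_numbers", "arguments": {"a": 3, "b": 5}}'

def execute_tool_call(raw_call: str) -> dict:
    """Part 3: the application, not the model, runs the selected tool."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in REGISTRY:
        # Refuse anything that was never exposed as a tool.
        return {"tool": name, "error": "unknown tool"}
    result = REGISTRY[name](**call["arguments"])
    return {"tool": name, "result": result}

# Part 4: this dictionary is what would be sent back to the model.
tool_message = execute_tool_call(model_output)
print(tool_message)
```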
Why use LangChain here?
LangChain is a library for building LLM applications that need more than a single prompt-response step. Its goal is to make it easier to connect models with tools, prompts, memory, retrieval components, and execution flows.
For this notebook, the most useful part is that LangChain provides a clean wrapper for tools:
@tool turns a Python function into a structured tool with name, description, and schema
bind_tools(...) attaches those tools to the model call
response.tool_calls exposes the model decision as structured data instead of raw text that we would need to parse manually
So LangChain is not the model itself. It is the orchestration layer around the model. In other words, it helps us build the application logic that sits between the user request, the model, and the external tools.
Function-calling demo
The next cell defines two tools with LangChain and binds them to a chat model.
The example shows the basic idea of the workflow:
the user sends a request,
the model decides whether a tool is needed,
LangChain exposes that decision through response.tool_calls,
Python executes the selected tool,
the application can continue with the result.
This version is intentionally simple: the goal is to show how the wrapper works, not to build a full agent loop yet.
[17]:
from getpass import getpass
import os
api_key_input = getpass("Enter your API key: ").strip()
if not api_key_input:
    raise ValueError("No API key provided.")
# Store it in the current notebook session so other cells can read it.
os.environ["API_KEY_COURSE"] = api_key_input
print("API key loaded for this session.")
API key loaded for this session.
[19]:
import os
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
# Example: using LangChain tools with an OpenAI-compatible chat model
api_key = os.getenv("API_KEY_COURSE")
if not api_key:
    raise ValueError("Set API_KEY_COURSE by running the previous cell.")

base_url = "https://openrouter.ai/api/v1"
model = "nvidia/nemotron-3-super-120b-a12b:free"


@tool
def get_station_info(code: str) -> dict:
    """Return a short description for a known environmental service.

    Args:
        code: Short service code such as SOCIB or CMEMS.
    """
    stations = {
        "SOCIB": "Balearic Islands observing system in the western Mediterranean.",
        "CMEMS": "Copernicus Marine service for ocean products and monitoring.",
    }
    key = code.upper().strip()
    return {"code": key, "description": stations.get(key, "unknown service")}


@tool
def sum_numbers(a: float, b: float) -> float:
    """Add two numbers.

    Args:
        a: First number.
        b: Second number.
    """
    return a + b


llm = ChatOpenAI(
    model=model,
    api_key=api_key,
    base_url=base_url,
    temperature=0,
)

tools = [get_station_info, sum_numbers]
llm_with_tools = llm.bind_tools(tools)

response = llm_with_tools.invoke(
    "What is SOCIB and what is 3 + 5? Use tools when helpful."
)

print("Assistant text:\n")
print(response.content)

print("\nTool calls detected by LangChain:\n")
if not response.tool_calls:
    print("No tool calls returned by the model.")
else:
    for i, tool_call in enumerate(response.tool_calls, start=1):
        print(f"{i}. Tool: {tool_call['name']}")
        print(f"   Arguments: {tool_call['args']}")

# Map tool names to tool objects so each proposed call can be dispatched.
tool_registry = {tool_obj.name: tool_obj for tool_obj in tools}

print("\nTool execution results:\n")
for i, tool_call in enumerate(response.tool_calls, start=1):
    result = tool_registry[tool_call["name"]].invoke(tool_call["args"])
    print(f"{i}. Result: {result}")
Assistant text:
Tool calls detected by LangChain:
1. Tool: get_station_info
Arguments: {'code': 'SOCIB'}
Tool execution results:
1. Result: {'code': 'SOCIB', 'description': 'Balearic Islands observing system in the western Mediterranean.'}
Activity
Try to build an agent that uses the following tools, or design your own scenario.
[ ]:
from langchain.tools import tool
@tool
def get_temperature(location: str) -> str:
    """Returns current temperature for a location."""
    data = {
        "Madrid": 32,
        "Barcelona": 28,
        "Gibraltar": 26
    }
    return f"{data.get(location, 'Unknown')} °C"


@tool
def get_pressure_trend(location: str) -> str:
    """Returns pressure trend for a location."""
    return "Pressure is decreasing rapidly"


@tool
def generate_alert(pressure_trend: str) -> str:
    """Generates a weather alert based on pressure trend."""
    if "decreasing" in pressure_trend:
        return "Warning: possible unstable weather conditions."
    return "No significant weather risks."


user_request = "Is there any risk of bad weather in Gibraltar?"
[ ]:
# Your code here
Model Context Protocol
The Model Context Protocol (MCP) is a standard way to connect models and applications to external tools and context sources.
At a high level, MCP helps by standardizing how a client can discover and use:
tools: callable actions
resources: data sources or documents
prompts: reusable prompt templates
The main value is interoperability. Instead of writing a custom integration for every model and every tool, MCP provides a shared interface.
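To make the shared interface more concrete, here is a toy in-process sketch of the kind of JSON-RPC 2.0 messages MCP uses for tools. It is not the official SDK: the `tools/list` and `tools/call` method names and the result shapes follow the MCP specification as I understand it, while the handler, the tool, and its reply text are hypothetical. A real deployment would also include an initialization handshake and a transport such as stdio or HTTP, both omitted here.

```python
import json

# Toy "server" state: one hypothetical tool described with a JSON Schema.
TOOLS = {
    "get_pressure_trend": {
        "description": "Return the pressure trend for a location.",
        "inputSchema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
        "handler": lambda args: f"Pressure trend requested for {args['location']}",
    }
}

def handle_request(request: dict) -> dict:
    """Answer two MCP-style JSON-RPC methods: tools/list and tools/call."""
    if request["method"] == "tools/list":
        # Discovery: the client learns names, descriptions, and input schemas.
        result = {"tools": [
            {"name": name, "description": spec["description"],
             "inputSchema": spec["inputSchema"]}
            for name, spec in TOOLS.items()
        ]}
    elif request["method"] == "tools/call":
        # Invocation: the client names a tool and passes its arguments.
        params = request["params"]
        text = TOOLS[params["name"]]["handler"](params["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

listing = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle_request({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "get_pressure_trend", "arguments": {"location": "Gibraltar"}},
})
print(json.dumps(listing, indent=2))
print(json.dumps(call, indent=2))
```

Because discovery and invocation use the same message shapes for every tool, a client written once can work with any server that speaks the protocol.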
Risks and best practices
Common risks:
calling the wrong tool
repeating the same action in a loop
acting on stale context
using tools with insufficient permissions or safeguards
over-trusting model-generated plans
assuming every provider or wrapper exposes tool calls in exactly the same way
Best practices:
keep tool descriptions explicit and non-overlapping
log actions and intermediate results
validate tool inputs before execution
add stopping conditions to avoid loops
inspect response.tool_calls or the equivalent structured output during development
keep a human in the loop for expensive or sensitive actions
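Several of these practices fit in one small guard function. The sketch below is framework-independent and makes a few assumptions for illustration: proposed calls arrive as `(name, args)` pairs, `ALLOWED_LOCATIONS` is a hypothetical whitelist, and the tools are not actually executed. It validates inputs before execution, logs every decision, enforces a step limit, and stops when the same call is proposed twice.

```python
ALLOWED_LOCATIONS = {"Madrid", "Barcelona", "Gibraltar"}  # hypothetical whitelist

def validate_call(name: str, args: dict) -> None:
    """Raise ValueError if the proposed call should not be executed."""
    if name != "get_temperature":
        raise ValueError(f"unknown tool: {name}")
    location = args.get("location")
    if location not in ALLOWED_LOCATIONS:
        raise ValueError(f"unsupported location: {location!r}")

def run_with_guards(proposed_calls: list[tuple[str, dict]], max_steps: int = 4) -> list[str]:
    log: list[str] = []      # record of actions and decisions
    seen: set[str] = set()   # signatures of calls already handled
    for step, (name, args) in enumerate(proposed_calls, start=1):
        if step > max_steps:
            log.append("stop: step limit reached")
            break
        signature = f"{name}:{sorted(args.items())}"
        if signature in seen:
            # Same tool with the same arguments again: likely a loop.
            log.append(f"stop: repeated call {signature}")
            break
        seen.add(signature)
        try:
            validate_call(name, args)
        except ValueError as error:
            log.append(f"rejected: {error}")
            continue
        log.append(f"ok: {name}({args})")  # a real agent would execute the tool here
    return log

log = run_with_guards([
    ("get_temperature", {"location": "Gibraltar"}),
    ("get_temperature", {"location": "Atlantis"}),
    ("get_temperature", {"location": "Gibraltar"}),
])
for entry in log:
    print(entry)
```

A human-in-the-loop check would slot in at the same place as `validate_call`, for example by asking for confirmation before any call marked as expensive or sensitive.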
A good extension for class is to add a document-search tool or a small dataset query, then compare a plain chat answer with a tool-augmented answer.