Basic Concepts when building AI Agents
Let's cover some basic concepts that we will utilise in our AI agents. We start by calling through to an LLM (OpenAI) via code.
Subsequently, we will introduce LiteLLM as a wrapper to access LLMs from our code. We will also do some basic evaluation of our results via the GAIA dataset.
Let's get started.
Project Resources
All code examples discussed in this post can be found in the accompanying repo: GitHub: ai-notebook.
If you wish to code along, the details of my setup can be found in my How I set up a Python AI project post.
Concepts
I created basic-concepts.py to experiment with some of the basic concepts we will be using as we develop AI agents.
The Basic Connection
The foundation of any agent is the ability to communicate with a model. We start by creating a simple interface to send a prompt and receive a string.
To do this I will use OpenAI's gpt-5-mini model. To call it via the OpenAI Python SDK you will need to sign up and create an API key. Note that you will also need to add some credit to your account ($5 is the minimum, but should be sufficient for this learning exercise).
Ensure you have copied my .env.example file to a new .env file where you can add your API Keys.
cp .env.example .env
NOTE: Ensure that your .env file is referenced in your .gitignore file if you are planning to push your code to a git repository, so that your keys are not exposed.
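A missing key usually only shows up later as an authentication error from the provider. As a minimal sketch, a small helper can fail fast with a clearer message instead. The `require_env` function and the `DEMO_API_KEY` variable are hypothetical names used for illustration and are not part of the accompanying repo.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file before running the examples."
        )
    return value


# Demo only: set a placeholder so the snippet runs without a real key.
os.environ["DEMO_API_KEY"] = "sk-placeholder"
print(require_env("DEMO_API_KEY"))
```

In a real script you would call `require_env("OPENAI_API_KEY")` after `load_dotenv(find_dotenv())`.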
A simple call through to OpenAI
import os
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv
# Load OPENAI_API_KEY from the nearest .env file
load_dotenv(find_dotenv())
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Send a single user message and print the model's reply
response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "My name is Rob"}],
)
print(response.choices[0].message.content)
Note the use of 'role' to distinguish the actor responsible for the content. This becomes relevant as our context evolves and we need to keep track of the originator of each interaction.
The actual response text is found in response.choices[0].message.content. The reason choices is a list is that you can request multiple responses using the n parameter, though typically you only use the first one.
The OpenAI client supports both the chat.completions syntax and the newer client.responses syntax, which OpenAI introduced to better reflect the multimodal nature of its GPT models. Anthropic, Gemini and other LLM providers each expose a slightly different syntax, which increases the amount of boilerplate code in our agent.
Thankfully, numerous tools decouple us from this tight dependency by wrapping the underlying calls to the major LLMs behind a single interface. LiteLLM is the one we will use.
LiteLLM - Wrapper to our model call
You can add the litellm dependency to your project as follows:
uv add litellm
By importing the completion function from LiteLLM, the same call can be made like so:
from litellm import completion
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
## Basic call to LLM via LiteLLM wrapper
response1 = completion(
model="gpt-5-mini",
messages=[{"role": "user", "content": "My name is Rob"}]
)
print(response1.choices[0].message.content)
LiteLLM lets you call over 100 LLMs through this unified completion interface, allowing us to swap out LLMs easily. Provided you also have an Anthropic API key configured in your .env file, you can test this by simply changing the model to a Claude model:
response1 = completion(
model="claude-sonnet-4-5",
messages=[{"role": "user", "content": "My name is Rob"}]
)
LLM APIs are stateless
When you use ChatGPT or Claude via their web interface, they appear to remember previous conversations. However, LLM APIs are stateless, so each API call is independent and has no memory of the previous call. Let's test this out for ourselves:
from litellm import completion
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
response1 = completion(
model="gpt-5-mini", messages=[{"role": "user", "content": "Hi, my name is Rob"}]
)
print(response1.choices[0].message.content)
response2 = completion(
model="gpt-5-mini", messages=[{"role": "user", "content": "What is my name?"}]
)
print(response2.choices[0].message.content)
Even though we introduced ourselves in the first call, the second call has no memory of it. Therefore, to maintain conversation history we must manage it ourselves.
messages = []
# First Exchange
messages.append({"role": "user", "content": "My name is Rob."})
response = completion(model="gpt-5-mini", messages=messages)
assistant_message1 = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_message1})
print(assistant_message1)
# Second Exchange - includes previous conversation history
messages.append({"role": "user", "content": "What is my name?"})
response2 = completion(model="gpt-5-mini", messages=messages)
assistant_message2 = response2.choices[0].message.content
print(assistant_message2)
We accumulate all conversation content in the messages list and pass the entire history with each call. Each message carries a role: user for the human asking the question, and assistant for the LLM's replies.
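The bookkeeping above can be wrapped in a small helper so that every turn appends both sides of the exchange. This is only a sketch: the `Conversation` class and the `fake_model` stand-in are hypothetical names introduced here so the example runs offline without an API key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message history and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        self.send = send
        self.messages: List[Message] = []

    def ask(self, content: str) -> str:
        # Record the user's turn, send the full history, then record the reply.
        self.messages.append({"role": "user", "content": content})
        reply = self.send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_model(messages: List[Message]) -> str:
    """Offline stand-in for an LLM: reports how many messages it was sent."""
    return f"I can see {len(messages)} message(s)."


chat = Conversation(send=fake_model)
print(chat.ask("My name is Rob."))   # the model receives 1 message
print(chat.ask("What is my name?"))  # the model receives 3 messages
```

To use it against a real model, you could pass a function that forwards the list to LiteLLM, for example `lambda msgs: completion(model="gpt-5-mini", messages=msgs).choices[0].message.content`.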
Structured Output
We are familiar with the natural language text that LLMs generate. It is great for humans to read, but inconvenient for programs to process. Most modern LLM providers (OpenAI, Anthropic, Gemini etc.) support 'Structured Output', a feature that allows us to instruct LLMs to generate responses in a defined format, such as JSON.
A common approach to this is by defining your desired output format using Python's Pydantic library. Pydantic is a library for data validation that lets you define data structures as classes. By inheriting from BaseModel you can create a schema that specifies field names and their types.
Let's play with an example ExtractedInfo that defines three fields: name and email are required strings, while phone is an optional string that defaults to None if not provided.
You can add the pydantic dependency to your project as follows:
uv add pydantic
from pydantic import BaseModel


class ExtractedInfo(BaseModel):
    name: str
    email: str
    phone: str | None = None

response4 = completion(
model="gpt-5-mini",
messages=[
{
"role": "user",
"content": "My name is John Smith, my email is john@example.com, and my phone number is 07712345678",
}
],
response_format=ExtractedInfo,
)
result = response4.choices[0].message.content
print(result)
We pass our Pydantic model to response_format, a parameter supported by LiteLLM. If you inspect the result from our gpt-5-mini model, you should observe the LLM adhering to our defined ExtractedInfo schema:
{"name":"John Smith","email":"john@example.com","phone":"07712345678"}
Try experimenting by altering the initial user message to remove the phone number, then run the prompt again. This time the LLM has no way of knowing the phone number, so it omits the value from its structured response. How does this differ if you remove one of the non-optional fields from the content?
Structured output plays a crucial role in agent development. In tool calling, which we'll cover later, the LLM must output which tool to call with which arguments in a structured format. One of the core capabilities of structured output is the ability to convert user intent into appropriate actions.
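The result printed earlier is still a JSON string. As a sketch of the next step, the same Pydantic model can parse and validate it into a typed object. This assumes Pydantic v2, where `model_validate_json` is available, and the JSON strings below are hand-written stand-ins for `response4.choices[0].message.content`, not real model output.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ExtractedInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None


# Hand-written payloads standing in for the LLM's structured response
full = '{"name":"John Smith","email":"john@example.com","phone":"07712345678"}'
no_phone = '{"name":"John Smith","email":"john@example.com"}'
no_email = '{"name":"John Smith"}'

info = ExtractedInfo.model_validate_json(full)
print(info.name, info.phone)

# The optional field falls back to its default of None
print(ExtractedInfo.model_validate_json(no_phone).phone)

# A missing required field is rejected rather than silently accepted
try:
    ExtractedInfo.model_validate_json(no_email)
except ValidationError as exc:
    print(f"Rejected with {len(exc.errors())} validation error(s)")
```

Validating on our side means a malformed response fails loudly at the boundary, instead of surfacing later as a missing attribute somewhere inside the agent.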
Asynchronous calls
You may need to process multiple LLM requests simultaneously when developing your agents. This could be because you are comparing responses from multiple models, running a multi-agent system or evaluating dozens of problems in a benchmark test.
This is handled as it would be in any Python program, via async/await and the asyncio library. LiteLLM supports asynchronous calls through the acompletion function.
import asyncio
from litellm import acompletion
# Limit to 10 concurrent requests
semaphore = asyncio.Semaphore(10)
async def call_llm(prompt: str) -> str:
    """LLM call with rate limiting and automatic retry."""
    async with semaphore:
        response = await acompletion(
            model="gpt-5-mini",
            messages=[{"role": "user", "content": prompt}],
            num_retries=3,  # Automatic retry with exponential backoff
        )
        return response.choices[0].message.content
# Even if we had 100 prompts, only 10 API calls run at a time
prompts = [
"What is 2 + 2?",
"What is the capital of Japan?",
"Who wrote Romeo and Juliet?",
]
# Execute all requests concurrently.
# return_exceptions=True stops a single failure from propagating out of gather and discarding the other results.
# Instead, exceptions are returned as values in the results list, allowing us to handle failures gracefully
# while still getting results from successful calls.
async def main() -> None:
    tasks = [call_llm(p) for p in prompts]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for prompt, result in zip(prompts, results):
        print(f"Q: {prompt}")
        print(f"A: {result}\n")

# A plain script cannot use await at the top level, so start the event loop explicitly
asyncio.run(main())
This example includes measures against two common issues when sending many requests simultaneously: rate limits from the API provider, and transient failures caused by network issues or server overload. LiteLLM's num_retries parameter handles transient failures with automatic exponential backoff.
For rate limiting we can use Python's asyncio.Semaphore to limit how many requests run concurrently. By wrapping the acompletion call in async with semaphore, we ensure that no matter how many tasks are running, only a limited number of actual API calls happen at once.
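To see the semaphore and `return_exceptions=True` at work without spending API credit, here is an offline sketch. `fake_call` is a hypothetical stand-in for `acompletion` that sleeps briefly and fails once on purpose, and the limit of 3 is chosen only to make the effect easy to observe.

```python
import asyncio

MAX_CONCURRENT = 3  # illustrative limit, smaller than the 10 used above


async def fake_call(i: int, semaphore: asyncio.Semaphore, stats: dict) -> str:
    """Stand-in for an API call that records how many calls are in flight."""
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # stands in for network latency
            if i == 4:
                raise RuntimeError("simulated transient failure")
            return f"answer {i}"
        finally:
            stats["active"] -= 1


async def run_all(n: int):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fake_call(i, semaphore, stats) for i in range(n)),
        return_exceptions=True,  # failures come back as values in the list
    )
    return results, stats["peak"]


results, peak = asyncio.run(run_all(10))
ok = [r for r in results if not isinstance(r, Exception)]
failed = [r for r in results if isinstance(r, Exception)]
print(f"peak concurrency: {peak}, succeeded: {len(ok)}, failed: {len(failed)}")
```

All ten tasks are scheduled at once, yet the recorded peak never exceeds the semaphore's limit, and the single failure arrives as an exception object alongside the nine successful results.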
System Prompt
System prompts are particularly important for agents. Agents act autonomously across multiple steps. System prompts define the behavioral rules that guide all of these decisions. From a context engineering perspective, the system prompt is information that is always included in the context.
Every time the agent calls a tool, analyzes results, or decides on its next action, the system prompt sits at the front of the context, guiding the agent's judgment. This is why the quality of the system prompt determines the quality of the agent's overall behavior.
When writing a system prompt, you typically provide it to the LLM in a message with the role 'system'. For example, using the acompletion call via LiteLLM:
response = await acompletion(
    model=model,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ],
)
You can see examples of a production agent's system prompt, as Anthropic publishes Claude's system prompts. Note that this system prompt is applied when interacting through the claude.ai website or mobile chat interface. When calling Claude through the API, developers write their own system prompts. Claude's system prompt serves four main roles: defining the product's identity, specifying output format and style, setting boundaries on prohibited behaviour, and clarifying the limits of its knowledge.
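Because the API is stateless, keeping the system prompt "always in the context" simply means placing it first in the message list on every call. A minimal sketch of that assembly step follows. The `build_messages` helper and the prompt text are hypothetical and written for illustration; they are not taken from any published system prompt.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical prompt for illustration only
SYSTEM_PROMPT = "You are a concise assistant. Answer in one sentence."


def build_messages(
    system_prompt: str, history: List[Message], question: str
) -> List[Message]:
    """Return a new list: system prompt first, then history, then the new question."""
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": question}]
    )


history = [
    {"role": "user", "content": "My name is Rob."},
    {"role": "assistant", "content": "Nice to meet you, Rob."},
]
messages = build_messages(SYSTEM_PROMPT, history, "What is my name?")
for m in messages:
    print(f'{m["role"]}: {m["content"]}')
```

The helper returns a new list rather than mutating the history, so the stored conversation never accumulates duplicate system messages across turns.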




