Basic Concepts when building AI Agents

Updated
• 9 min read

Let's cover some basic concepts that we will utilise in our AI agents. We start by calling through to an LLM (OpenAI) via code.

Subsequently, we will introduce LiteLLM as a wrapper to access LLMs from our code. We will also do some basic evaluation of our results via the GAIA dataset.

Let's get started.


Project Resources

All code examples discussed in this post can be found in the accompanying repo: GitHub: ai-notebook.

If you wish to code along, the details of my setup can be found in my How I set up a Python AI project post.


Concepts

I created basic-concepts.py to experiment with some of the basic concepts we will be using as we develop AI agents.


๐Ÿ—๏ธ The Basic Connection

The foundation of any agent is the ability to communicate with a model. We start by creating a simple interface to send a prompt and receive a string.

To do this I will use the gpt-5-mini model by OpenAI. To call it via the OpenAI Python SDK you will need to sign up and create an API key. Note that you will need to put some credit in your account ($5 is the minimum, but it should be sufficient for this learning).

Ensure you have copied my .env.example file to a new .env file where you can add your API Keys.

cp .env.example .env

NOTE: if you plan to push your code to a git repository, make sure your .env file is listed in your .gitignore file so that your keys are not exposed.
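Before making any calls, it can help to fail fast when a key is missing rather than discover it through an authentication error. The following is a minimal sketch and is not taken from the accompanying repo: `require_env` is a hypothetical helper name, and it assumes the key has already been loaded into the environment (for example by `load_dotenv`).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical error message; adjust the hint to match your own setup.
        raise RuntimeError(
            f"{name} is not set. Copy .env.example to .env and add your key."
        )
    return value
```

A call such as `require_env("OPENAI_API_KEY")` at the top of a script then either returns the key or stops with a readable message.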

A simple call through to OpenAI

import os 
from openai import OpenAI 
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv()) 
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "My name is Rob"}],
)
print(response.choices[0].message.content)

Note the use of 'role' to identify the actor responsible for the content. This becomes relevant as our context evolves and we need to keep track of the originator of each interaction.

The actual response text is found in response.choices[0].message.content. The reason choices is a list is that you can request multiple responses using the n parameter, though typically you only use the first one.
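To make the shape of the response concrete, here is a small sketch that collects the text of every choice. It uses a hand-built stand-in object rather than a real API response, so it runs without a key. `choice_texts` and `fake_response` are hypothetical names introduced only for illustration, and the message texts are invented.

```python
from types import SimpleNamespace


def choice_texts(response) -> list[str]:
    """Collect the message text from every choice in a chat completion response."""
    return [choice.message.content for choice in response.choices]


# Stand-in with the same attribute layout as a chat completion response
# requested with n=2. This is NOT real model output.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(content="Nice to meet you, Rob!")),
        SimpleNamespace(message=SimpleNamespace(content="Hello Rob, how can I help?")),
    ]
)

texts = choice_texts(fake_response)
print(len(texts))  # 2
print(texts[0])    # Nice to meet you, Rob!
```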

The OpenAI client supports both the chat.completions syntax and the newer client.responses, which was introduced by OpenAI to better reflect the multimodal nature of its GPT models. Anthropic, Gemini and other LLM providers each offer a slightly different syntax, increasing the need for boilerplate code in our agent.

Thankfully, numerous tools remove this tight dependency by wrapping the underlying calls to the major LLMs behind a common interface. LiteLLM is the one we will use.


LiteLLM - Wrapper to our model call

You can add the litellm dependency to your project as follows:

uv add litellm

By importing the completion function from LiteLLM, our same call can be made like so:

from litellm import completion
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

## Basic call to LLM via LiteLLM wrapper

response1 = completion(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "My name is Rob"}]
)
print(response1.choices[0].message.content)

LiteLLM lets you call over 100 LLMs through this unified completion interface, allowing us to swap out LLMs easily. Provided you also have an Anthropic API key configured in your .env file, you can test this out by simply updating the model to a Claude model:

response1 = completion(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "My name is Rob"}]
)

LLM APIs are stateless

When you use ChatGPT or Claude via their web interface, they appear to remember previous conversations. However, LLM APIs are stateless, so each API call is independent and has no memory of the previous call. Let's test this out for ourselves:

from litellm import completion
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

response1 = completion(
    model="gpt-5-mini", messages=[{"role": "user", "content": "Hi, my name is Rob"}]
)
print(response1.choices[0].message.content)

response2 = completion(
    model="gpt-5-mini", messages=[{"role": "user", "content": "What is my name?"}]
)
print(response2.choices[0].message.content)

Even though we introduced ourselves in the first call, the second call has no memory of it. To maintain conversation history, we must therefore manage it ourselves.

messages = []

# First Exchange
messages.append({"role": "user", "content": "My name is Rob."})
response = completion(model="gpt-5-mini", messages=messages)
assistant_message1 = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_message1})
print(assistant_message1)

# Second Exchange - includes previous conversation history
messages.append({"role": "user", "content": "What is my name?"})
response2 = completion(model="gpt-5-mini", messages=messages)
assistant_message2 = response2.choices[0].message.content
print(assistant_message2)

We accumulate all conversation content in the messages list and pass the entire history with each call. Each message carries a role: user for the human asking the question, and assistant for the LLM.
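The same bookkeeping can be wrapped in a small class so that each turn appends both sides of the exchange automatically. This is a sketch rather than code from the repo: `Conversation` and `echo_history_size` are hypothetical names, and the model call is injected as a plain function so the example runs without an API key.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the message history and resends all of it on every turn."""

    def __init__(self, complete: Callable[[list[Message]], str]) -> None:
        # `complete` takes the full history and returns the assistant's text.
        self._complete = complete
        self.messages: list[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self._complete(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def echo_history_size(messages: list[Message]) -> str:
    """Stand-in for a model call: reports how many messages it was sent."""
    return f"I received {len(messages)} message(s)."


chat = Conversation(echo_history_size)
print(chat.ask("My name is Rob."))   # I received 1 message(s).
print(chat.ask("What is my name?"))  # I received 3 message(s).
```

To use a real model, the stand-in could be replaced with a function that calls `completion(model="gpt-5-mini", messages=messages)` and returns `response.choices[0].message.content`.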


Structured Output

We are familiar with the natural language text that LLMs generate. It is great for humans to read, but inconvenient for programs to process. Most modern LLM providers (OpenAI, Anthropic, Gemini etc.) support 'Structured Output', a feature that allows us to instruct LLMs to generate responses in a defined format, like JSON.

A common approach to this is by defining your desired output format using Python's Pydantic library. Pydantic is a library for data validation that lets you define data structures as classes. By inheriting from BaseModel you can create a schema that specifies field names and their types.

Let's play with an example ExtractedInfo that defines three fields: name and email are required strings, while phone is an optional string that defaults to None if not provided.

You can add the pydantic dependency to your project as follows:

uv add pydantic

from litellm import completion
from pydantic import BaseModel


class ExtractedInfo(BaseModel):
    name: str
    email: str
    phone: str | None = None


response4 = completion(
    model="gpt-5-mini",
    messages=[
        {
            "role": "user",
            "content": "My name is John Smith, my email is john@example.com, and my phone number is 07712345678",
        }
    ],
    response_format=ExtractedInfo,
)

result = response4.choices[0].message.content
print(result)

We pass our Pydantic model to response_format, a parameter supported by litellm. If you inspect the result from our gpt-5-mini model, you should observe the LLM adhering to our defined ExtractedInfo schema:

{"name":"John Smith","email":"john@example.com","phone":"07712345678"}

Try experimenting by altering the initial user message to remove the phone number and running the prompt again. This time the LLM has no way of knowing the phone number, so it leaves it out of its structured response. How does this differ if you remove one of the non-optional fields from the content?

Structured output plays a crucial role in agent development. In tool calling, which we'll cover later, the LLM must output which tool to call with which arguments in a structured format. One of the core capabilities of structured output is the ability to convert user intent into appropriate actions.


Asynchronous calls

You may need to process multiple LLM requests simultaneously when developing your agents. This could be because you are comparing responses from multiple models, running a multi-agent system, or evaluating dozens of problems in a benchmark test.

This is handled as it would be in any Python program, via async/await and the asyncio library. LiteLLM supports asynchronous calls through the acompletion function.

import asyncio
from litellm import acompletion

# Limit to 10 concurrent requests
semaphore = asyncio.Semaphore(10)


async def call_llm(prompt: str) -> str:
    """LLM call with rate limiting and automatic retry."""
    async with semaphore:
        response = await acompletion(
            model="gpt-5-mini",
            messages=[{"role": "user", "content": prompt}],
            num_retries=3,  # Automatic retry with exponential backoff
        )
        return response.choices[0].message.content


# Even if we had 100 prompts, only 10 API calls run at a time
prompts = [
    "What is 2 + 2?",
    "What is the capital of Japan?",
    "Who wrote Romeo and Juliet?",
]

async def main() -> None:
    # Execute all requests concurrently.
    # Without `return_exceptions=True`, the first exception would propagate out of
    # gather() and we would lose the results of the calls that succeeded.
    # With it, exceptions are returned as values in the results list, allowing us
    # to handle failures gracefully while still getting results from successful calls.
    tasks = [call_llm(p) for p in prompts]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    for prompt, result in zip(prompts, results):
        print(f"Q: {prompt}")
        print(f"A: {result}\n")


# A bare top-level `await` only works in a notebook or REPL; a script needs asyncio.run()
asyncio.run(main())

This example includes measures to prevent two common issues when sending many requests simultaneously: rate limits from the API provider, and transient failures from network issues or server overload. LiteLLM's num_retries parameter handles transient failures with automatic exponential backoff.
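When exceptions come back as values, the results list mixes answers and errors, so it is worth separating them before use. A minimal sketch follows, with the LLM call replaced by a hypothetical stand-in (`flaky_call`) that fails for one prompt so the example runs offline. `run_all` is likewise a name introduced here for illustration.

```python
import asyncio


async def flaky_call(prompt: str) -> str:
    """Stand-in for an LLM call: fails for one specific prompt."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if prompt == "bad":
        raise RuntimeError("simulated provider error")
    return prompt.upper()


async def run_all(prompts: list[str]) -> tuple[dict[str, str], dict[str, Exception]]:
    results = await asyncio.gather(
        *(flaky_call(p) for p in prompts), return_exceptions=True
    )
    answers: dict[str, str] = {}
    failures: dict[str, Exception] = {}
    # gather() preserves input order, so zip pairs each prompt with its result.
    for prompt, result in zip(prompts, results):
        if isinstance(result, Exception):
            failures[prompt] = result
        else:
            answers[prompt] = result
    return answers, failures


answers, failures = asyncio.run(run_all(["hello", "bad", "world"]))
print(answers)   # {'hello': 'HELLO', 'world': 'WORLD'}
print(failures)  # {'bad': RuntimeError('simulated provider error')}
```

Keeping the failures keyed by prompt makes it straightforward to log them or retry only the prompts that failed.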

For rate limiting we can use Python's asyncio.Semaphore to limit how many requests run concurrently. By wrapping the acompletion call we can ensure that no matter how many tasks are running only a limited number of actual API calls happen at once.
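To see the semaphore doing its job, the sketch below replaces the API call with a short sleep and records how many calls are in flight at once. `measure_peak` and `fake_call` are hypothetical names, and the sleep duration is arbitrary.

```python
import asyncio


async def measure_peak(total_tasks: int, limit: int) -> int:
    """Run `total_tasks` fake calls and return the highest number in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_call() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the API round trip
            in_flight -= 1

    await asyncio.gather(*(fake_call() for _ in range(total_tasks)))
    return peak


peak = asyncio.run(measure_peak(total_tasks=25, limit=5))
print(peak)  # 5
```

All 25 tasks are started immediately, but only five ever hold the semaphore at the same time. The rest wait at `async with semaphore` until a slot frees up.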


System Prompt

System prompts are particularly important for agents. Agents act autonomously across multiple steps. System prompts define the behavioral rules that guide all of these decisions. From a context engineering perspective, the system prompt is information that is always included in the context.

Every time the agent calls a tool, analyzes results, or decides on its next action, the system prompt sits at the front of the context, guiding the agent's judgment. This is why the quality of the system prompt determines the quality of the agent's overall behavior.

When writing a system prompt, you typically provide this context to the LLM with the role of 'system'. For example, using the acompletion call via LiteLLM:

response = await acompletion(
    model=model,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ],
)

You can see some examples of a production agent's system prompt, as Anthropic publishes Claude's system prompts. Note that these system prompts are applied when interacting through the claude.ai website or mobile chat interface. When calling Claude through the API, developers write their own system prompts. Claude's system prompt serves four main roles: defining the product's identity, specifying output format and style, setting boundaries on prohibited behaviour, and clarifying the limits of its knowledge.
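As a rough illustration of those four roles, and of the system prompt always sitting at the front of the context, here is a small sketch. The prompt wording, the "Notebook Helper" name, and the `build_messages` helper are all invented for this example and are not taken from Claude's published prompts.

```python
# One line per role; the wording is hypothetical.
SYSTEM_PROMPT = "\n".join(
    [
        "You are Notebook Helper, a research assistant.",      # identity
        "Answer in plain text, in three sentences or fewer.",  # output format and style
        "Do not give medical or legal advice.",                # prohibited behaviour
        "If you are unsure or lack the information, say so.",  # limits of knowledge
    ]
)


def build_messages(history: list[dict], question: str) -> list[dict]:
    """Return the full context for a call, with the system prompt always first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {"role": "user", "content": question},
    ]


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]
messages = build_messages(history, "What is 2 + 2?")
print(messages[0]["role"])  # system
print(len(messages))        # 4
```

Building the list in one place means every call, however long the history grows, starts with the same behavioural rules.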

Building an AI Agent from Scratch

Part 1 of 4

This is a research-driven journey into building an AI agent from first principles. No frameworks. No abstractions hiding the mechanics. No magic. This documentation captures lessons learned and hopefully acts as a useful tutorial for others.

Why Build From Scratch?

Modern agent frameworks are powerful, but they can abstract away the core elements that all AI agents have in common. By building an agent from scratch, we will develop an understanding of the fundamentals of AI agents as a solid foundation for further learning.
